FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0
51 stars 19 forks source link

Why I can't split bisulfite data ? #64

Closed chaoyang666 closed 1 year ago

chaoyang666 commented 2 years ago

Hi:

I have a problem. Could you please help me with this? I can't split my bam file with SNPsplit and can't find the reasons.

My run code is as below: SNPsplit --bisulfite --snp_file B73_V4.snp.SNP my.bam

input file1 B73_V4.snp.SNP 1 Chr10 92742 1 G/C 2 Chr10 92778 1 A/C 3 Chr10 92928 1 T/A 4 Chr10 92939 1 A/T 5 Chr10 92942 1 A/G 6 Chr10 92975 1 T/G 7 Chr10 92986 1 T/A 8 Chr10 93130 1 A/T 9 Chr10 93213 1 T/C 10 Chr10 93224 1 C/T input file2 my.bam image

££££££££££££££££

OUTPUT: Allele-tagging report

Processed 158277819 read alignments in total Reads were unaligned and hence skipped: 0 (0.00%) 158277819 reads were unassignable (100.00%) 0 reads were specific for genome 1 (0.00%) 0 reads were specific for genome 2 (0.00%) 0 reads that were unassignable contained C>T SNPs preventing the assignment 1528 reads did not contain one of the expected bases at known SNP positions (0.00%) 0 contained conflicting allele-specific SNPs (0.00%)

Many thanks, Chao 13/07/2022

FelixKrueger commented 2 years ago

The most important question for this is: did you construct, index and align an N-masked genome (e.g. with SNPsplit_genome_preparation) before running the bisulfite alignments. From looking at the MD:Z: string I can't see any mismatches to Ns, that's why I'm asking.

A bit further up in the SNPsplit report you should also see the stat about how many reads contained mismatches to Ns. If that is 0 it would also tell you whether that was the reason...

chaoyang666 commented 2 years ago

Dear Felix, Thank you very much. Could you please help me; Why I get the error information like as:

Reading/filtering VCF file:             No (skipped by user)
Reference genome:                       01-maize/split/
N-masking:                              Yes
Full SNP genome:                        Yes
SNP strain:                             maize
Using the following chromosomes (HARCODED IN!!!):
1       2       3       4       5       6       7       8       9       10      11      12      13      14      15      16      17    18       19      20      21      22      X       Y       MT

Skipped reading the VCF file and filtering SNPs again (specified by user)

Now reading in and storing sequence information of the genome specified in: 01-tools/03-genome/01-maize/split/

chr Chr10 (150982314 bp)
chr Chr1 (307041717 bp)
chr Chr2 (244442276 bp)
chr Chr3 (235667834 bp)
...............
...............
[FATAL ERROR] Please ensure that the same version of the genome is used for both VCF annotations and reference genome (FastA files). Exiting...

I run like this: SNPsplit_genome_preparation --skip_filtering --ref 01-maize/split --strain maize --full_sequence

I have the following files in the folder: SNPs_maize chr10.txt chr1.txt chr2.txt chr3.txt chr4.txt chr5.txt chr6.txt chr7.txt chr8.txt chr9.txt

chr1.txt >Chr1 796078 Chr1 26291 1 T/C 796079 Chr1 26361 1 C/A 796080 Chr1 26421 1 C/T 796081 Chr1 26442 1 C/T 796082 Chr1 26605 1 C/T 796083 Chr1 26650 1 A/G

my reference genome is: >Chr10 CCTAAACCCTAAACCCTAAAC

Thank you very much! Chao

chaoyang666 commented 2 years ago

Dear Felix,

So far, I have solved this problem by modifying the code of SNPsplit-0.3.4/SNPsplit_genome_preparation `

@chroms = (1..22,'X','Y','MT'); # this is currently using chromosomes for the human genome

@chroms = ('Chr1','Chr2','Chr3','Chr4','Chr5','Chr6','Chr7','Chr8','Chr9','Chr10');

` Thanks a lot, Chao