FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0

All reads are unassignable #12

Closed: RoseString closed this issue 7 years ago

RoseString commented 7 years ago

Dear SNPsplit developer,

I am running into an issue where all reads are unassignable, even though my SNP file and BAM file are based on the same reference. My aligner is Bismark v0.18.1_dev and my SNPsplit version is 0.3.2_dev.

I therefore made a tiny test SNP file https://github.com/RoseString/Sparrow/blob/master/test.txt and a test SAM file https://github.com/RoseString/Sparrow/blob/master/test.sam to check whether the assignment works.

In the test dataset, both SN996:348:HLW3VBCXY:1:1106:7653:2122_1:N:0:GCGCTA and SN996:348:HLW3VBCXY:1:1106:3931:2390_1:N:0:GCGCTA should be assigned to G1 because they overlap the SNPs I provided; however, they are marked UA (unassignable) for some reason.

Allele-tagging report

Processed 6 read alignments in total
Reads were unaligned and hence skipped: 0 (0.00%)
6 reads were unassignable (100.00%)
0 reads were specific for genome 1 (0.00%)
0 reads were specific for genome 2 (0.00%)
0 reads did not contain one of the expected bases at known SNP positions (0.00%)
0 contained conflicting allele-specific SNPs (0.00%)
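The report categories can be read as the outcome of comparing each read's bases at known SNP positions with the two parental alleles. The sketch below is a hypothetical, simplified illustration of that idea, not SNPsplit's actual implementation: it ignores bisulfite C/T conversion, and the function name `classify_read` and its input layout are assumptions. Note that a read with no detected SNP positions falls into the unassignable (UA) bin.

```python
from typing import List, Tuple

def classify_read(observations: List[Tuple[str, str, str]]) -> str:
    """Hypothetical, simplified allele assignment for one read.

    Each observation is (read_base, genome1_allele, genome2_allele) at one
    known SNP position covered by the read. Bisulfite conversion is ignored.
    """
    if not observations:
        # No SNP position detected in this read: nothing to decide on.
        return "UA"
    calls = set()
    for read_base, g1, g2 in observations:
        if read_base == g1:
            calls.add("G1")
        elif read_base == g2:
            calls.add("G2")
        else:
            # Neither of the expected bases was seen at this SNP position.
            return "no_expected_base"
    if calls == {"G1", "G2"}:
        # The read supports both genomes at different SNPs.
        return "conflicting"
    return calls.pop()
```

For example, `classify_read([("A", "A", "G")])` returns `"G1"`, while `classify_read([])` returns `"UA"`.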

The command I am using is:

SNPsplit --bisulfite --paired --sam --snp_file test.txt test.sam

I would really appreciate it if you could take some time to look into this. Thanks!

Dan

FelixKrueger commented 7 years ago

Hi Dan,

I am having trouble seeing the files, as both links resolve to "https://github.com/FelixKrueger/SNPsplit/issues/url". Once you have updated them I'm happy to take a look. Oh, and a copy of the SNP file would also be needed.

Cheers, Felix

RoseString commented 7 years ago

I figured out how to add links correctly, so here they are! Really appreciate it.

SNP file, Bismark SAM file

FelixKrueger commented 7 years ago

Ah, excellent. I think I have already spotted the problem: the MD:Z: fields in your SAM file do not contain any mismatches to 'N', so SNPsplit cannot assess the SNPs.
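To see what "mismatches to 'N'" means in practice: in the SAM specification the MD:Z: tag lists runs of matching bases as numbers, each mismatched reference base as a letter, and deletions as `^` followed by the deleted reference bases. The sketch below (a hypothetical helper, not SNPsplit code) walks an MD string and reports the reference coordinates where the reference base is `N`. With a non-masked genome the MD tag at a SNP shows the real reference base (or no mismatch at all), so no such positions are found.

```python
import re
from typing import List

# MD:Z tokens: a run-length of matches, a deletion (^ + reference bases),
# or a single mismatched reference base.
_MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")

def n_mismatch_ref_positions(md: str, pos: int) -> List[int]:
    """Return 1-based reference coordinates where MD reports a mismatch against 'N'.

    `pos` is the SAM POS field (1-based leftmost mapped position). Insertions
    and soft clips do not appear in MD and do not consume reference, so they
    do not affect these coordinates.
    """
    hits = []
    ref_offset = 0  # offset from POS along the reference
    for number, deletion, mismatch in _MD_TOKEN.findall(md):
        if number:
            ref_offset += int(number)
        elif deletion:
            # Bases deleted from the read still consume reference positions.
            ref_offset += len(deletion) - 1
        else:
            if mismatch.upper() == "N":
                hits.append(pos + ref_offset)
            ref_offset += 1
    return hits
```

For example, `n_mismatch_ref_positions("10N5", 100)` gives `[110]`, whereas an MD tag such as `"50"` (no mismatches) gives an empty list, which corresponds to the situation in this issue.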

The SNPsplit procedure requires you to:

1) create a genome in which the SNPs in question are masked by N. This is automated for the Mouse Genomes Project, but you might have to do this yourself for specialised applications
2) run the Bismark genome indexing (bismark_genome_preparation) on the N-masked genome
3) use Bismark to align your sequences to the N-masked genome prepared in 2)
4) finally, run SNPsplit on the output BAM file, as you already did
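For a specialised application, step 1) can be done with a small script. The sketch below is a minimal illustration under stated assumptions: the genome is already loaded as a dict of chromosome name to sequence, and the SNP file is assumed to be tab-separated with columns ID, chromosome, 1-based position, a strand/value column, and a "Ref/Alt" field such as `A/G` (my understanding of SNPsplit's SNP file layout; check it against your own file). The function name `n_mask_genome` is hypothetical.

```python
from typing import Dict, Iterable

def n_mask_genome(genome: Dict[str, str], snp_lines: Iterable[str]) -> Dict[str, str]:
    """Return a copy of `genome` with every listed SNP position replaced by 'N'.

    Assumed SNP line layout (tab-separated): ID, chromosome, 1-based position,
    strand/value column, "Ref/Alt" (e.g. "A/G"). Header lines are skipped.
    """
    masked = {chrom: list(seq) for chrom, seq in genome.items()}
    for line in snp_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or not fields[2].isdigit():
            continue  # header or malformed line
        chrom, pos, ref_alt = fields[1], int(fields[2]), fields[4]
        if chrom not in masked:
            continue  # chromosome not present in this genome
        ref = ref_alt.split("/")[0]
        idx = pos - 1  # 1-based position -> 0-based index
        if masked[chrom][idx].upper() != ref.upper():
            # Catches SNP files built against a different reference.
            raise ValueError(
                f"Reference mismatch at {chrom}:{pos}: "
                f"genome has {masked[chrom][idx]}, SNP file says {ref}"
            )
        masked[chrom][idx] = "N"
    return {chrom: "".join(bases) for chrom, bases in masked.items()}
```

The masked sequences would then be written out as FASTA into a genome folder and indexed with bismark_genome_preparation (step 2). The reference-base check is a design choice: it fails loudly if the SNP file and genome disagree, which is a common cause of reads not being assignable.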

I hope this helps.

Cheers, Felix

RoseString commented 7 years ago

Thank you so much, Felix! My test data finally works. I didn't realize the importance of the MD:Z: field. I'll rerun Bismark against the N-masked genome.