FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0

BS-seq about SNP_split #43

Closed: zhangaicen closed this issue 3 years ago

zhangaicen commented 3 years ago

Dear Felix, I recently tried to use SNPsplit to split BS-seq reads from a rice hybrid. The user guide says that SNPsplit works with BAM files from Bismark, and that the SNP positions should be masked with "N" before using SNPsplit. However, Bismark first converts C to T and G to A during BS-seq genome preparation and mapping. Does this mean that the SNP positions masked with "N" are excluded from the methylation level calculation, because they are never converted?

Waiting for your reply, thanks.

FelixKrueger commented 3 years ago

Hi @zhangaicen

Yes, the workflow for bisulfite sequencing would be:

  1. Create an N-masked genome reference (for the mouse genome this works out of the box with SNPsplit_genome_preparation, but for a (phased?) rice hybrid genome you would have to come up with a solution yourself).
  2. Index the N-masked genome
  3. Align with Bismark to the N-masked genome
  4. Deduplicate
  5. Split allele-specifically using SNPsplit
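For step 1 on a non-mouse genome, here is a minimal sketch of what N-masking means, assuming you already know the two parental alleles at each SNP (for example from variant calls on the rice parents). `n_mask_reference` is a hypothetical helper, not part of SNPsplit or Bismark, and positions are 0-based offsets into a single sequence.

```python
from typing import Dict, Tuple


def n_mask_reference(seq: str, snps: Dict[int, Tuple[str, str]]) -> str:
    """Return a copy of `seq` with every SNP position replaced by 'N'.

    `snps` maps a 0-based position to its (genome1_base, genome2_base) pair.
    Hypothetical helper for illustration only.
    """
    bases = list(seq.upper())
    for pos, (g1, g2) in snps.items():
        if not 0 <= pos < len(bases):
            raise IndexError(f"SNP position {pos} is outside a sequence of length {len(bases)}")
        # Sanity check: the reference base should be one of the two parental alleles.
        if bases[pos] not in (g1.upper(), g2.upper()):
            raise ValueError(f"reference base {bases[pos]!r} at {pos} matches neither {g1!r} nor {g2!r}")
        bases[pos] = "N"
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTCGATCG"
    snps = {1: ("C", "T"), 6: ("A", "G")}
    print(n_mask_reference(reference, snps))  # ANGTCGNTCG
```

Masking with N instead of writing either parental base into the reference avoids giving reads from one parent an alignment advantage at these positions.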

Indeed, positions aligning to Ns in the genome will not receive a methylation call (only positions with a C in the genome are called). The behaviour in SNPsplit is a little different, though: if the read aligns to the strand containing an N-masked C, and the SNP is a C to T transition, the position is not used for allele assignment (a T could reflect either the SNP or an unmethylated, converted C). If the read aligns to the strand opposite the C, the position may still be used for allele assignment.
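The two rules above can be sketched as follows. This is a simplified model, not SNPsplit's actual code: all bases are in top-strand (reference) coordinates, "top" stands for reads from the original top strand, where bisulfite conversion appears as C to T, and "bottom" for reads from the original bottom strand, where it appears as G to A in top-strand coordinates. The function names are hypothetical, and alleles are matched exactly, so other conversion effects (for example an unmethylated C at an A/C SNP) are ignored.

```python
from typing import Optional

# Reference base that bisulfite conversion can alter, and what it turns into,
# per original strand, in top-strand coordinates (simplifying assumption).
CONVERTIBLE = {"top": ("C", "T"), "bottom": ("G", "A")}


def gets_methylation_call(ref_base: str, strand: str) -> bool:
    """Only reference cytosines on the read's strand are called; an N never is."""
    return ref_base == CONVERTIBLE[strand][0]


def snp_is_informative(g1: str, g2: str, strand: str) -> bool:
    """False when the two alleles differ exactly by the conversion on this strand."""
    return {g1, g2} != set(CONVERTIBLE[strand])


def assign_allele(g1: str, g2: str, observed: str, strand: str) -> Optional[str]:
    """Return 'genome1', 'genome2', or None if the position cannot be used."""
    if not snp_is_informative(g1, g2, strand):
        return None  # could be a SNP or a methylation state: skip this position
    if observed == g1:
        return "genome1"
    if observed == g2:
        return "genome2"
    return None  # observed base matches neither allele


if __name__ == "__main__":
    print(assign_allele("C", "T", "T", "top"))     # None
    print(assign_allele("C", "T", "T", "bottom"))  # genome2
```

A C/T SNP is therefore skipped for top-strand reads but usable for bottom-strand reads, and a G/A SNP the other way round.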

Does this help?

zhangaicen commented 3 years ago

Yes, I've finished steps 1 to 4. Considering the whole genome, the methylation level at SNP sites probably won't have much impact, so I think this can be ignored, right?

FelixKrueger commented 3 years ago

I wouldn't think that the SNPs matter very much for the global picture, and at least half of the reads covering C to T SNPs can still be used for allele assignment. Good luck!
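To see where "at least half" comes from, here is a toy count that applies the same strand rule: a C/T SNP is skipped for top-strand reads, a G/A SNP for bottom-strand reads, and every other SNP is considered on both strands. `usable_read_fraction` is hypothetical and the coverage numbers are made up; "usable" here only means the position is considered for assignment, not that the read ends up assigned. With roughly equal coverage from both strands, as in a typical directional library, each C/T or G/A SNP keeps about half of its reads.

```python
from typing import Iterable, Tuple

# Allele pairs that look like bisulfite conversion for reads from each original
# strand, in top-strand coordinates (same simplifying assumption as above).
AMBIGUOUS_ON = {"top": {"C", "T"}, "bottom": {"G", "A"}}


def usable_read_fraction(snps: Iterable[Tuple[str, str, int, int]]) -> float:
    """Fraction of SNP-overlapping reads whose SNP position can be considered.

    Each item is (genome1_base, genome2_base, top_strand_reads, bottom_strand_reads).
    """
    usable = total = 0
    for g1, g2, top, bottom in snps:
        alleles = {g1, g2}
        total += top + bottom
        if alleles != AMBIGUOUS_ON["top"]:
            usable += top
        if alleles != AMBIGUOUS_ON["bottom"]:
            usable += bottom
    return usable / total if total else 0.0


if __name__ == "__main__":
    example = [
        ("C", "T", 10, 10),  # C/T SNP: only the 10 bottom-strand reads count
        ("G", "A", 8, 12),   # G/A SNP: only the 8 top-strand reads count
        ("A", "C", 5, 5),    # neither pattern: all 10 reads count
    ]
    print(usable_read_fraction(example))  # 0.56
```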