FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

methylation extraction without the XR and XG tag #278

Closed: vivekbhr closed this issue 5 years ago

vivekbhr commented 5 years ago

Hi Felix

I have some custom-tagged BAM files where the first 14 fields are the same as in Bismark output, but the XR and XG tags are missing. Is it possible to extract the methylation calls with the Bismark methylation extractor without these two tags? In the end I basically want the CpG/CHH... bedGraphs from these BAM files. If something in the code needs to be changed, can you point me to what I should modify in my local copy of Bismark?

Thanks, Vivek

FelixKrueger commented 5 years ago

Hi Vivek,

I believe that most downstream applications that use Bismark BAM files (e.g. bismark_methylation_extractor, deduplicate_bismark, SNPsplit, ReStrainingOrder etc.) make use of the XR and XG tag combination. The only way to make this work for your 'custom' files would be to reconstitute these read/genome conversion tags, I am afraid.

Just out of interest, how did you end up with those files, using a different aligner?

vivekbhr commented 5 years ago

Yes, the data was mapped using BWA and we used a custom script to make the XM tag. Maybe I can add the other two tags too. Are they supposed to contain the bisulfite converted sequence of the reference and the read?

FelixKrueger commented 5 years ago

No, the combination of read conversion (XR, can be CT or GA) and genome conversion (XG, can be CT or GA) indicates which of the four bisulfite strands a read came from (OT, OB, CTOT, CTOB).
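
As a quick reference, the four XR/XG combinations can be written as a lookup table. This is a minimal sketch, not Bismark code: the pairing below is based on how bismark_methylation_extractor appears to interpret the tags and should be checked against the Bismark documentation. It assumes single-end reads (for paired-end data, read 2 is, as far as we can tell, reported with the opposite XR value). The names STRAND_BY_TAGS and bisulfite_strand are hypothetical.

```python
# Hypothetical lookup: (XR read conversion, XG genome conversion) -> strand.
# Assumed pairing for single-end reads; verify against the Bismark docs.
STRAND_BY_TAGS = {
    ("CT", "CT"): "OT",    # original top strand
    ("CT", "GA"): "OB",    # original bottom strand
    ("GA", "CT"): "CTOT",  # complementary to original top strand
    ("GA", "GA"): "CTOB",  # complementary to original bottom strand
}


def bisulfite_strand(xr: str, xg: str) -> str:
    """Return OT/OB/CTOT/CTOB for a pair of XR:Z and XG:Z tag values."""
    try:
        return STRAND_BY_TAGS[(xr, xg)]
    except KeyError:
        raise ValueError(f"unexpected XR/XG combination: {xr}/{xg}") from None


print(bisulfite_strand("CT", "GA"))  # OB
```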

It sounds almost like it would be a good idea to use a tool that does this kind of processing natively, wait, I think there is one...! I am sure you have a good reason to start from BWA, and re-implement the things required to proceed with bisulfite-processing?

martinjvickers commented 5 years ago

@vivekbhr If you want to extract methylation from a file without bismark specific tags, you could use MethylDackel

https://github.com/dpryan79/methyldackel

vivekbhr commented 5 years ago

Thanks, Felix and Martin. Yes, we used a modified library prep, so we wanted to make some modifications to the methylation tagging. I managed to add the XR and XG tags and Bismark now outputs the CpG/CHH... files, but it doesn't give the full output (plots etc.) as it apparently doesn't recognize the extended CIGAR operation 'S'. I'll check what options I have.

FelixKrueger commented 5 years ago

As of version 0.22.0, Bismark supports local mode (option --local) for both Bowtie 2 and HISAT2, and thus also the CIGAR operation S (soft clipping). See here: https://github.com/FelixKrueger/Bismark/releases.
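
In SAM/BAM, the CIGAR operation S marks read bases that are kept in SEQ but not aligned to the reference, and POS refers to the first aligned base rather than the first soft-clipped one. The minimal sketch below (not Bismark code; parse_cigar and read_to_ref_positions are hypothetical names) maps every SEQ base to its 0-based reference coordinate and returns None for soft-clipped and inserted bases. This is the kind of bookkeeping needed before a read can be compared base by base with its genomic sequence.

```python
from __future__ import annotations

import re

# One CIGAR operation = a length followed by an operation letter (SAM spec).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def read_to_ref_positions(pos: int, cigar: str) -> list[int | None]:
    """0-based reference coordinate for every SEQ base (None if unaligned).

    pos is the 0-based leftmost aligned reference position (SAM POS - 1).
    """
    positions: list[int | None] = []
    ref = pos
    for length, op in parse_cigar(cigar):
        if op in "M=X":       # consumes read and reference
            positions.extend(range(ref, ref + length))
            ref += length
        elif op in "IS":      # present in SEQ, not aligned (insertion / soft clip)
            positions.extend([None] * length)
        elif op in "DN":      # consumes reference only (deletion / skip)
            ref += length
        # H and P consume neither; hard-clipped bases are not in SEQ at all
    return positions


print(read_to_ref_positions(2, "2S3M1I2M"))  # [None, None, 2, 3, 4, None, 5, 6]
```

If pysam is available, `AlignedSegment.get_aligned_pairs()` provides similar query/reference pairs directly from a BAM record.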

vivekbhr commented 5 years ago

Interesting... It seems that the default Bioconda installation doesn't install the latest release: I got Bismark 0.20.0 instead of 0.22.1. I have re-installed it now, thanks.

FelixKrueger commented 5 years ago

Ah, it seems we might have to change the conda recipe then... Let me know if you have any questions, but I don't think there is a reason why you would ever re-purpose a non-bisulfite aligner to do what you need doing...

YettaWang commented 4 months ago

I also have some BAM files from BWA. Can you tell me how you added the XR and XG tags?

FelixKrueger commented 4 months ago

The XR and XG tags indicate the conversion state of the reads and the genome, respectively. I am not sure if this is something that even applies to your experiment?

YettaWang commented 4 months ago

In fact, I want to generate XM tags for my BAM file for downstream analysis.

YettaWang commented 4 months ago

My existing BAM file was generated by bwa-meth. Due to the large file size, I don't have time to re-align it right now, and I urgently need to use an analysis tool that requires the XM tag. This situation is causing me a lot of stress. For this reason, I looked for another tool that can generate XM tags, but it seems to generate them based on the XR and XG tags.

FelixKrueger commented 4 months ago

Bismark generates the XM tag based on the actually observed sequence, the equivalent extracted genomic sequence (which needs to have handled indels and softclipping appropriately at this point already), and the read conversion state:
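
The description above can be illustrated with a minimal sketch. This is not Bismark's code: Bismark makes the call in the read's own orientation using the read conversion state, whereas this sketch assumes BAM coordinates, where SEQ is stored in reference-forward orientation. In that frame the genome conversion decides which positions are inspected: top-strand C positions for XG:Z:CT, and bottom-strand C positions (seen as G on the reference) for XG:Z:GA. The context letters follow Bismark's documented XM alphabet (z/Z CpG, x/X CHG, h/H CHH, u/U unknown context, '.' otherwise; upper case = methylated). The function name xm_string and the ref_pos input (0-based reference coordinate per SEQ base, None for soft-clipped or inserted bases) are hypothetical. Treating mismatches as '.' and emitting the string in SEQ orientation are assumptions that should be compared against a real Bismark BAM.

```python
from __future__ import annotations

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def _context(first: str, second: str) -> str:
    """Lower-case XM letter for a C followed (on its own strand) by first, second."""
    if first == "G":
        return "z"  # CpG
    if first == "N":
        return "u"  # unknown context
    if second == "G":
        return "x"  # CHG
    if second == "N":
        return "u"
    return "h"      # CHH


def xm_string(read_seq: str, ref: str, ref_pos: list[int | None], genome_conv: str) -> str:
    """Hypothetical XM-style call string, one character per SEQ base."""
    if len(read_seq) != len(ref_pos):
        raise ValueError("need one reference position per read base")
    if genome_conv not in ("CT", "GA"):
        raise ValueError("genome_conv must be 'CT' or 'GA'")

    def base(i: int) -> str:
        # Reference base at i, or N when the context runs off the sequence.
        return ref[i].upper() if 0 <= i < len(ref) else "N"

    calls = []
    for read_base, p in zip(read_seq.upper(), ref_pos):
        if p is None:  # soft-clipped or inserted base: no genomic counterpart
            calls.append(".")
            continue
        g = base(p)
        if genome_conv == "CT" and g == "C":
            # Top strand: context comes from the next two reference bases.
            ctx = _context(base(p + 1), base(p + 2))
            meth, unmeth = "C", "T"
        elif genome_conv == "GA" and g == "G":
            # Bottom strand: context is the complement of the preceding bases.
            ctx = _context(COMPLEMENT.get(base(p - 1), "N"),
                           COMPLEMENT.get(base(p - 2), "N"))
            meth, unmeth = "G", "A"
        else:
            calls.append(".")
            continue
        if read_base == meth:
            calls.append(ctx.upper())  # protected from conversion: methylated
        elif read_base == unmeth:
            calls.append(ctx)          # converted: unmethylated
        else:
            calls.append(".")          # mismatch (assumed handling)
    return "".join(calls)


print(xm_string("ACGTTATTAG", "TACGTCATCAGG", list(range(1, 11)), "CT"))  # .Z..h..x..
```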

https://github.com/FelixKrueger/Bismark/blob/37e2cad18621c2619a9e02d1a69fdfec1819ed23/bismark#L4772

I haven't got a clue whether this is available in bwa-meth output or not I am afraid.

YettaWang commented 4 months ago

Thank you very much, I will study your code carefully.