brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License

Methylation data #51

Closed: nchernia closed this issue 6 years ago

nchernia commented 6 years ago

Thanks for the tool. One question: is there an easy way to tell from the read itself whether or not it is methylated? MethylDackel calculates methylation on a per-cytosine basis, but I'm also looking for essentially the inverse (per-read calls). Bismark does this with the XM tag.

bwlang commented 6 years ago

I don't think so, but I think the XM tag could be added. It would be nice not to have to hit the reference index for methylation extraction...
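For context, Bismark's XM tag stores one character per read base: z/Z for a C in CpG context, x/X for CHG, h/H for CHH, u/U for unknown context, and '.' where there is no call, with uppercase meaning methylated. The sketch below shows roughly how such a string could be derived from the reference. It assumes an ungapped read on the forward (original top) strand, and `xm_string` and `cytosine_context` are hypothetical helpers, not code from bwa-meth or Bismark.

```python
"""Sketch: build a Bismark-style XM methylation-call string for one read.

Assumptions (hypothetical helpers, not code from bwa-meth or Bismark):
- the read maps to the forward (original top) strand with no indels or clipping,
  so read base i lies on reference position start + i;
- `ref` holds the whole chromosome, so the context can look past the read's end.
"""

# XM letters: lowercase = unmethylated (read shows T), uppercase = methylated (read shows C).
CONTEXT_LETTERS = {"CpG": "z", "CHG": "x", "CHH": "h", "unknown": "u"}


def cytosine_context(ref: str, pos: int) -> str:
    """Classify the reference C at ref[pos] from the two bases that follow it."""
    nxt = ref[pos + 1 : pos + 3].upper()
    if nxt[:1] == "G":
        return "CpG"
    if len(nxt) < 2 or "N" in nxt:
        return "unknown"  # N in the context, or the chromosome ends too soon
    return "CHG" if nxt[1] == "G" else "CHH"


def xm_string(read_seq: str, ref: str, start: int) -> str:
    """Return one call character per read base ('.' where there is nothing to call)."""
    calls = []
    for i, base in enumerate(read_seq.upper()):
        pos = start + i
        # Assumption: a mismatch that is neither C nor T gets no call.
        if ref[pos].upper() != "C" or base not in "CT":
            calls.append(".")
            continue
        letter = CONTEXT_LETTERS[cytosine_context(ref, pos)]
        calls.append(letter.upper() if base == "C" else letter)
    return "".join(calls)


print(xm_string("ACGATAGTTT", "ACGACAGCTT", 0))  # -> .Z..x..h..
```

Reverse-strand reads would need the mirror-image G/A logic, and real alignments need the CIGAR to map read positions onto the reference; both are left out here. Note that computing this still requires the reference sequence, which is exactly the per-read lookup that storing XM at alignment time would avoid.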

bwlang commented 6 years ago

@brentp
I'm working on a tool that needs to identify methylation on a per-read basis. Performance is poor because I have to do a faidx lookup for every read to determine methylation state. What do you think about adding an XM tag (like Bismark's)? I could take a pass at it if you think it would be a useful addition.

brentp commented 6 years ago

When I have to do something like this, I make sure the reads are sorted by chromosome. Then, the first time a new chrom is seen, I read the FASTA sequence for that chrom into memory (e.g. about 250 MB for human chr1) and use it as a string for lookup. You can also use pyfaidx with a large lookahead value.
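A minimal sketch of the "one chromosome in memory at a time" approach, assuming a plain uncompressed FASTA and coordinate-sorted reads. `ChromCache`, `read_fasta_record` and `demo` are hypothetical names for illustration; nothing here is part of bwa-meth, MethylDackel or pyfaidx.

```python
"""Sketch: cache the current chromosome so sorted reads trigger one load per chromosome."""
import os
import tempfile
from typing import Optional


def read_fasta_record(fasta_path: str, chrom: str) -> str:
    """Return the full sequence of `chrom` by scanning a plain FASTA file."""
    chunks = []
    found = False
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                if found:
                    break  # reached the record after ours
                # The record name is the first whitespace-delimited token after '>'.
                found = line[1:].split()[:1] == [chrom]
            elif found:
                chunks.append(line.strip())
    if not found:
        raise KeyError(chrom)
    return "".join(chunks)


class ChromCache:
    """Keep only the current chromosome in memory; reload when the chromosome changes."""

    def __init__(self, fasta_path: str) -> None:
        self.fasta_path = fasta_path
        self.chrom: Optional[str] = None
        self.seq = ""
        self.loads = 0  # how many times a chromosome was read from disk

    def fetch(self, chrom: str, start: int, end: int) -> str:
        """Return the 0-based, half-open slice [start, end) of `chrom`."""
        if chrom != self.chrom:
            # With sorted input this runs once per chromosome, not once per read.
            self.seq = read_fasta_record(self.fasta_path, chrom)
            self.chrom = chrom
            self.loads += 1
        return self.seq[start:end]


def demo() -> tuple:
    """Write a tiny FASTA, then fetch from it the way a sorted read stream would."""
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(">chr1 test\nACGTACGT\nTTTT\n>chr2\nGGGCCC\n")
    try:
        cache = ChromCache(tmp.name)
        a = cache.fetch("chr1", 6, 10)
        b = cache.fetch("chr1", 0, 2)  # same chromosome: served from memory
        c = cache.fetch("chr2", 0, 3)  # new chromosome: one more load
        return a, b, c, cache.loads
    finally:
        os.remove(tmp.name)


print(demo())  # -> ('GTTT', 'AC', 'GGG', 2)
```

This version rescans the file for each new chromosome; a real tool would instead use the `.fai` index to seek straight to the record, or hand the job to pyfaidx, whose `Fasta` class takes a `read_ahead` argument for buffering ahead of each fetch.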

I'd prefer to keep this out of bwa-meth. I think MethylDackel can do this anyway, no?

nchernia commented 6 years ago

MethylDackel just added this feature. It’s a function called “perRead”.

brentp commented 6 years ago

Sweet! Let's close this, then. bwa-meth does a good job at what it does, but I want to keep it fairly atomic and let tools like MethylDackel do the downstream stuff.

bwlang commented 6 years ago

@nchernia: Nice! Looks like perRead does what I want. I never would have thought to look at that branch without your prompting. Thanks!