PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data

Detection of 5mC without TET #26

Open eltonjrv opened 8 years ago

eltonjrv commented 8 years ago

Hello, I currently have 92x genome coverage (PacBio reads) and have successfully predicted the 6mA and 4mC positions on the genome (running ipdSummary with the "--identify m6A,m4C" flag). I wonder whether I could run ipdSummary to detect 5mC modifications without using TET. I previously heard that this would be possible with high genome coverage. If so: 1) Would 92x coverage be enough for that attempt (identifying 5mC)? 2) Should I keep "m5C_TET" in the --identify option (as described on the ipdSummary help page)?

Thanks in advance for your help. Cheers, Elton

ebioman commented 8 years ago

Hi, I am wondering the same. 5mC detection is advertised as possible, but I can't get it running without TET. Is there any documentation, or did the above get solved somehow?

JohnUrban commented 7 years ago

Very naive question, but it sounds like you guys know the answers to these questions. What is m5C_TET identifying? What is the TET part, and what do you mean by "without using TET"? (Does TET have to do with the TET proteins involved in the conversion of 5-methylcytosines to other methylated forms?)

I just started toying around with these tools so that clarification would be great.

tywedge commented 7 years ago

Tet refers to Tet-conversion of 5mC to 5caC using the Tet enzyme. A kit is available from WiseGene (http://www.wisegeneusa.com/k004). Converting 5mC to 5caC enhances the kinetic signature, making 5mC easier to detect with SMRT Sequencing.

“Without using Tet” means detection of 5mC without the conversion.

Tyson Clark


JohnUrban commented 7 years ago

That is interesting. I specified --identify m6A,m4C,m5C_TET for my insect genome and m5C seems to be the most common mark identified so far. It has only gone part way through the genome, but as it is now I see the following number of modifications found:

11814 m4C
26164 m5C
6334 m6A

What do you make of that? Is it totally unexpected to see m5C without the conversion to 5caC?
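For anyone wanting to reproduce a tally like the one above, here is a minimal sketch in Python. It assumes the modifications file is a tab-separated GFF in which column 3 holds the call label (for example `m4C`, `m5C`, `m6A`, or `modified_base` for unidentified calls). The function name and the file name `basemods.gff` are hypothetical, not part of kineticsTools.

```python
from collections import Counter


def count_modification_types(gff_lines):
    """Count feature types (GFF column 3) in an iterable of GFF lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a well-formed GFF record
        counts[fields[2]] += 1
    return counts


# Usage with a hypothetical file name:
# with open("basemods.gff") as handle:
#     for mod_type, n in count_modification_types(handle).most_common():
#         print(n, mod_type)
```

Because the function takes any iterable of lines, it can be run on a partially written GFF while the job is still in progress.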

martinjvickers commented 4 years ago

This is still an outstanding question (see #44). Could a developer give an example of how to use ipdSummary to detect non-TET m5C, given sufficient per-strand coverage, as advertised?

rhallPB commented 4 years ago

We have not done sufficient validation with recent chemistries to have a perfect answer.

First, it is important to distinguish detection of a modification from identification of a modification. m5C, like any DNA modification that causes the polymerase to pause, can be detected given sufficient coverage. Detection is the first step of the pipeline (i.e. running the pipeline without the --identify parameter); it will report a 'modified_base' in the region of the m5C (possibly offset by 1-2 bp, since the pause isn't always on the modified base).

The identification step takes all the modified bases above a particular confidence (Qmod) and tries to identify them by comparison with a model. The model for TET-m5C has not been updated in a number of years, so it is unclear whether it would even function for TET-m5C given current chemistries, and it is unlikely to give accurate identification of m5C. The m6A and m4C models are updated for current chemistries.

My recommendation for m5C would be to run a control, use the pipeline (without identification), and set the Qmod cutoff and the offset of your calls based on the control / known m5C modifications.

It is also worth pointing out that this detection is in reference space and only works if the position in the reference is heavily modified. If only a small percentage of the reads are modified, the position is not going to be detected. Also, because the model uses sequence context to calculate the deviation from the expected kinetics, it is important that the data does not have any variants w.r.t. the reference.
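The calibration suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not a validated kineticsTools workflow. It assumes detection-only runs on both a control (unmodified) sample and the sample of interest, that each GFF record has type `modified_base` in column 3 and the Qmod score in column 6, and that known m5C positions are supplied by the user. All function names are hypothetical.

```python
from collections import Counter


def parse_calls(gff_lines, feature_type="modified_base"):
    """Return (seqid, position, strand, score) tuples for one feature type."""
    calls = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF headers
        f = line.rstrip("\n").split("\t")
        if len(f) < 7 or f[2] != feature_type:
            continue
        calls.append((f[0], int(f[3]), f[6], float(f[5])))
    return calls


def qmod_cutoff_from_control(control_scores, max_false_calls=0):
    """Cutoff such that at most max_false_calls control scores are strictly above it."""
    ranked = sorted(control_scores, reverse=True)
    if len(ranked) <= max_false_calls:
        return float("-inf")  # control has too few calls to constrain the cutoff
    return ranked[max_false_calls]


def offset_histogram(calls, known_sites, window=2):
    """Histogram of offsets between calls and the nearest known site.

    known_sites maps (seqid, strand) to a list of known modified positions.
    Offsets are sign-flipped on the minus strand (an assumed convention) so
    that they are expressed relative to that strand's direction.
    """
    hist = Counter()
    for seqid, pos, strand, _score in calls:
        sites = known_sites.get((seqid, strand), [])
        if not sites:
            continue
        nearest = min(sites, key=lambda s: abs(pos - s))
        offset = pos - nearest
        if abs(offset) <= window:
            hist[offset if strand == "+" else -offset] += 1
    return hist
```

In use, one would compute the cutoff from the control calls, keep only sample calls whose score is strictly greater than it, and inspect the offset histogram to see where the kinetic signal tends to fall relative to the known sites. The `max_false_calls` and `window` defaults are arbitrary starting points to tune against your own control.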

mdrishti commented 2 years ago

@rhallPB, thank you for this explanation (even though I am reading this in 2022). You mentioned that "it is important that the data does not have any variants w.r.t. the reference." I am guessing that this is true for detecting any kind of modification. If so, do you have any suggestions for comparing the methylomes of two genomes from two isolates of the same strain (one reference and one evolved)? One way could be to just evaluate the raw methylome of both genomes, but even for small-scale data (say, 45 isolates against a reference), this would not be a sophisticated approach. Your insights regarding a possible solution to this problem would be very useful to me.