Open ZhangBio opened 2 years ago
For example, I know there are Sequel II sequencing kit 1.0 and Sequel II sequencing kit 2.0. They sound different, but could they both use SP3-C3?
Hi,
It was never 100% clear to me.
My understanding is this:
Apart from the difference between Sequel II v1 and Sequel II v2, it seems that, for a given sequencer, different sequencing kits can be used interchangeably without having to switch the in-silico control model.
I opened an issue on the PacBio KineticsTools repository two years ago about a similar subject; see here
There, one PacBio developer says (I quote):
rhallPB commented on 28 Jan 2020
Note also, S2-P2 model has been shown to be effective with Sequel II chemistry version 2.0. We don't have a good model for Sequel II chemistry version 1.0.
This is consistent with the summary I gave above.
I never thought anyone would be interested in my software. Please let me know if I can help with your analysis in any way.
Thank you very much for your reply! The kinetics features in the P6-C4, P5-C3 and P4-C2 models provided in SMRT Analysis v2.3 seem to be very different from one another. It would be much easier if the Sequel chemistries could share the same model. There is so little information on the internet that your reply is really precious.
Actually, I would expect tools like SMALR to get more attention, since methylation heterogeneity in prokaryotes could be very important. This software will definitely be helpful when people want to analyse today's Sequel data!
Yes, information on the subject is indeed very hard to find.
As you say, P6-C4, P5-C3 and P4-C2 are very different, but I presumed (maybe wrongly) that these were just incremental upgrades, with P6-C4 simply being the "best" (?).
I will do my best to find the right sources in the near future and compile them in the README.
Until then, what I can say with certainty is that the SP2-C2 model worked great on our Sequel I E. coli data.
I have never tested it on Sequel II data. If you have SMSN data produced with a Sequel II sequencer that could be used as a benchmark, I would be glad to help.
I'm leaving this issue open for the moment and will close it when I find more information on models versus sequencing kits. Maybe I could even just parse the header to match the model automatically; I have not done it yet because I thought no one else would use the software.
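As a rough illustration of the header-matching idea, here is a minimal sketch, not the implementation used in this repository. It assumes the BAM header text has already been dumped (for example with `samtools view -H`) and that the `@RG` line carries a `DS:` field with a `SEQUENCINGKIT=` entry, as PacBio BAM files normally do. The `KIT_TO_MODEL` table and the function names are hypothetical, and the part numbers in it are placeholders that would have to be replaced with verified values.

```python
import re

# Placeholder mapping: these part numbers and their model assignments are
# invented for illustration and must be replaced with verified values.
KIT_TO_MODEL = {
    "000-000-001": "SP2-C2",
    "000-000-002": "SP3-C3",
}


def sequencing_kits(header_text):
    """Return the set of SEQUENCINGKIT values found in the @RG header lines."""
    kits = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("DS:"):
                # The DS field is a ';'-separated list of KEY=VALUE entries.
                match = re.search(r"(?:^|;)SEQUENCINGKIT=([^;]+)", field[3:])
                if match:
                    kits.add(match.group(1))
    return kits


def guess_model(header_text, kit_to_model=KIT_TO_MODEL):
    """Return a model name if the header points to one known model, else None."""
    models = {kit_to_model.get(kit) for kit in sequencing_kits(header_text)}
    if len(models) == 1:
        return models.pop()  # None when the single kit is not in the table
    return None  # no kit found, or kits that map to different models
```

Returning `None` for unknown or conflicting kits keeps the decision with the user, who can then pass `--model` explicitly.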
You can also have a look at this repo, where I did some reverse engineering of the in silico control:
https://github.com/EMeyerLab/ipdtools
I did this at a time when SP3-C3 was not yet released, and before the model formats changed, but I'm reasonably confident that the repo is still valid as of June 2022.
One method is to compare "tMean" and "modelPrediction" on WGA data: if the two values are well correlated, the correct model is probably being used. Maybe I'll try this later when I have time. However, a recent paper shows that the correlation between observed IPD and predicted IPD in WGA data is not good, and I don't know what is going wrong. https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-022-08471-2
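The comparison suggested here could be sketched as follows. This is only an illustration: it assumes a per-position CSV with `tMean`, `modelPrediction` and `coverage` columns (the column names used in the ipdSummary CSV output), and the function names and the `min_coverage` threshold of 25 are hypothetical choices, not values taken from this thread.

```python
import csv
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def tmean_vs_prediction(csv_lines, min_coverage=25):
    """Correlate observed tMean with modelPrediction over well-covered rows.

    csv_lines is any iterable of text lines, e.g. an open file handle.
    """
    observed, predicted = [], []
    for row in csv.DictReader(csv_lines):
        if int(row["coverage"]) < min_coverage:
            continue  # skip positions with too few reads to trust tMean
        observed.append(float(row["tMean"]))
        predicted.append(float(row["modelPrediction"]))
    return pearson(observed, predicted)
```

Running this once per candidate model on the same WGA data and comparing the coefficients would show which model tracks the observed IPDs most closely, keeping in mind that a systematic offset between the two columns does not affect a correlation coefficient.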
Hi, sorry for the late answer.
In my experience there is a systematic bias between the observed IPDs and the modelPrediction which, as you mention, more or less prevents this kind of check...
My own verification on my own data was that, using the SP2-C2 model on my E. coli data, almost all the DNA modifications I could detect were located in either GATC or EcoK sites, which are indeed known for their abundance of 6mA. But I do not have access to more recent Sequel II SMSN data. I would be glad if someone could provide me with some.
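A sanity check of this kind can be sketched in a few lines. This is an illustrative helper, not code from the software discussed here: the function name and arguments are hypothetical, positions are assumed to be 0-based coordinates on the forward strand, and only a plain motif such as GATC is handled (a degenerate motif like the EcoK site would need pattern matching instead of string equality).

```python
def fraction_in_motif(sequence, positions, motif="GATC", offset=1):
    """Fraction of positions that fall on a given base of a motif.

    sequence:  reference sequence of the forward strand
    positions: 0-based forward-strand coordinates of detected modifications
    offset:    index of the modified base inside the motif (1 = the A of GATC)
    """
    sequence = sequence.upper()
    positions = list(positions)
    if not positions:
        return 0.0
    hits = 0
    for pos in positions:
        start = pos - offset
        if start >= 0 and sequence[start:start + len(motif)] == motif:
            hits += 1
    return hits / len(positions)
```

A value close to 1 on E. coli data would be in line with the observation above, while a low value would suggest either the wrong model or a high false-positive rate.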
I have a bit of time to take care of this issue at the moment... Are you still interested in using the software? Do you have any data that may help?
Guillaume
Perhaps this repo, which I made a few years ago, could help to test your suggested solution.
Sorry for the late reply. I used some artificial datasets from previous studies, where the "positive" samples were treated with an MTase and the corresponding controls are WGA data. But "A" sites in the MTase-treated group do not always seem to be predicted as methylated. I can't tell whether the problem lies in ipdSummary or in the efficiency of the enzyme treatment. https://www.ncbi.nlm.nih.gov/sra/SRX12017172[accn] https://www.ncbi.nlm.nih.gov/sra/SRX9611878[accn]
Hi! Happy to see a Sequel version of SMALR. The chemistries for Sequel seem to be described more as "sequencing kit v xxx". How should I set "--model" if I only know the version of the sequencing kit? Do you know the relationship between the sequencing kit version and SP2-C2 or SP3-C3?