fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

WGA #39

Closed linuxxue closed 1 year ago

linuxxue commented 2 years ago

Hi, I have WGA data for one Escherichia coli strain. Can this data be used as the reference together with native (NAT) data from multiple Escherichia coli strains? Or does each E. coli strain need its own WGA data? Thanks a lot.

touala commented 2 years ago

Hi @linuxxue,

This is not ideal: comparing signal between native samples and an unmatched WGA sample will produce noise at genomic locations where SNVs are found. This negative effect grows as genome identity between the WGA and native strains decreases. It also raises the question of how to generate the reference genomes used in the analysis. Do you have other types of data for those samples, such as Illumina reads for assembly polishing?

Depending on your experimental design, you could consider generating all native datasets but only one WGA dataset. Run the analysis with this WGA as the signal reference and see how the motif detection procedure behaves. If the motif results and signatures are not convincing, then generate the remaining WGA datasets for the problematic samples.

To summarize, I generally can't recommend doing it. For example, it will likely work for a set of genetically engineered E. coli strains with a couple of hundred mutations or a few gene knockouts, but I can't generalize beyond that.

Please let me know if you have other questions,

Alan
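
As a rough illustration of why noise grows with divergence between the WGA and native strains, the following back-of-the-envelope sketch estimates an upper bound on the fraction of genomic positions whose signal comparison could be perturbed by SNVs. It is not part of nanodisco. The function name, the assumption that each SNV affects about six neighbouring positions (loosely based on nanopore current depending on a short k-mer), and the example SNV counts and genome length are all hypothetical.

```python
def affected_fraction(n_snv: int, genome_len: int, kmer: int = 6) -> float:
    """Upper bound on the fraction of positions perturbed by SNVs.

    Assumption (not a nanodisco parameter): each SNV disturbs the signal
    at roughly `kmer` positions, and SNV neighbourhoods never overlap.
    """
    if genome_len <= 0:
        raise ValueError("genome_len must be positive")
    if n_snv < 0 or kmer <= 0:
        raise ValueError("n_snv must be >= 0 and kmer must be > 0")
    # Cap at 1.0 because a fraction of the genome cannot exceed 100%.
    return min(1.0, n_snv * kmer / genome_len)


# Hypothetical numbers for a ~4.6 Mb E. coli genome.
GENOME_LEN = 4_600_000
engineered = affected_fraction(200, GENOME_LEN)     # a couple of hundred mutations
divergent = affected_fraction(46_000, GENOME_LEN)   # about 99% identity

print(f"engineered strain: {engineered:.4%} of positions potentially noisy")
print(f"divergent strain:  {divergent:.4%} of positions potentially noisy")
```

Under these assumptions, a strain with a couple of hundred mutations leaves nearly all positions comparable, while a strain at about 99% identity could already have a few percent of positions affected, which is consistent with the advice above to test one WGA first and add more only for problematic samples.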