Magdoll / Cogent

Coding Genome Reconstruction using Iso-Seq data
BSD 3-Clause Clear License

Nanopore data #23

Closed abayega closed 6 years ago

abayega commented 6 years ago

Hi,

Have you used Cogent on Nanopore RNA-Seq data where the raw-read error rate is ~15%? If so, could you comment on Cogent's performance?

Thanks

Magdoll commented 6 years ago

Hi @abayega, Cogent makes two assumptions: (a) the input is full-length transcripts (as opposed to short fragments of transcripts), and (b) the input is high quality, that is, mostly >=99% accuracy.

Even for PacBio Iso-Seq data, I don't recommend that people use raw data or CCS sequences that are not of high quality. Since Cogent does not do error correction, the data would have to already be of good quality. The family-finding partition might still work (since Mash works based on k-mers), but at the reconstruction step you may get some strange results.
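As a rough way to see how error rate affects k-mer-based comparison, here is a back-of-the-envelope sketch. It assumes errors are independent and uniform along the read, which real Nanopore or PacBio errors are not, and it uses k = 21 purely as an illustrative value; the k that Cogent passes to Mash is not stated in this thread. The function name `kmer_survival` is made up for this example.

```python
def kmer_survival(error_rate: float, k: int) -> float:
    """Probability that a k-mer contains no errors.

    Assumes each base is wrong independently with probability error_rate
    (a simplification, used only for illustration).
    """
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    k = 21  # illustrative value, not necessarily what Cogent uses
    for err in (0.01, 0.15):
        p = kmer_survival(err, k)
        print(f"error rate {err:.2f}: P(error-free {k}-mer) = {p:.3f}")
```

Under these assumptions, most k-mers from ~99%-accurate reads are error-free, while only a small fraction survive at ~15% error, so any k-mer-based similarity has far fewer shared k-mers to work with.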

I would recommend doing some sort of error correction on the Nanopore data first, or at least filtering for higher-quality reads.
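For the filtering option, a minimal sketch (not part of Cogent) could keep only reads whose quality string implies an estimated accuracy of at least 99%. It assumes plain 4-line FASTQ records with Phred+33 qualities; the function names `mean_error_prob` and `filter_fastq` are hypothetical, and basecaller-reported qualities are only estimates of the true accuracy.

```python
def mean_error_prob(quality: str, offset: int = 33) -> float:
    """Mean per-base error probability from a Phred-encoded quality string."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return sum(probs) / len(probs)


def filter_fastq(lines, min_accuracy: float = 0.99):
    """Yield (header, sequence, quality) for reads at or above min_accuracy.

    Assumes strict 4-line FASTQ records (header, sequence, '+', quality).
    """
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        # Estimated accuracy is 1 minus the mean per-base error probability.
        if qual and 1.0 - mean_error_prob(qual) >= min_accuracy:
            yield header.rstrip("\n"), seq, qual


if __name__ == "__main__":
    records = [
        "@good", "ACGT", "+", "IIII",  # Q40 at every base
        "@bad", "ACGT", "+", "++++",   # Q10 at every base
    ]
    for header, seq, qual in filter_fastq(records):
        print(header, seq, qual)
```

Averaging error probabilities rather than Phred scores avoids overstating the accuracy of reads that mix very high and very low quality bases.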

--Liz

abayega commented 6 years ago

Got it, thank you

abayega commented 5 years ago

Great. Thank you

On Mar 11, 2018 01:25, Magdoll <notifications@github.com> wrote:

> Hi @abayega, Cogent makes two assumptions: (a) the input is full-length transcripts (as opposed to short fragments of transcripts), and (b) the input is high quality, that is, mostly >=99% accuracy.
>
> Even for PacBio Iso-Seq data, I don't recommend that people use raw data or CCS sequences that are not of high quality. Since Cogent does not do error correction, the data would have to already be of good quality. The family-finding partition might still work (since Mash works based on k-mers), but at the reconstruction step you may get some strange results.
>
> I would recommend doing some sort of error correction on the Nanopore data first, or at least filtering for higher-quality reads.
>
> --Liz

