Closed abayega closed 6 years ago
Hi @abayega, the two assumptions that Cogent makes are that (a) the input is full-length transcripts (as opposed to short fragments of transcripts) and (b) the input is high quality, that is, mostly >=99% accuracy.
Even for PacBio Iso-Seq data, I don't recommend people use raw data or CCS sequences that are not of high quality. Since Cogent does not do error correction, the data would have to already be good quality. The family-finding partition step might still work (since Mash works based on k-mers), but at the reconstruction step you may get some strange results.
I would recommend doing some sort of error correction with the nanopore data first. Or at least filter for higher quality.
--Liz
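For anyone who wants to try the "filter for higher quality" route, here is a minimal sketch of what that could look like. It is not part of Cogent. It assumes plain 4-line FASTQ records with Phred+33 quality strings, and it estimates a read's accuracy as one minus the mean per-base error probability. The function names `mean_error_prob` and `filter_fastq` are made up for illustration, and the 0.99 cutoff simply mirrors the >=99% figure mentioned above.

```python
def mean_error_prob(quality: str, offset: int = 33) -> float:
    """Mean per-base error probability of a Phred-encoded quality string.

    Assumes Phred+33 encoding (offset=33); an empty string counts as
    fully erroneous so that such reads are always filtered out.
    """
    if not quality:
        return 1.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality]
    return sum(probs) / len(probs)


def filter_fastq(lines, min_accuracy: float = 0.99):
    """Yield (header, sequence, quality) for reads that pass the cutoff.

    `lines` is any iterable of text lines laid out as 4-line FASTQ
    records (header, sequence, '+', quality). Multi-line FASTQ is not
    handled in this sketch.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        if 1.0 - mean_error_prob(qual) >= min_accuracy:
            yield header.strip(), seq, qual


if __name__ == "__main__":
    demo = [
        "@good", "ACGT", "+", "IIII",  # Q40 bases, ~99.99% expected accuracy
        "@bad", "ACGT", "+", "++++",   # Q10 bases, ~90% expected accuracy
    ]
    for header, seq, qual in filter_fastq(demo):
        print(header, seq, qual)
```

Note that basecaller quality values are only estimates, so reads passing this cutoff are not guaranteed to be 99% accurate. With raw reads at ~15% error, few or none may pass, which is why error correction first is the safer option.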
Got it, thank you
Great. Thank you
Hi,
have you used Cogent on Nanopore RNA-Seq data where the raw-read error rate is ~15%? If yes, could you comment on the performance of Cogent?
Thanks