Trinotate / Trinotate.github.io

web documentation for Trinotate

Question regarding removing contaminations #10

Closed ghost closed 5 years ago

ghost commented 5 years ago

Two questions regarding contamination in RNA-seq data and its handling in a Trinity workflow:

1) Ideally, do you filter contamination (a) at the level of raw reads or (b) after Trinity assembly? The contamination in question comes from viruses/prokaryotes, not eukaryotes, so the assembly should probably not mix contaminant sequence with RNA from my species of interest. For this reason, I would intuitively go for filtering after assembly.

2) Which tools did you use / do you recommend? At the raw-read level, Kraken2 is a good candidate, I guess. What would you recommend for filtering at the assembly level?

Thx!

brianjohnhaas commented 5 years ago

Hi,

responses below

On Sat, Oct 20, 2018 at 10:53 AM MichaelsGITIGIT notifications@github.com wrote:

> Two questions regarding contamination in RNA-seq data and its handling in a Trinity workflow:
>
> 1) Do you, ideally, (1) filter contamination on the level of raw reads or do you filter (2) after Trinity assembly? We are talking about viruses/prokaryotes as the source of contamination. Not eukaryotes. So an assembly should probably not mix contamination with RNA from my species of interest. For this reason, I would intuitively go for filtering after assembly.

Some use sortmerna to remove rRNA before doing assembly. As for other filtering, I'd do it after assembly, but I haven't done any benchmarking to determine which way is best. Others might comment here on their experiences.

> 2) Which tools did you use / do you recommend? On the raw read level Kraken2 is a good candidate I guess. What would you recommend when filtering on assembly level?

Kraken or Centrifuge are my go-to tools. I tend to go with Centrifuge but generate Kraken-style reports using the utilities included with it.
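For filtering after assembly, a minimal sketch of the idea might look like the following. It assumes a Kraken-style per-sequence output (tab-separated, with the classified/unclassified flag, the sequence ID, and a numeric taxonomy ID in the first three columns) produced by classifying the Trinity contigs. The function names, the toy contig IDs, and the set of "contaminant" taxids are hypothetical, and taxids are matched exactly, so the caller would need to supply every taxid they consider contamination. This is not something from the thread's authors, just an illustration.

```python
from typing import Iterable, Iterator, Set, Tuple


def contaminant_ids(kraken_lines: Iterable[str], contaminant_taxids: Set[str]) -> Set[str]:
    """Collect sequence IDs classified ('C') to one of the given taxids.

    Assumes columns: status, sequence ID, taxid, ... (tab-separated).
    Taxids are compared exactly; descendants are NOT expanded here.
    """
    ids = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        status, seq_id, taxid = fields[0], fields[1], fields[2]
        if status == "C" and taxid in contaminant_taxids:
            ids.add(seq_id)
    return ids


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header_without_'>', sequence) pairs from FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def drop_contigs(fasta_lines: Iterable[str], ids_to_drop: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield only contigs whose ID (first word of the header) is not flagged."""
    for header, seq in read_fasta(fasta_lines):
        if header.split()[0] not in ids_to_drop:
            yield header, seq


# Toy data (hypothetical contigs and classifications).
kraken_out = [
    "C\tTRINITY_DN1_c0_g1_i1\t562\t300\t562:10",
    "U\tTRINITY_DN2_c0_g1_i1\t0\t450\t0:20",
    "C\tTRINITY_DN3_c0_g1_i1\t9606\t500\t9606:15",
]
fasta = [
    ">TRINITY_DN1_c0_g1_i1 len=300",
    "ACGT",
    ">TRINITY_DN2_c0_g1_i1 len=450",
    "GGCC",
    "TTAA",
    ">TRINITY_DN3_c0_g1_i1 len=500",
    "CCCC",
]

flagged = contaminant_ids(kraken_out, {"562"})
for header, seq in drop_contigs(fasta, flagged):
    print(f">{header}\n{seq}")
```

Keeping the classification step separate from the FASTA filtering means the same `drop_contigs` helper would work with any classifier whose hits can be reduced to a set of contig IDs.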


--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

ghost commented 5 years ago

Many thanks for the reply!

You wrote:

Some use sortmerna to remove rRNA before doing assembly

Would you recommend to remove rRNA? Why / Why not?

Thanks!

brianjohnhaas commented 5 years ago

Sometimes folks want to remove rRNA before assembly because rRNA can account for a significant fraction of the reads, while they're mostly after coding (polyA-selected) transcripts. Any rRNA reads that remain are those that slipped through the experimental effort to deplete them.



ghost commented 5 years ago

Two quick follow-up questions regarding rRNA remainders in Illumina raw data:

1) If rRNA ends up in the Trinity assembly, did you ever see problems with this? Particularly if one is interested in mRNA (expression/GO enrichment) only.

2) Instead of using sortmerna on the raw Illumina data, why not just use a tool like rnammer on the assembly and kick out the hits? Or is there a problem with that?

Thanks!

brianjohnhaas commented 5 years ago

Responses below

On Thu, Nov 1, 2018 at 11:39 AM MichaelsGITIGIT notifications@github.com wrote:

> Two quick follow-up questions regarding rRNA remainders in Illumina raw data:
>
> 1. If rRNA ends up in the Trinity assembly, did you ever see problems with this? Particularly if one is interested in mRNA (expression/GO enrichment) only.

I personally don't worry about it. If it turns up as a DE feature, I'd just ignore it.

> 2. Instead of using sortmerna on the raw Illumina data, why not just use a tool like rnammer on the assembly and kick out the hits? Or is there a problem with that?

That should be fine too. It's a personal preference as to which strategy to take. All part of the rna-seq analysis journey.
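As a rough illustration of the "annotate the assembly and kick out the hits" strategy, here is a minimal sketch. It assumes the rRNA predictor's hits are available as a GFF-style table whose first tab-separated column is the contig name; the function names and the toy hit line are hypothetical, not output copied from a real run.

```python
from typing import Dict, Iterable, Set


def rrna_contig_ids(gff_lines: Iterable[str]) -> Set[str]:
    """Return the contig names (column 1) of all non-comment GFF-style lines."""
    hits = set()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        hits.add(line.split("\t")[0].strip())
    return hits


def without_rrna(contigs: Dict[str, str], rrna_ids: Set[str]) -> Dict[str, str]:
    """Return a copy of the assembly without contigs that carry an rRNA hit."""
    return {name: seq for name, seq in contigs.items() if name not in rrna_ids}


# Toy data (hypothetical): two contigs, one with a predicted rRNA.
gff = [
    "##gff-version 2",
    "TRINITY_DN12_c0_g1_i1\tpredictor\trRNA\t1\t1500\t1800.0\t+\t.\t18s_rRNA",
]
assembly = {
    "TRINITY_DN12_c0_g1_i1": "ACGTACGT",
    "TRINITY_DN40_c0_g1_i1": "GGGTTTAA",
}

cleaned = without_rrna(assembly, rrna_contig_ids(gff))
print(sorted(cleaned))
```

The same ID set could also be used the other way around: keep the assembly untouched and simply ignore any DE feature whose ID appears in it.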

best,

~b

Thanks!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Trinotate/Trinotate.github.io/issues/10#issuecomment-435076135, or mute the thread https://github.com/notifications/unsubscribe-auth/AHMVXxZ8da5RRzBT9yfD6GilTnLuPQmfks5uqxWvgaJpZM4Xx9xK .

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

ghost commented 5 years ago

Thx much Brian!