Using intermediate output for RNA-Bloom2 transcriptome assembly?

Hi,

I am hoping to perform transcriptome assembly using both Nanopore long-read and Illumina short-read sequencing data. It appears RATTLE only accepts long reads, so I was hoping to use the error-corrected long reads produced by running its first two steps, cluster and correct.

I then wanted to use these corrected reads with RNA-Bloom2 (https://github.com/bcgsc/RNA-Bloom), which permits assembly using both long and short reads.

My questions for you are:

1. Is this general approach sound, or am I overlooking something by mixing tools this way?
2. Should I also use uncorrected.fq in addition to corrected.fq for the downstream steps?
3. Would you recommend changing -r, --min-reads from the default of 5 to something like 2 in order to correct as many reads as possible?

Thanks for your time and any help you can provide. If this approach doesn't seem sound, can you recommend another long-read correction method that does not require an existing reference genome?

Thanks, Patrick

Hi Patrick,

Mixing tools is fine. The RATTLE pipeline is modular and flexible precisely so that you can mix and match tools and use it in whatever way is most convenient.

What you propose could be a good approach.
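As a rough sketch of that workflow, the snippet below only builds the three command lines and prints them; it does not run anything. The flags are the ones I recall from the RATTLE and RNA-Bloom READMEs (`-i`, `-c`, `-t`, `-o`, `--iso` for RATTLE; `-long`, `-sef`, `-t`, `-outdir` for RNA-Bloom), so please check them against each tool's `--help` before use. The function name, file names, and directory layout are placeholders of mine, not something either tool requires.

```python
import shlex
from pathlib import Path


def build_commands(long_reads, short_reads, workdir, threads=8):
    """Return the cluster, correct, and assembly commands as argument lists.

    Nothing is executed here. Flags are recalled from the tools' READMEs and
    should be verified with `rattle cluster -h`, `rattle correct -h`, and
    `rnabloom -h`.
    """
    work = Path(workdir)
    rattle_out = work / "rattle"  # hypothetical output layout

    # Step 1: cluster the raw long reads (isoform-level, as in the README example).
    cluster = ["rattle", "cluster", "-i", str(long_reads),
               "-t", str(threads), "-o", str(rattle_out), "--iso"]

    # Step 2: correct reads within clusters; expected to write corrected.fq,
    # uncorrected.fq, and consensi.fq into the output folder.
    correct = ["rattle", "correct", "-i", str(long_reads),
               "-c", str(rattle_out / "clusters.out"),
               "-t", str(threads), "-o", str(rattle_out)]

    # Step 3: hand the corrected long reads plus the short reads to RNA-Bloom.
    assemble = ["rnabloom", "-long", str(rattle_out / "corrected.fq"),
                "-sef", str(short_reads),
                "-t", str(threads), "-outdir", str(work / "rnabloom")]

    return [cluster, correct, assemble]


if __name__ == "__main__":
    for cmd in build_commands("reads.fq", "short.fq", "work"):
        print(shlex.join(cmd))
```

Keeping the steps as separate commands makes it easy to swap the assembler or to stop after `correct` and inspect the intermediate files.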

Yes, you can use the uncorrected and corrected reads together for your next analysis step.
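If the downstream tool expects a single long-read file, one way to combine the two outputs is plain concatenation. This is a minimal sketch that assumes both files are uncompressed FASTQ with four lines per record; the function name and output file name are placeholders.

```python
def concat_fastq(inputs, output):
    """Concatenate FASTQ files into `output` and return the record count.

    Assumes plain-text FASTQ with exactly four lines per record.
    """
    n_lines = 0
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as src:
                for line in src:
                    # Guard against a missing newline at the end of a file,
                    # which would otherwise glue two records together.
                    out.write(line if line.endswith("\n") else line + "\n")
                    n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; input may be truncated")
    return n_lines // 4


# Example usage (file names as produced by `rattle correct`):
# concat_fastq(["corrected.fq", "uncorrected.fq"], "all_long_reads.fq")
```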

--min-reads defaults to 5 because we observed that at least 5 reads clustered together and compared with each other were needed for a reliable correction. If you change it to 2, reads would be corrected based on a comparison between just two reads. That is still possible, but I don't know whether it would be reliable enough.
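To get a feel for the trade-off, here is a small illustration of how many reads fall in clusters that meet the threshold. The cluster sizes are made up, and I am assuming a cluster qualifies when its size is greater than or equal to --min-reads; this is not RATTLE's code, only the counting logic.

```python
def reads_eligible_for_correction(cluster_sizes, min_reads):
    """Count reads in clusters with at least `min_reads` members.

    Assumption: a cluster is corrected when size >= min_reads.
    """
    return sum(size for size in cluster_sizes if size >= min_reads)


# Hypothetical cluster sizes, for illustration only.
sizes = [1, 2, 2, 3, 5, 8, 40]
total = sum(sizes)

for min_reads in (5, 2):
    eligible = reads_eligible_for_correction(sizes, min_reads)
    print(f"--min-reads {min_reads}: {eligible} of {total} reads in eligible clusters")
```

Lowering the threshold brings in the reads from the small clusters, but those are exactly the reads whose correction would rest on only two or three sequences.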

Please let me know how it goes.

Cheers,

Eduardo

comprna / RATTLE

Using intermediate output for RNA-Bloom2 transcriptome assembly? #54