Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Long-read protocol with Nanopore RNA-seq data #781

Open tinameiring opened 6 months ago

tinameiring commented 6 months ago

Hi there,

I want to find out what protocol I should follow when annotating a genome with short and long read RNA-Seq (nanopore) and protein data.

In this issue https://github.com/Gaius-Augustus/BRAKER/issues/672 you mention that the long-read protocol is outdated. I'm running braker3 on the short-read RNA data and the protein data.

For the long-read data, how should I go about the GeneMarkS-T protocol? Should I collapse redundant isoforms with cupcake or Isoseq Collapse? Do you have any suggestions?

KatharinaHoff commented 6 months ago

I have not seen the latest Nanopore data. It is said to be more accurate than before. We have seen some serious problems with early PacBio HiFi isoseq data, which had too little depth and still too many errors. With massive amounts of state-of-the-art HiFi data, we obtained reasonable results.

Have a look at the poster https://github.com/Gaius-Augustus/BRAKER/blob/master/docs/posters/poster_PAG2024.pdf . It shows precisely what I modified in GeneMark-ETP in order to run a StringTie assembly for isoseq reads from BAM input to BRAKER. You may want to look into how you need to modify the same command for Nanopore (maybe it's identical, maybe it needs another option). Then you can run BRAKER3 twice, once with only the long-read data (provided as a BAM file) and once with only the short-read data, and then merge the two gene sets with TSEBRA.

Internally, BRAKER3 essentially executes a GeneMarkS-T protocol, including denoising.
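
The two-runs-then-merge workflow described above could be sketched roughly as follows in Python. This is a minimal sketch under assumptions, not a tested recipe: all file and directory names are hypothetical, the helpers `braker_cmd` and `tsebra_cmd` are invented for illustration, the protein file is assumed to be passed to both runs, and each BRAKER run is assumed to leave `braker.gtf` and `hintsfile.gff` in its working directory. The script only builds and prints the command lines. It does not include the GeneMark-ETP modification for long reads shown in the poster, which the long-read run would still need.

```python
import shlex


def braker_cmd(genome, bam, proteins, workdir):
    """Build a braker.pl call for one RNA-seq BAM plus protein evidence."""
    return [
        "braker.pl",
        f"--genome={genome}",
        f"--bam={bam}",
        f"--prot_seq={proteins}",
        f"--workingdir={workdir}",
    ]


def tsebra_cmd(workdirs, config, out_gtf):
    """Build a tsebra.py call that merges the gene sets of several BRAKER runs.

    Assumption: each working directory contains braker.gtf and hintsfile.gff.
    """
    gtfs = ",".join(f"{w}/braker.gtf" for w in workdirs)
    hints = ",".join(f"{w}/hintsfile.gff" for w in workdirs)
    return ["tsebra.py", "-g", gtfs, "-e", hints, "-c", config, "-o", out_gtf]


if __name__ == "__main__":
    # Hypothetical inputs; replace with your own files.
    genome, proteins = "genome.fa", "proteins.fa"
    runs = {
        "braker_short": "short_reads.bam",    # short-read alignments only
        "braker_long": "nanopore_reads.bam",  # long-read alignments only
    }

    # One BRAKER3 run per read type, each in its own working directory.
    for workdir, bam in runs.items():
        print(shlex.join(braker_cmd(genome, bam, proteins, workdir)))

    # Merge the two resulting gene sets with TSEBRA.
    # "tsebra.cfg" is a placeholder for a TSEBRA configuration file.
    print(shlex.join(tsebra_cmd(list(runs), "tsebra.cfg", "merged.gtf")))
```

Printing the commands instead of executing them makes it easy to inspect them first and submit each BRAKER run as a separate cluster job.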

tinameiring commented 6 months ago

Thank you for the help, I appreciate it.
