IsoSeq transcriptome data

There are two options:

1. Follow the long-read protocol at https://github.com/Gaius-Augustus/BRAKER/blob/master/docs/long_reads/long_read_protocol.md. This protocol is a bit older, but it lets you use both short and long reads easily. I recommend reading and understanding it even if you end up choosing the next option instead.
2. Run BRAKER with short reads and BRAKER with long reads separately, then merge the resulting gene sets with TSEBRA (as described in option 1).

In theory, it is rather simple to apply BRAKER3 to long reads in combination with an OrthoDB partition. In practice, we have neither evaluated this cleanly nor implemented it.

I have a development Docker container that currently accepts a BAM file with splice-aligned long reads (only long reads!) in place of a BAM file with splice-aligned short reads. Do not input a FASTQ file and do not input SRA IDs; it really only accepts BAM input. It also requires the OrthoDB partition FASTA file as input.
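Because the container only takes a splice-aligned BAM, the IsoSeq reads have to be aligned to the genome first. The thread does not say which aligner was used, so the following is a minimal sketch assuming minimap2 and samtools are on PATH. The function names `run` and `align_isoseq`, the file names and the thread count are hypothetical. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: turn PacBio IsoSeq reads into the splice-aligned, sorted BAM that the
# development container expects. Assumes minimap2 and samtools are on PATH;
# function names and file names are hypothetical.
set -euo pipefail

# Print a command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Usage: align_isoseq GENOME READS OUT_BAM [THREADS]
align_isoseq() {
    local genome="$1" reads="$2" out_bam="$3" threads="${4:-8}"
    local sam="${out_bam%.bam}.sam"
    # -ax splice:hq: spliced-alignment preset for high-accuracy long reads
    # -uf: match GT-AG splice signals on the transcript strand only
    run minimap2 -t "$threads" -ax splice:hq -uf -o "$sam" "$genome" "$reads"
    # Coordinate-sort into BAM, index it, and drop the intermediate SAM
    run samtools sort -@ "$threads" -o "$out_bam" "$sam"
    run samtools index "$out_bam"
    run rm -f "$sam"
}
```

For example, source it from a driver script and call `DRY_RUN=0 align_isoseq genome.fa isoseq_reads.fa longreads.bam 8`. Writing an intermediate SAM keeps each step's failure visible instead of hiding it inside a pipe.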

singularity build braker_lr.sif docker://katharinahoff/playground:devel

singularity exec braker_lr.sif braker.pl --genome=genome.fa --prot_seq=orthodb_partition.fa --bam=longreads.bam
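The command above can be wrapped so that it refuses the inputs this thread rules out: anything that is not a `.bam` file (FASTQ files, SRA IDs) and a missing OrthoDB partition. This is a minimal sketch; the function name `run_braker_lr` is hypothetical, and the extension test is only a heuristic that does not verify the BAM really contains splice-aligned long reads.

```shell
#!/usr/bin/env bash
# Sketch of a guarded call to the development container. It enforces the
# constraints stated in this thread: BAM-only long-read input and a required
# OrthoDB partition FASTA. run_braker_lr is a hypothetical name.
set -euo pipefail

# Usage: run_braker_lr SIF GENOME ORTHODB_FASTA LONGREADS_BAM
run_braker_lr() {
    local sif="$1" genome="$2" proteins="$3" bam="$4"
    # Heuristic: reject anything without a .bam extension (FASTQ, SRA IDs, ...)
    case "$bam" in
        *.bam) ;;
        *) echo "error: expected a .bam file, got '$bam'" >&2; return 1 ;;
    esac
    if [ -z "$proteins" ]; then
        echo "error: an OrthoDB partition FASTA (--prot_seq) is required" >&2
        return 1
    fi
    local cmd=(singularity exec "$sif" braker.pl
               "--genome=$genome" "--prot_seq=$proteins" "--bam=$bam")
    # Print the command; execute it only when DRY_RUN=0
    echo "+ ${cmd[*]}"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=0 run_braker_lr braker_lr.sif genome.fa orthodb_partition.fa longreads.bam`.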

As I said, this is a development/playground container, not a released BRAKER version. It works; two people have tested it independently. What we can cautiously say already is that you need a large amount of very high-quality PacBio IsoSeq reads to get good accuracy when running BRAKER this way. If you have low-coverage libraries, or older data with a higher error rate, I advise against using the long-read data this way at all.

[In addition, run the standard BRAKER3 with short reads + OrthoDB partition, then merge the two BRAKER gene sets with TSEBRA.]
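The combined workflow in the bracketed note can be sketched as a small driver. It assumes `braker3.sif` was built from the public BRAKER3 image (`singularity build braker3.sif docker://teambraker/braker3:latest`), `braker_lr.sif` from the development container, and `tsebra.py` on PATH. The working-directory names, the output names `braker.gtf` and `hintsfile.gff`, and the TSEBRA config path `default.cfg` are assumptions; check them against the BRAKER and TSEBRA documentation for your versions. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: run BRAKER3 (short reads) and the development container (long reads)
# into separate working directories, then merge both gene sets with TSEBRA.
# Image names, directory names, output file names and default.cfg are assumptions.
set -euo pipefail

# Print a command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Usage: braker_lr_plus_sr GENOME ORTHODB_FASTA SHORTREADS_BAM LONGREADS_BAM
braker_lr_plus_sr() {
    local genome="$1" proteins="$2" sr_bam="$3" lr_bam="$4"
    # Standard BRAKER3: short reads + OrthoDB partition
    run singularity exec braker3.sif braker.pl --genome="$genome" \
        --prot_seq="$proteins" --bam="$sr_bam" --workingdir=braker_sr
    # Development container: long reads only + OrthoDB partition
    run singularity exec braker_lr.sif braker.pl --genome="$genome" \
        --prot_seq="$proteins" --bam="$lr_bam" --workingdir=braker_lr
    # TSEBRA takes comma-separated gene sets (-g) and matching hint files (-e)
    run tsebra.py -g braker_sr/braker.gtf,braker_lr/braker.gtf \
        -e braker_sr/hintsfile.gff,braker_lr/hintsfile.gff \
        -c default.cfg -o braker_combined.gtf
}
```

Separate working directories keep the two runs from overwriting each other's output before the merge.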

Gaius-Augustus / BRAKER, issue #722: IsoSeq transcriptome data