Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Using preprocessed (clustered/aligned) PB Iso-Seq data with protein data? #830

Open carla-hazelf opened 1 month ago

carla-hazelf commented 1 month ago

Hi, first of all, thank you for producing this tool. It looks straightforward to use, but I'm new to this method, so I have a question (sorry if it has already been asked and answered):

I have some PacBio Iso-Seq data that I have been processing with IsoSeq (https://github.com/PacificBiosciences/IsoSeq): clustering, aligning to the genome, and then sorting the resulting .bam file. I will use this data alongside proteins from OrthoDB. What is the correct way to run BRAKER with Iso-Seq data plus a protein database? I'm asking because I read https://github.com/Gaius-Augustus/BRAKER/blob/master/docs/long_reads/long_read_protocol.md, which seems to cover integrating long-read and short-read data and does the preprocessing with different tools. Sorry if this has already been explained; I'm new to this work and still learning.
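Before handing the sorted BAM to BRAKER, the sorting step can be sanity-checked from the BAM header. This is a minimal sketch, assuming samtools is installed; `is_coordinate_sorted` is a hypothetical helper, not part of BRAKER or IsoSeq. It only reads the `SO:` tag on the `@HD` header line, so it trusts what the sorting tool wrote there rather than re-checking every record.

```shell
# Hypothetical helper (not part of BRAKER or IsoSeq): reads a SAM header on
# stdin and exits 0 only if the @HD line declares SO:coordinate.
is_coordinate_sorted() {
  awk -F'\t' '
    $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Usage, assuming samtools is on PATH:
# samtools view -H isoseq_all_tissues_clustered_hq_aligned_to_genome.sorted.bam \
#   | is_coordinate_sorted && echo "BAM header says coordinate-sorted"
```

A missing `@HD` line or any other sort order (for example `SO:unsorted`) makes the function exit 1.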

At the moment I have this script:

singularity exec -B brakerdirectory braker3.sif braker.pl \
    --genome=softmasked_genome.fasta \
    --bam=isoseq_all_tissues_clustered_hq_aligned_to_genome.sorted.bam \
    --prot_seq=braker_db/Vertebrata.fa \
    --workingdir=brakerdirectory \
    --threads=xxx \
    --gff3 \
    --species=species_name \
    --softmasking \
    --UTR=on \
    --nocleanup \
    --addUTR=on

Is this OK to use, or should I be preprocessing the Iso-Seq reads according to the long-read protocol (https://github.com/Gaius-Augustus/BRAKER/blob/master/docs/long_reads/long_read_protocol.md)?
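One way to keep the Singularity bind path (`-B`) and BRAKER's `--workingdir` from drifting apart is to take both from a single value. This is a sketch only: `braker_cmd` is a hypothetical helper that prints the command as a dry run instead of executing it, the thread count `8` is a placeholder for the `xxx` in the script, and every BRAKER flag is copied unchanged from that script, so it does not answer whether this flag set is the right one for Iso-Seq input.

```shell
# Hypothetical helper: prints (does not run) the BRAKER command, using the
# same directory for the container bind path and for --workingdir.
# Note: workdir/threads are plain (global) variables, as POSIX sh has no "local".
braker_cmd() {
  workdir=$1   # directory bound into the container and used as --workingdir
  threads=$2   # thread count (placeholder value supplied by the caller)
  echo singularity exec -B "$workdir" braker3.sif braker.pl \
    --genome=softmasked_genome.fasta \
    --bam=isoseq_all_tissues_clustered_hq_aligned_to_genome.sorted.bam \
    --prot_seq=braker_db/Vertebrata.fa \
    --workingdir="$workdir" \
    --threads="$threads" \
    --gff3 \
    --species=species_name \
    --softmasking \
    --UTR=on \
    --nocleanup \
    --addUTR=on
}

# Dry run: inspect the printed command before running it for real.
braker_cmd brakerdirectory 8
```

To actually run BRAKER, the `echo` in front of `singularity` would be removed.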