Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Different outputs from same input data #776

Closed danielwood1992 closed 6 months ago

danielwood1992 commented 7 months ago

Hello,

I am getting slightly different sets of genes from different runs of BRAKER3 on the same reference FASTA, with the same input RNA-seq BAMs and protein training sets. I have tried this with long-read only, short-read only, and combined long- and short-read datasets: about 500 genes out of 35,000 differ (more like 200 with only the short-read data). Is this behaviour expected? If the variable genes tend to be lower-confidence, do you have any recommendations for how I might filter them out? (I'm looking at presence/absence variants between assemblies, so I'm trying to reduce the number of potential false positives.)

Thanks a lot and best wishes,

Daniel Wood

KatharinaHoff commented 6 months ago

BRAKER performs a random split of genes into training, test and validation sets. This split can differ between runs, so yes, we expect some fluctuation. It is not related to how reliable these genes are.
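To illustrate why a random split alone can change results between runs, here is a minimal sketch in Python. It is not BRAKER's actual code: the function name `split_genes`, the 80/10/10 fractions, and the `seed` parameter are all hypothetical, chosen only to show that an unseeded (or differently seeded) shuffle puts different genes into the training set each time.

```python
import random


def split_genes(gene_ids, seed=None, fractions=(0.8, 0.1, 0.1)):
    """Randomly split gene IDs into training, test and validation sets.

    Hypothetical illustration only: the fractions and the seed argument
    are assumptions, not BRAKER parameters.
    """
    rng = random.Random(seed)   # seed=None gives a different shuffle per run
    shuffled = list(gene_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_test = int(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "validation": shuffled[n_train + n_test:],
    }


genes = [f"g{i}" for i in range(1, 1001)]

run_a = split_genes(genes, seed=1)
run_b = split_genes(genes, seed=2)        # stands in for a second pipeline run
run_a_again = split_genes(genes, seed=1)  # same seed reproduces the same split

# Genes that were used for training in one run but not in the other.
train_diff = set(run_a["train"]) ^ set(run_b["train"])
print(f"{len(train_diff)} genes differ between the two training sets")
```

Because the model parameters are trained on whichever genes land in the training set, two runs with different splits can end up with slightly different parameters, and therefore slightly different predictions.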

danielwood1992 commented 6 months ago

Ok that's good to know, thanks!
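For anyone who wants to quantify the fluctuation between two runs, the following is a rough sketch, not part of BRAKER. It assumes tab-separated GTF-style output containing lines whose third column is `gene`; the function names `gene_loci` and `compare_runs` and the example records are hypothetical. Genes are matched by exact coordinates and strand rather than by ID, since IDs are not assumed to be stable across runs. As noted above, a gene being present in only one run does not by itself say anything about its reliability.

```python
def gene_loci(gtf_lines, feature="gene"):
    """Collect (seqname, start, end, strand) for every line of the given feature type."""
    loci = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue  # skip malformed lines and other feature types
        loci.add((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return loci


def compare_runs(lines_a, lines_b, feature="gene"):
    """Return loci shared by both runs and loci unique to each run."""
    a = gene_loci(lines_a, feature)
    b = gene_loci(lines_b, feature)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Made-up example records standing in for two runs' output files.
run1 = [
    "chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tg1",
    "chr1\tAUGUSTUS\tgene\t2000\t3500\t.\t-\t.\tg2",
    "chr2\tAUGUSTUS\tgene\t50\t700\t.\t+\t.\tg3",
]
run2 = [
    "chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tg1",
    "chr2\tAUGUSTUS\tgene\t50\t700\t.\t+\t.\tg2",
    "chr2\tAUGUSTUS\tgene\t5000\t6100\t.\t+\t.\tg3",
]

result = compare_runs(run1, run2)
print(len(result["shared"]), len(result["only_a"]), len(result["only_b"]))
```

With real files, the lists can be replaced by open file handles, for example `compare_runs(open(path_a), open(path_b))`, where `path_a` and `path_b` are the two runs' annotation files. Exact coordinate matching is a deliberately strict criterion; genes whose boundaries shift slightly between runs will be counted as different.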