Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes
Other
334 stars 80 forks source link

braker3_lr threads #820

Open danielwood1992 opened 1 month ago

danielwood1992 commented 1 month ago

Hello,

I am using the braker3_lr.sif image - when running braker.pl, the diamond step ends up using all the threads on the node rather than the number specified by --threads: do you know how I could limit the number of threads it uses?

Thanks a lot,

Daniel Wood

KatharinaHoff commented 1 month ago

@tomasbruna would you know where in ETP I need to fix that? For the long reads container, I am pulling from here: https://github.com/KatharinaHoff/GeneMark-ETP/tree/longread_experimental_dev

tomasbruna commented 1 month ago

I'll look into it

augeas commented 1 month ago

Hello, (RSE @ QMUL here...) One of our research sys-admins has observed that it's diamond doing the over-threading.

I notice that runDiamond in prothint.py seems to honour the --threads argument, but in filter.py it does not.

tomasbruna commented 1 month ago

@KatharinaHoff, I pushed a fix to a new ETP branch at https://github.com/gatech-genemark/GeneMark-ETP/compare/main...thread-fix

This should do the trick, but I haven't had a chance to test it fully. Let me know if it works for you, I'll merge the fix to the main ETP branch if it does.

augeas commented 1 month ago

@tomasbruna I've tried applying your patch to @KatharinaHoff's fork in the Dockerfile:

# include ETP
RUN cd /opt && \
    git clone https://github.com/KatharinaHoff/GeneMark-ETP.git && \
    cd GeneMark-ETP && \
    wget -O thread-fix.patch https://github.com/gatech-genemark/GeneMark-ETP/compare/main...thread-fix.patch && \
    git apply thread-fix.patch && \
    cd .. && \
#    cd GeneMark-ETP && \ # these lines are for the isoseq container
#    git checkout longread_experimental_dev && \
#    cd .. && \
    mv GeneMark-ETP ETP && \
    chmod a+x /opt/ETP/bin/*py /opt/ETP/bin/*pl /opt/ETP/tools/*

We'd observed that diamond was only over-threading some of the time, which makes sense as the --threads option was only missing in filter.py. @danielwood1992 should be able to try our apptainer build on the cluster shortly. We shall See How It Goes.

danielwood1992 commented 1 month ago

Great, thanks very much all! I am running this now so hopefully we should know soon if this solves the problem.

KatharinaHoff commented 1 month ago

Thank you very much, @tomasbruna !

The branch that @tomasbruna created was a branch of the main GeneMark-ETP repository. It does not contain a bugfix that @rchikhi made, and it also does not contain the stringie isoseq assembly command that I believe you need. That's a bit of mess that I had intended to resolve with Alex during PAG in January, but unfortunately, we did not have a chance to discuss it. I pulled the changes by Tomas into my fork (both into main and longreads), and I pushed an updated isoseq container to dockerhub. It is currently tagged as devel, it does include the stringtie long reads assembly command. If your tests go ok, I will retag it to isoseq, and I will then also pull the changes into the main container for short reads. This is how you can build the current testing container (temporary, devel will be replaced as soon as I plan a new release):

singularity build braker3_lr.sif docker://teambraker/braker3:devel
KatharinaHoff commented 1 month ago

@tomasbruna I've tried applying your patch to @KatharinaHoff's fork in the Dockerfile:

Yes, there was sth important missing in your approach: git checkout longread_experimental_dev , this is essential because that contains the stringtie command line flag for long reads.

augeas commented 1 month ago

@KatharinaHoff, @tomasbruna, we've informed @danielwood1992 that the devel Docker image should now be used until the changes are merged. However, my previous bodged build (the patch to GeneMark-ETP applied in the main braker3 Dockerfile) has at least run successfully without over-threading and causing HPC node-alarms. (Not that there was much doubt that sending the correct number of threads to diamond in both cases would fix the problem.)

Cheers.

KatharinaHoff commented 1 month ago

Thanks for feedback, I will push and tag both containers (short and long reads), soon. It will probably take til tomorrow or Wednesday because my calendar is very full. But you have the solution that you currently need, so that's probably fine.

danielwood1992 commented 1 month ago

Great, thanks again!

tomasbruna commented 1 month ago

Thanks for the test, I will merge this into the main etp branch.

danielwood1992 commented 1 month ago

Hello,

Trying to run braker3_lr.siff from docker://teambraker/braker3:devel with data that seemed to work ok in previous versions now gives an error. The stderr is below. Do you know why this might be?

ERROR in file /opt/BRAKER/scripts/braker.pl at line 6827 Failed to execute: /opt/Augustus/bin//augustus --species=20240518005753.S13_PG119_fastq --AUGUSTUS_CONFIG_PATH=/data/home/mpx545/.augustus /data/SBCS-BuggsLab-Ash/DanielWood/PG2_PanGenome/PG2_20_pangenome_annotation/PG2_4_4_2/S13_PG119_fastq.PG2_4_4_6.lr/train.gb.test 1>/data/SBCS-BuggsLab-Ash/DanielWood/PG2_PanGenome/PG2_20_pangenome_annotation/PG2_4_4_2/S13_PG119_fastq.PG2_4_4_6.lr/firsttest.stdout 2>/data/SBCS-BuggsLab-Ash/DanielWood/PG2_PanGenome/PG2_20_pangenome_annotation/PG2_4_4_2/S13_PG119_fastq.PG2_4_4_6.lr/errors/firsttest.stderr!

firsttest.stderr gives:

Properties::getProperty(): no such key "/Constant/trans_init_window".Properties::getProperty(): no such key "/Constant/ass_upwindow_size".Properties::getProperty(): no such key "/Constant/ass_start".Properties::getProperty(): no such key "/Constant/ass_end".Properties::getProperty(): no such key "/Constant/dss_start".Properties::getProperty(): no such key "/Constant/dss_end".Properties::getProperty(): no such key "/Constant/init_coding_len".Properties::getProperty(): no such key "/Constant/intterm_coding_len".Properties::getProperty(): no such key "/Constant/decomp_num_at".Properties::getProperty(): no such key "/Constant/decomp_num_gc".Properties::getProperty(): no such key "/Constant/decomp_num_steps".Properties::getProperty(): no such key "/Constant/min_coding_len".Properties::getProperty(): no such key "/ExonModel/maxexonlength".Properties::getProperty(): no such key "checkExAcc".Properties::getProperty(): no such key "/ExonModel/verbosity".Properties::getProperty(): no such key "/ExonModel/k".Properties::getProperty(): no such key "/ExonModel/patpseudocount".Properties::getProperty(): no such key "/ExonModel/exonlengthD".Properties::getProperty(): no such key "/ExonModel/slope_of_bandwidth".Properties::getProperty(): no such key "/ExonModel/minwindowcount".Properties::getProperty(): no such key "/ExonModel/minPatSum".Properties::getProperty(): no such key "/ExonModel/etorder".Properties::getProperty(): no such key "/ExonModel/etpseudocount".Properties::getProperty(): no such key "/ExonModel/tis_motif_memory".Properties::getProperty(): no such key "/ExonModel/tis_motif_radius".Properties::getProperty(): no such key "/IntronModel/slope_of_bandwidth".Properties::getProperty(): no such key "/IntronModel/minwindowcount".Properties::getProperty(): no such key "/IntronModel/asspseudocount".Properties::getProperty(): no such key "/IntronModel/dsspseudocount".Properties::getProperty(): no such key "/IntronModel/dssneighborfactor".Properties::getProperty(): no such key "/IntronModel/d".Properties::getProperty(): no such key "/IntronModel/ass_motif_memory".Properties::getProperty(): no such key "/IntronModel/ass_motif_radius".couldn't read the '/BaseCount/weighingType' property

/opt/Augustus/bin//augustus: ERROR Properties::getProperty(): no such key "/ExonModel/infile".