Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Any suggestions for running an array job for Braker? #803

Open lw78943 opened 2 months ago

lw78943 commented 2 months ago

Dear Braker developers,

Thanks for sharing this great tool with the community. It's very useful and convenient!

Recently, I tried to annotate many genomes with the latest version of BRAKER and Apptainer.

I submitted an array job to run multiple BRAKER jobs at the same time. Unfortunately, they failed after the RNA-seq alignment, without an obvious error message. After checking the log file, I suspect the cause is AUGUSTUS_CONFIG_PATH: every job set AUGUSTUS_CONFIG_PATH=/home/User/.augustus, so all of them tried to write into the same folder, which probably broke all of the jobs. I then tried setting AUGUSTUS_CONFIG_PATH to the working directory or to another directory in my home, but BRAKER still resets $AUGUSTUS_CONFIG_PATH to /home/User/.augustus.

Do you have any suggestions for running BRAKER as an array job? Thank you so much for your help and support.

Have a great day! Li

KatharinaHoff commented 2 months ago

Did you provide an explicit, non-redundant species name for each task in the array?
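A minimal sketch of giving every array task its own species name (and working directory), assuming a SLURM array where the task index selects one genome from a list. The naming scheme `<genome stem>_task<N>` and the per-genome BAM layout are hypothetical; `--genome`, `--bam`, `--species`, `--workingdir` and `--threads` are standard `braker.pl` options. With Apptainer, the resulting command would be run through `apptainer exec <image>.sif`.

```python
from __future__ import annotations

from pathlib import Path


def braker_command(task_id: int, genomes: list[str], bam_dir: str, threads: int = 8) -> list[str]:
    """Build a braker.pl argument list for one array task.

    Each task gets a unique species name, so parallel tasks never race to
    create the same $AUGUSTUS_CONFIG_PATH/species/<name> folder.
    """
    genome = Path(genomes[task_id])
    stem = genome.name.split(".")[0]      # "genome_017" from "genome_017.fa"
    species = f"{stem}_task{task_id}"     # hypothetical naming scheme, unique per task
    return [
        "braker.pl",
        f"--genome={genome}",
        f"--bam={Path(bam_dir) / (stem + '.bam')}",  # assumes one BAM per genome
        f"--species={species}",
        f"--workingdir=braker_{species}",
        f"--threads={threads}",
    ]
```

In the job script, `task_id` would come from `int(os.environ["SLURM_ARRAY_TASK_ID"])` and the argument list could be passed to `subprocess.run(..., check=True)`.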

On Mon, 15 Apr 2024 at 00:20, lw78943 wrote:

Log excerpt from the original post:

Sun Apr 14 17:05:20 2024: Found command line argument $AUGUSTUS_CONFIG_PATH.
Sun Apr 14 17:05:20 2024: Checking /home/User/.test as potential path for $AUGUSTUS_CONFIG_PATH.
Sun Apr 14 17:05:20 2024: Success! Setting $AUGUSTUS_CONFIG_PATH to /home/User/.test!
Sun Apr 14 17:05:20 2024: WARNING: in file /opt/BRAKER/scripts/braker.pl at line 1931
$AUGUSTUS_CONFIG_PATH/species (in this case /home/li.wang/.test/species) is not writeable. BRAKER will try to copy the AUGUSTUS config directory to a writeable location.
Sun Apr 14 17:05:20 2024: Trying to set $AUGUSTUS_BIN_PATH...
Sun Apr 14 17:05:20 2024: Found environment variable $AUGUSTUS_BIN_PATH.
Sun Apr 14 17:05:20 2024: Checking /opt/Augustus/bin/ as potential path for $AUGUSTUS_BIN_PATH.
Sun Apr 14 17:05:20 2024: Success! Setting $AUGUSTUS_BIN_PATH to /opt/Augustus/bin/!
Sun Apr 14 17:05:20 2024: Trying to set $AUGUSTUS_SCRIPTS_PATH...
Sun Apr 14 17:05:20 2024: Found environment variable $AUGUSTUS_SCRIPTS_PATH.
Sun Apr 14 17:05:20 2024: Checking /opt/Augustus/scripts/ as potential path for $AUGUSTUS_SCRIPTS_PATH.
Sun Apr 14 17:05:20 2024: Success! Setting $AUGUSTUS_SCRIPTS_PATH to /opt/Augustus/scripts/!
Sun Apr 14 17:05:20 2024: WARNING: BRAKER will copy the AUGUSTUS_CONFIG folder into your home directory!
Sun Apr 14 17:05:20 2024: WARNING: $AUGUSTUS_CONFIG_PATH/species (in this case /home/User/.test/species ) is not writeable.

*** IMPORTANT: Resetting $AUGUSTUS_CONFIG_PATH=/home/User/.augustus because BRAKER requires a writable location!

KatharinaHoff commented 2 months ago

In any case, I recommend not running too many of these jobs in parallel if they read from and write to a centrally mounted disk. I/O turns into a bottleneck because of the data parallelization.
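One way to cap concurrency, assuming the cluster uses SLURM: the `--array` option accepts a `%N` suffix that limits how many array tasks run at the same time. The helper below only builds that string; the limit of 10 in the usage note is an arbitrary example, not a recommendation from this thread.

```python
def array_spec(n_genomes: int, max_parallel: int) -> str:
    """Return a SLURM --array value covering n_genomes tasks, at most max_parallel at once."""
    if n_genomes < 1 or max_parallel < 1:
        raise ValueError("need at least one genome and one concurrent task")
    # Task indices are 0-based so they can index directly into a genome list.
    return f"0-{n_genomes - 1}%{max_parallel}"
```

For 200 genomes with at most 10 running simultaneously, this yields `0-199%10`, i.e. `sbatch --array=0-199%10 job.sh`.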

On Mon, 15 Apr 2024 at 08:12, Katharina Hoff wrote:

Did you provide an explicit, non-redundant species name for each task in the array?

lw78943 commented 2 months ago

Hi KatharinaHoff,

Thanks for the information. I tried running 15 jobs at the same time with a non-redundant species name for each task. However, they all failed. One possible reason is that I only have 10 GB of space in my home directory. Is it possible to set $AUGUSTUS_CONFIG_PATH to a scratch/project directory, where we have lots of space? I want to annotate 200 genomes, so it would be great if I could submit an array job. Thanks.
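A sketch of staging a writable AUGUSTUS config once in scratch/project space before submitting the array, then passing it to every task with `--AUGUSTUS_CONFIG_PATH=<dest>` (BRAKER's log reports reading this as a command-line argument). The source path `/opt/Augustus/config` inside the container and the scratch location are assumptions; because the source lives inside the image, the copy would have to run inside the container, e.g. via `apptainer exec <image>.sif python3 stage.py`.

```python
import shutil
from pathlib import Path


def stage_augustus_config(src: str, scratch: str) -> Path:
    """Copy a (possibly read-only) AUGUSTUS config tree to a writable scratch location.

    src:     config directory to copy, e.g. /opt/Augustus/config (assumed path)
    scratch: a scratch/project directory with enough quota (hypothetical path)
    """
    dest = Path(scratch) / "augustus_config"
    if not dest.exists():
        # Run once before submitting the array; tasks then share this copy,
        # which is only safe if each task uses a distinct --species name.
        shutil.copytree(src, dest)
    return dest
```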

Bests,

Li

KatharinaHoff commented 2 months ago

The AUGUSTUS_CONFIG_FOLDER is not particularly big, so this should not matter. Mine is 179 MB, and it contains a number of additional parameter sets that you do not have. (Running in parallel without a specified species name is a problem because BRAKER checks for the presence of Sp_N with increasing integers N; if several jobs find at the same time that, e.g., Sp_3 does not exist yet, they all try to write Sp_3, and then they fail.)
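The failure described above is a classic check-then-create race. The sketch below is a hypothetical illustration of the atomic alternative, not BRAKER code: `os.mkdir` either creates the folder or raises `FileExistsError`, so two concurrent callers can never both claim the same `Sp_N`. Atomicity of `mkdir` on shared or network filesystems depends on the filesystem, and in practice passing an explicit `--species` per task avoids the issue entirely.

```python
import os
from pathlib import Path


def claim_species_name(species_dir: str, prefix: str = "Sp_") -> str:
    """Reserve the first free <prefix><N> folder (N = 1, 2, ...) and return its name.

    Unlike "check whether Sp_N exists, then write it", creating the folder is
    the check itself, so concurrent callers always end up with distinct names.
    """
    n = 1
    while True:
        name = f"{prefix}{n}"
        try:
            os.mkdir(Path(species_dir) / name)
            return name
        except FileExistsError:
            n += 1  # someone else already holds this N; try the next one
```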

The working directory contents get big, because BRAKER parallelizes over the data. We use a working directory in /tmp (local HDD) on our HPC nodes for computation, because otherwise we see I/O problems on the HPC.
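A sketch of the node-local working-directory pattern: create a per-task directory on local disk, point `--workingdir` at it, and copy the results to shared storage when the task finishes. Defaulting to `$TMPDIR` (often node-local on HPC systems, but that is site-specific) with `/tmp` as fallback is an assumption, as is the shared destination directory.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def make_local_workdir(species: str, base: Optional[str] = None) -> Path:
    """Create a unique per-task working directory on node-local disk."""
    root = base or os.environ.get("TMPDIR", "/tmp")
    return Path(tempfile.mkdtemp(prefix=f"braker_{species}_", dir=root))


def copy_results_back(workdir: Path, shared_dir: str) -> Path:
    """Copy the finished working directory to shared storage, then free the local disk."""
    dest = Path(shared_dir) / workdir.name
    shutil.copytree(workdir, dest)
    shutil.rmtree(workdir)
    return dest
```

Keeping the heavy intermediate I/O on the local disk means the central filesystem only sees one bulk copy per task.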

KatharinaHoff commented 2 months ago

In your case, I would probably schedule a virtual session on an execution node, run the container there, and check the read/write permissions in the AUGUSTUS_CONFIG_PATH.
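A small probe for that check, meant to be run inside the container on an execution node (for example with `apptainer exec <image>.sif python3 check.py`). It mirrors the condition in BRAKER's warning, whether `$AUGUSTUS_CONFIG_PATH/species` is writable, but the function itself is a hypothetical helper, not part of BRAKER.

```python
import tempfile
from pathlib import Path


def species_dir_writable(config_path: str) -> bool:
    """Return True if <config_path>/species exists and accepts a new file.

    It tests an actual write (the file is deleted again on close) instead of
    inspecting permission bits, so it reflects what a BRAKER run would hit.
    """
    species = Path(config_path) / "species"
    if not species.is_dir():
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=species):
            pass
    except OSError:
        return False
    return True
```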

lw78943 commented 2 months ago

Dear KatharinaHoff,

Thank you so much for your support! It works when each task in the array job uses a non-redundant species name. I will close this ticket now. Much appreciated.

Bests,

Li