Gaius-Augustus / GALBA

GALBA is a pipeline for fully automated prediction of protein coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high quality proteins from one or several closely related species are available.

Pygustus in the Singularity #11

Closed Phismil closed 1 year ago

Phismil commented 1 year ago

Dear Katharina, thank you for maintaining the repository. I am facing an issue with Singularity that stretches my limited knowledge to its limits, and I am wondering whether it could be related to the latest changes in the container. I have included the error below. Thank you in advance.

singularity exec -H XXX:/home -B XXX /apps/chpc/GALBA/galba.sif galba.pl --genome=XXX/genome.fa --prot_seq=XXX/proteins.fa --threads 54 --species T1

File "/XXX/Annotation/GALBA/pygustus_hints.py", line 5, in <module>
    augustus.config_set_bin('/usr/bin//augustus')
File "/opt/conda/lib/python3.10/site-packages/pygustus/augustus.py", line 122, in config_set_bin
    util.set_config_item('augustus_bin', value)
File "/opt/conda/lib/python3.10/site-packages/pygustus/util.py", line 142, in set_config_item
    config_file = set_json_file()
File "/opt/conda/lib/python3.10/site-packages/pygustus/util.py", line 223, in set_json_file
    for file_name in os.listdir(pygustus_cfg_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/home/.pygustus'

KatharinaHoff commented 1 year ago

Thank you for reporting this.

Pygustus (for design reasons that are a bit mysterious to me) writes a config file for the etraining and augustus binaries. Originally, this file was written to the site-packages folder. However, that folder is not writable when Pygustus is executed in Singularity. Therefore, I changed Pygustus to create a new folder, .pygustus, in the user's home directory. I am using the following code in Pygustus to find your home directory:

homedir = os.path.expanduser('~') # util.py line 214

Apparently, that does not return your home directory. What operating system are you using?
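To see where the path '/home/.pygustus' in the traceback comes from, the lookup can be reproduced outside the container. This is a minimal sketch, not the Pygustus source: the helper names pygustus_cfg_dir and ensure_cfg_dir are made up for illustration, and it assumes a POSIX system, where os.path.expanduser('~') returns the value of HOME when that variable is set.

```python
import os


def pygustus_cfg_dir(home=None):
    """Return the '.pygustus' path below the given or detected home directory."""
    # Hypothetical helper; the real lookup lives in pygustus/util.py.
    if home is None:
        home = os.path.expanduser("~")  # on POSIX this is $HOME when it is set
    return os.path.join(home, ".pygustus")


def ensure_cfg_dir(cfg_dir):
    """Create cfg_dir if it is missing and report whether it is writable."""
    try:
        os.makedirs(cfg_dir, exist_ok=True)
    except OSError:
        # For example: the parent directory is read-only or owned by someone else.
        return False
    return os.access(cfg_dir, os.W_OK)
```

With the home set to /home, pygustus_cfg_dir("/home") yields '/home/.pygustus', which matches the path in the error message. Calling ensure_cfg_dir on that path from a Python session inside the container would show whether the directory can be created there.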

KatharinaHoff commented 1 year ago

I now see that you are starting the container with the claim that your home is /home. However, your writable user home should not be that. It should be /home/yourusername; e.g., if your user name is peter, your home directory should be /home/peter. It is essential that you have write permissions in the home folder that you specify.

If you start the container with a writable home location, it should work fine:

singularity exec -H XXX:/home/peter -B XXX /apps/chpc/GALBA/galba.sif galba.pl --genome=XXX/genome.fa --prot_seq=XXX/proteins.fa --threads 54 --species T1

(You will have to replace peter by your actual user name.)
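If the command is assembled by a script, the same rule can be checked before the container is started. The sketch below is only an illustration under assumptions: the function name singularity_cmd and its parameters are hypothetical, and it relies on the -H source:destination form used in the command above. The host directory may live anywhere writable (for example on a larger volume), as long as it is mapped to /home/username inside the container.

```python
import os
import shlex


def singularity_cmd(host_home, user, image, tool_args, binds=()):
    """Build a 'singularity exec' argument list that uses a writable home."""
    # Refuse early instead of letting Pygustus fail later inside the container.
    if not (os.path.isdir(host_home) and os.access(host_home, os.W_OK)):
        raise ValueError(f"{host_home} is not a writable directory")
    cmd = ["singularity", "exec", "-H", f"{host_home}:/home/{user}"]
    for bind in binds:
        cmd += ["-B", bind]
    return cmd + [image] + list(tool_args)


def as_shell(cmd):
    """Render the argument list as a copy-and-paste shell command."""
    return shlex.join(cmd)
```

The returned list can be passed to subprocess.run, or printed with as_shell to inspect the command first.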

Phismil commented 1 year ago

Thank you, Katharina. I changed it to the user's home, and it seems fine. The HPC "home/user" directory has limited space, which is why I had to redirect it to our /mnt/ folder. That solved the issue. However, Singularity now complains about an invalid argument in pygustus_hints.py. Do you have any hint as to what could have gone wrong? I have included the error below.

line 9 in augustus.predict('/XXX/GALBA/genome.fa', species='T7', partitionLargeSeqeunces=True, partitionHints=True, minSplitSize=100 ...
ValueError: Invalid Parameter for Augustus: partitionLargeSeqeunces

KatharinaHoff commented 1 year ago

The JSON file that will be written to ~/.pygustus is tiny. It's not going to fill your hard drive. Are you sure that you have the latest container? If not, please re-build the Singularity image. If you are sure: is it possible that you have another AUGUSTUS_CONFIG_PATH lingering on your system?

The problem is this: "seqeunces" is obviously a spelling mistake. I fixed it in Pygustus and in Augustus. However, the fix in Augustus has not made it into an Augustus release. The Debian packages still contain the typo. Inside the GALBA container, I use an outdated Debian package, and therefore I "patch" GALBA to use that typo. However, if any newer Augustus with the fix is found via an AUGUSTUS_CONFIG_PATH, it might still cause trouble.
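The mismatch can be pictured as a strict name check: a component that knows only one spelling rejects the other. The following is a minimal sketch and not the Pygustus implementation. The function check_parameter and the set KNOWN_PARAMETERS are hypothetical, the names in the set are taken from the error message in this thread, and which spelling a given Augustus or Pygustus build accepts is an assumption here.

```python
import difflib

# Assumed set of accepted names (corrected spelling); illustration only.
KNOWN_PARAMETERS = {"partitionLargeSequences", "partitionHints", "minSplitSize"}


def check_parameter(name, known=KNOWN_PARAMETERS):
    """Return name if it is known, else raise ValueError with a suggestion."""
    if name in known:
        return name
    # Look for the most similar accepted name to make typos easy to spot.
    close = difflib.get_close_matches(name, sorted(known), n=1)
    hint = f" (did you mean {close[0]!r}?)" if close else ""
    raise ValueError(f"Invalid Parameter for Augustus: {name}{hint}")
```

A check of this kind fails whenever caller and validator disagree on the spelling, regardless of which side carries the typo, so both sides have to come from matching versions.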

So first tip: re-build the container, pull latest.

Second: check your environment variables.
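For the second tip, a short script can list every environment variable that mentions Augustus, which is less error-prone than reading the full printenv output. This is a sketch only; the function name augustus_env is made up, and matching on the substring "augustus" is an assumption about what counts as suspicious.

```python
import os


def augustus_env(environ=None):
    """Return the variables whose name or value mentions Augustus."""
    if environ is None:
        environ = os.environ
    return {
        key: value
        for key, value in environ.items()
        if "augustus" in key.lower() or "augustus" in value.lower()
    }


if __name__ == "__main__":
    hits = augustus_env()
    if not hits:
        print("No Augustus-related environment variables found.")
    for key, value in sorted(hits.items()):
        print(f"{key}={value}")
```

Running it both on the host and inside the container (via singularity exec) shows whether a host setting such as AUGUSTUS_CONFIG_PATH is carried into the container.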

Phismil commented 1 year ago

Dear Katharina, I reinstalled Singularity on a Google Compute Engine instance to make sure no Augustus was ever installed there, and I checked printenv. It still complains about the same issue. I tried the container on the example files.

Thank you in advance

singularity exec -B /home/xx/GALBA/ -H /home/xx/ galba.sif galba.pl --genome=./genome.fa --prot_seq=./proteins.fa --threads 24 --skipOptimize --species test

Thu Mar 9 19:36:41 2023: Log information is stored in file /xx/xx/GALBA/GALBA/GALBA.log
[M::mp_idx_restore@0.071*1.01] loaded the index
[M::mp_idx_print_stat] 280767 distinct k-mers; mean occ of infrequent k-mers: 1.52; 0 frequent k-mers accounting for 0 occurrences
[M::worker_pipeline::0.308*5.23] mapped 259 sequences
[M::main] Version: 0.7-r216-dirty
[M::main] CMD: /opt/miniprot/miniprot -ut24 --outn=1 --gtf /xx/xx/GALBA/GALBA/genome.mpi /home/xx/GALBA/proteins.fa
[M::main] Real time: 0.309 sec; CPU: 1.613 sec; Peak RSS: 0.095 GB
[M::mp_idx_restore@0.059*1.02] loaded the index
[M::mp_idx_print_stat] 280767 distinct k-mers; mean occ of infrequent k-mers: 1.52; 0 frequent k-mers accounting for 0 occurrences
[M::worker_pipeline::0.283*5.52] mapped 259 sequences
[M::main] Version: 0.7-r216-dirty
[M::main] CMD: /opt/miniprot/miniprot -ut24 --outn=1 --aln /xx/xx/GALBA/GALBA/genome.mpi /home/xx/GALBA/proteins.fa
[M::main] Real time: 0.284 sec; CPU: 1.562 sec; Peak RSS: 0.096 GB

*****

WARNING: Number of reliable training genes is low (226). Recommended are at least 600 genes

*****

ERROR in file /opt/GALBA/scripts/galba.pl at line 5208
Failed to execute: /opt/conda/bin/python3 /home/xx/GALBA/GALBA/pygustus_hints.py 1> /home/xx/GALBA/GALBA/pygustus_hints.out 2>/home/xx/GALBA/GALBA/errors/pygustus_hints.err

And

File "/opt/conda/lib/python3.10/site-packages/pygustus/options/aug_options.py", line 101, in set_value
    raise ValueError(
ValueError: Invalid Parameter for Augustus: partitionLargeSeqeunces

KatharinaHoff commented 1 year ago

Thank you for checking. I will build you a different image and send you the link later today. We will make it work.

KatharinaHoff commented 1 year ago

Please try to build a new image as follows:

singularity build galba.sif docker://katharinahoff/galba-notebook:pygustusbug

Please let me know whether it works.

KatharinaHoff commented 1 year ago

This issue has been fixed. I will push an updated container image later today.