Tropini-lab / PUPpy

Pipeline to design taxon-specific primers in defined bacterial communities.
https://journals.asm.org/doi/10.1128/msphere.00360-24
GNU General Public License v3.0

No such file or directory: ResultDB.tsv for test data #12

Closed MonicaSteffi closed 1 month ago

MonicaSteffi commented 1 month ago

Hi,

Thank you for the tool. I am trying to run PUPpy on the test dataset before running it on my own data, but I got the following error.

my command is:

puppy-align -pr PUPpy/test/INPUT_primerTarget/ -nt PUPpy/test/INPUT_nonTarget -o OUTPUT_puppy-align_test2


                 @    @ @ @                  @     @
              @       @                      @       @
           @         @                         @       @
        @          @                             @       @
     @            @                               @         @
   @            @                                  @           @
 @             @                                                 @
@             @       @@@@@@             @@@@@@     @             ,
             @        @@@@@@@           @@@@@@@     @
 @           @        @@@@@@   @@@@@@@    @@@@                   @
   @         @          @@     @@@@@@      @                   @
     @        @        @          @          @      @        @  @   @  @   @
        @      @      @                       @     @      @    @      @@  @@
             @@@                               @   @   @        @   @@  @     @@
      @@@@@@        @               @@@@@       @       @  @@   @@    @  @    @
      @@@@@@@@@                @@@@@@@@         @@@@/    ( @ @@@ @   @  @
        @@@@@@@@@@@@          @@@@@@@@@         @@@@@    @ @ @   @   @@  @
       @@@@@@@@              @        @        @       @ @ @@    @  @
       @@@@@@          @@ @@          (@ @@@@

ASCII art designed with manytools.org from puppy logo

2024-07-11 15:27:13,764 - INFO - Renaming FASTA headers of input CDS files
2024-07-11 15:27:13,901 - INFO - Aligning genes with MMseqs2. This might take a while.
createdb /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/concatenated_CDSes.fna /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/QueryDB

MMseqs Version:         15.6f452
Database type           0
Shuffle input database  true
Createdb mode           0
Write lookup file       1
Offset of numeric ids   0
Compressed              0
Verbosity               3

Converting sequences
[15555] 0s 116ms
Time for merging to QueryDB_h: 0h 0m 0s 57ms
Time for merging to QueryDB: 0h 0m 0s 136ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 363ms
createdb /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/concatenated_CDSes.fna /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/TargetDB

MMseqs Version:         15.6f452
Database type           0
Shuffle input database  true
Createdb mode           0
Write lookup file       1
Offset of numeric ids   0
Compressed              0
Verbosity               3

Converting sequences
[15555] 0s 105ms
Time for merging to TargetDB_h: 0h 0m 0s 69ms
Time for merging to TargetDB: 0h 0m 0s 131ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 358ms
Create directory /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/tmp
createindex /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/TargetDB /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/tmp --search-type 3

MMseqs Version:                 15.6f452
Seed substitution matrix        aa:VTML80.out,nucl:nucleotide.out
k-mer length                    0
Alphabet size                   aa:21,nucl:5
Compositional bias              1
Compositional bias              1
Max sequence length             65535
Max results per query           300
Mask residues                   1
Mask residues probability       0.9
Mask lower case residues        0
Spaced k-mers                   1
Spaced k-mer pattern
Sensitivity                     7.5
k-score                         seq:0,prof:0
Check compatible                0
Search type                     3
Split database                  0
Split memory limit              0
Index subset                    0
Verbosity                       3
Threads                         56
Min codons in orf               30
Max codons in length            32734
Max orf gaps                    2147483647
Contig start mode               2
Contig end mode                 2
Orf start mode                  1
Forward frames                  1,2,3
Reverse frames                  1,2,3
Translation table               1
Translate orf                   0
Use all table starts            false
Offset of numeric ids           0
Create lookup                   0
Compressed                      0
Add orf stop                    false
Overlap between sequences       0
Sequence split mode             1
Header split mode               0
Strand selection                1
Remove temporary files          false

createindex /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/TargetDB /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/tmp --search-type 3

MMseqs Version:                 15.6f452
Seed substitution matrix        aa:VTML80.out,nucl:nucleotide.out
k-mer length                    15
Alphabet size                   aa:21,nucl:5
Compositional bias              1
Compositional bias              1
Max sequence length             10000
Max results per query           300
Mask residues                   1
Mask residues probability       0.9
Mask lower case residues        0
Spaced k-mers                   1
Spaced k-mer pattern
Sensitivity                     7.5
k-score                         seq:0,prof:0
Check compatible                0
Search type                     3
Split database                  0
Split memory limit              0
Index subset                    0
Verbosity                       3
Threads                         56
Min codons in orf               30
Max codons in length            32734
Max orf gaps                    2147483647
Contig start mode               2
Contig end mode                 2
Orf start mode                  1
Forward frames                  1
Reverse frames
Translation table               1
Translate orf                   0
Use all table starts            false
Offset of numeric ids           0
Create lookup                   0
Compressed                      0
Add orf stop                    false
Overlap between sequences       0
Sequence split mode             1
Header split mode               0
Strand selection                1
Remove temporary files          false

Failed to execute /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/tmp/18077789893369462219/createindex.sh with error 13.
Traceback (most recent call last):
  File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/puppy/lib/python3.10/shutil.py", line 816, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: '/dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/tmp/ResultDB.tsv' -> '/dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/ResultDB.tsv'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/puppy/bin/puppy-align", line 390, in <module>
    success = run_mmseqs2(output, cds_intended, cds_unintended, args.identity, args.coverage, args.covMode, args.min_aln_len)
  File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/puppy/bin/puppy-align", line 316, in run_mmseqs2
    shutil.move(f"{output}/tmp/ResultDB.tsv", output)
  File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/puppy/lib/python3.10/shutil.py", line 836, in move
    copy_function(src, real_dst)
  File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/puppy/lib/python3.10/shutil.py", line 434, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/dss/dsshome1/0F/ra78zut/miniconda3/envs/puppy/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/PUPy/Align_OUT/tmp/ResultDB.tsv'

Any help would be appreciated

hghezzi commented 1 month ago

Hi!

Thank you for providing detailed messages regarding this error.

It seems that the script createindex.sh, which is run by MMseqs2, failed to execute. I wonder whether this step requires more memory than is available on your computer. Would you be able to free some memory or test the pipeline on another machine (someone else's computer or a cluster)?

Let me know if you are still encountering this issue :)
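
The FileNotFoundError in the traceback looks like a downstream symptom: when the MMseqs2 step fails, tmp/ResultDB.tsv is never written, so the later shutil.move has nothing to move. Below is a minimal sketch of a guard that would make this explicit. `move_result` and its error message are hypothetical and only illustrate the idea; they are not part of puppy-align.

```python
import shutil
from pathlib import Path


def move_result(output_dir: str, name: str = "ResultDB.tsv") -> Path:
    """Move <output_dir>/tmp/<name> into <output_dir>.

    Hypothetical helper for illustration; not part of puppy-align.
    Raises a descriptive error when the file is missing instead of
    letting shutil.move fail with a bare FileNotFoundError.
    """
    out = Path(output_dir)
    src = out / "tmp" / name
    if not src.is_file():
        raise RuntimeError(
            f"{src} was not created; an earlier MMseqs2 step probably failed. "
            "Look for the first error in the MMseqs2 log."
        )
    dst = out / name
    shutil.move(str(src), str(dst))
    return dst
```

With a guard like this, the message would point back to the MMseqs2 log (here, the createindex.sh failure) rather than to the missing file.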

MonicaSteffi commented 1 month ago

Sorry for the late reply, and thank you for your response. I ran it on a Slurm cluster, but the program failed with the message Error: indexdb died

My slurm commands:

#!/bin/bash
#SBATCH --output=output_puby.out       # Standard output log
#SBATCH --error=error_puby.out            # Standard error log
#SBATCH --job-name=PUPY                 # Job name
#SBATCH --get-user-env
#SBATCH --clusters=inter
#SBATCH --partition=cm2_inter
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=10
#SBATCH --time=2:00:00
#SBATCH --mail-type=END                         ## Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=matchado@mvp.lmu.de     ## Where to send mail
#SBATCH --export=NONE
#SBATCH --mem=10000

source activate puppy
puppy-align -pr PUPpy/test/INPUT_primerTarget/ -nt PUPpy/test/INPUT_nonTarget -o /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2
cat output_puby.out

                 @    @ @ @                  @     @
              @       @                      @       @
           @         @                         @       @
        @          @                             @       @
     @            @                               @         @
   @            @                                  @           @
 @             @                                                 @
@             @       @@@@@@             @@@@@@     @             ,
             @        @@@@@@@           @@@@@@@     @
 @           @        @@@@@@   @@@@@@@    @@@@                   @
   @         @          @@     @@@@@@      @                   @
     @        @        @          @          @      @        @  @   @  @   @
        @      @      @                       @     @      @    @      @@  @@
             @@@                               @   @   @        @   @@  @     @@
      @@@@@@        @               @@@@@       @       @  @@   @@    @  @    @
      @@@@@@@@@                @@@@@@@@         @@@@/    ( @ @@@ @   @  @
        @@@@@@@@@@@@          @@@@@@@@@         @@@@@    @ @ @   @   @@  @
       @@@@@@@@              @        @        @       @ @ @@    @  @
       @@@@@@          @@ @@          (@ @@@@

ASCII art designed with manytools.org from puppy logo

2024-07-16 09:02:49,293 - INFO - Renaming FASTA headers of input CDS files
2024-07-16 09:02:49,600 - INFO - Aligning genes with MMseqs2. This might take a while.
createdb /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/concatenated_CDSes.fna /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/QueryDB

MMseqs Version:         15.6f452
Database type           0
Shuffle input database  true
Createdb mode           0
Write lookup file       1
Offset of numeric ids   0
Compressed              0
Verbosity               3

Converting sequences
[=
Time for merging to QueryDB_h: 0h 0m 0s 48ms
Time for merging to QueryDB: 0h 0m 0s 375ms
Database type: Nucleotide
Time for processing: 0h 0m 1s 227ms
createdb /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/concatenated_CDSes.fna /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/TargetDB

MMseqs Version:         15.6f452
Database type           0
Shuffle input database  true
Createdb mode           0
Write lookup file       1
Offset of numeric ids   0
Compressed              0
Verbosity               3

Converting sequences
[=
Time for merging to TargetDB_h: 0h 0m 0s 31ms
Time for merging to TargetDB: 0h 0m 0s 375ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 588ms
Create directory /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/tmp
createindex /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/TargetDB /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/tmp --search-type 3

MMseqs Version:                 15.6f452
Seed substitution matrix        aa:VTML80.out,nucl:nucleotide.out
k-mer length                    0
Alphabet size                   aa:21,nucl:5
Compositional bias              1
Compositional bias              1
Max sequence length             65535
Max results per query           300
Mask residues                   1
Mask residues probability       0.9
Mask lower case residues        0
Spaced k-mers                   1
Spaced k-mer pattern
Sensitivity                     7.5
k-score                         seq:0,prof:0
Check compatible                0
Search type                     3
Split database                  0
Split memory limit              0
Index subset                    0
Verbosity                       3
Threads                         56
Min codons in orf               30
Max codons in length            32734
Max orf gaps                    2147483647
Contig start mode               2
Contig end mode                 2
Orf start mode                  1
Forward frames                  1,2,3
Reverse frames                  1,2,3
Translation table               1
Translate orf                   0
Use all table starts            false
Offset of numeric ids           0
Create lookup                   0
Compressed                      0
Add orf stop                    false
Overlap between sequences       0
Sequence split mode             1
Header split mode               0
Strand selection                1
Remove temporary files          false

createindex /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/TargetDB /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/tmp --search-type 3

MMseqs Version:                 15.6f452
Seed substitution matrix        aa:VTML80.out,nucl:nucleotide.out
k-mer length                    15
Alphabet size                   aa:21,nucl:5
Compositional bias              1
Compositional bias              1
Max sequence length             10000
Max results per query           300
Mask residues                   1
Mask residues probability       0.9
Mask lower case residues        0
Spaced k-mers                   1
Spaced k-mer pattern
Sensitivity                     7.5
k-score                         seq:0,prof:0
Check compatible                0
Search type                     3
Split database                  0
Split memory limit              0
Index subset                    0
Verbosity                       3
Threads                         56
Min codons in orf               30
Max codons in length            32734
Max orf gaps                    2147483647
Contig start mode               2
Contig end mode                 2
Orf start mode                  1
Forward frames                  1
Reverse frames
Translation table               1
Translate orf                   0
Use all table starts            false
Offset of numeric ids           0
Create lookup                   0
Compressed                      0
Add orf stop                    false
Overlap between sequences       0
Sequence split mode             1
Header split mode               0
Strand selection                1
Remove temporary files          false

splitsequence /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/TargetDB /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/tmp/6960269002712045218/nucl_split_seq --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 56 --compressed 0 -v 3

[=================================================================] 15.64K 0s 19ms
Time for merging to nucl_split_seq_h: 0h 0m 0s 51ms
Time for merging to nucl_split_seq: 0h 0m 0s 29ms
Time for processing: 0h 0m 0s 910ms
indexdb /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/tmp/6960269002712045218/nucl_split_seq /dss/dsshome1/0F/ra78zut/OUTPUT_puppy-align_test2/TargetDB --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-seq-len 10000 --max-seqs 300 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 3 --split 0 --split-memory-limit 0 --index-subset 0 -v 3 --threads 56

Estimated memory consumption: 8G
Write VERSION (0)
Write META (1)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write DBR2INDEX (7)
Write DBR2DATA (8)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Write HDR2INDEX (20)
Write HDR2DATA (21)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Index table: counting k-mers
[=================================================================] 15.64K 0s 217ms
Index table: Masked residues: 37396
Index table: fill
[=================================================================] 15.64K 0s 192ms
Index statistics
Entries:          16958730
DB size:          8289 MB
Avg k-mer size:   0.015794
Top 10 k-mers
    GGTCCGTGTGAGGGT     21
    ATGGCATGAAGACCC     17
    CATTGCTGTTGCGAA     16
    TGGGAACATCGAAGA     16
    CCATCCAACGGACGA     16
    TTTAGATACTAATGA     16
    TGAGCTGGTTATGAC     16
    ATGGCATGGAGACCC     16
    ACCGATACTCGACTC     16
    GAACGGAATTAGGTC     16
Write ENTRIES (9)
Write ENTRIESOFFSETS (10)
Error: indexdb died

hghezzi commented 1 month ago

Could you try increasing the memory requirements in your job from #SBATCH --mem=10000 to at least 16GB? Maybe try #SBATCH --mem=24000 to be safe.
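
A quick back-of-the-envelope check of the numbers in the log, as a sketch. Slurm interprets a bare `--mem` value as megabytes, and the indexdb log reports "DB size: 8289 MB". The thread does not say how much extra memory indexdb needs beyond the index itself, so treating the index size as a lower bound is an assumption, and `headroom_mb` is a hypothetical helper.

```python
def headroom_mb(requested_mb: int, index_mb: int) -> int:
    """Memory left over (in MB) once the index itself is held in RAM."""
    return requested_mb - index_mb


INDEX_MB = 8289  # "DB size: 8289 MB" from the indexdb log

# The original request and the suggested one, both in MB as passed to --mem.
for requested in (10000, 24000):
    print(requested, headroom_mb(requested, INDEX_MB))
# prints:
# 10000 1711
# 24000 15711
```

With --mem=10000 the job has under 2 GB to spare on top of the index, which is consistent with indexdb dying right after it starts writing the index entries.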

MonicaSteffi commented 1 month ago

Thank you, it worked.

MonicaSteffi commented 1 month ago

Hi @hghezzi ,

I am not sure whether I can convey my problem very well here.

I installed Miniconda under /dss/dss01/steff, and so far I only get output if I set the output directory to /dss/dss01/steff/OUTPUT. Unfortunately, I don't have enough space in /dss/dss01/steff/.

If I give a different output folder, such as /dss/dss02/working_directory, PUPpy fails with the same error message as before: No such file or directory: ResultDB.tsv.

I think this may be related to an MMseqs2 problem. Please see the following GitHub threads: https://github.com/apcamargo/genomad/issues/11#issuecomment-1442087309 and https://github.com/soedinglab/MMseqs2/issues/534#issue-1137645243

Is there anything I can do myself to solve this problem? Is it possible to save the tmp folder in a different directory?

Any help would be appreciated.

hghezzi commented 1 month ago

Hi @MonicaSteffi

What code are you running now to have the original error compared to when it worked before?

I would first ensure you are providing enough memory AND you have enough storage in the output folder. If this doesn't fix it, currently the only way to save the tmp folder in a different location is by changing the output directory with the -o flag.

Did you check that the folder you are running the scripts in has read, write, and execute permissions?
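
One way to check this from Python, as a rough sketch. It goes one step beyond the permission bits and tries to run a small script from the directory, because MMseqs2 executes shell scripts such as createindex.sh from its tmp folder. If the "error 13" in the first log is a POSIX errno, it would be EACCES (permission denied), which can also appear when a filesystem does not allow execution; that reading is an assumption and is not confirmed in this thread. `check_dir` is a hypothetical helper, not part of PUPpy or MMseqs2.

```python
import os
import stat
import subprocess
import tempfile


def check_dir(path: str) -> dict:
    """Report rwx access on `path` and whether a script stored there can run."""
    report = {
        "read": os.access(path, os.R_OK),
        "write": os.access(path, os.W_OK),
        "execute": os.access(path, os.X_OK),
        "can_run_script": False,
    }
    if report["write"]:
        # Write a trivial shell script into the directory and try to execute it.
        fd, script = tempfile.mkstemp(suffix=".sh", dir=path)
        try:
            with os.fdopen(fd, "w") as fh:
                fh.write("#!/bin/sh\nexit 0\n")
            os.chmod(script, stat.S_IRWXU)
            try:
                report["can_run_script"] = subprocess.run([script]).returncode == 0
            except OSError:
                # For example PermissionError (errno 13) if execution is not allowed.
                report["can_run_script"] = False
        finally:
            os.remove(script)
    return report
```

Running `check_dir("/dss/dss02/working_directory")` inside the Slurm job, rather than on the login node, would show whether the compute node sees the same permissions. If "can_run_script" is False while the other three are True, the directory is writable but scripts cannot be executed from it.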

MonicaSteffi commented 1 month ago

What code are you running now to have the original error compared to when it worked before?

It is the same as before; I only changed the output folder. If I change the output folder to a different path, I get the error. Unfortunately, the output folder where I got results before doesn't have enough space. I increased the memory up to #SBATCH --mem=100000. I also changed the permissions of the new directory.

hghezzi commented 1 month ago

Hi @MonicaSteffi

It really just sounds like a storage (disk space) issue with the output folder of puppy-align. The MMseqs2 tmp folder and related files can be quite large, reaching several GB. I would recommend freeing as much space as possible before running puppy-align. I don't believe it's a RAM issue at this point, as even 24 GB should be more than enough for the test data.
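
A small sketch for checking free space on the filesystem that holds the output folder before starting a run. `free_gb` and `has_space` are hypothetical helpers, and the threshold is for the caller to choose; the thread only says "several GB", and the index in the log above was about 8.3 GB. On cluster filesystems a per-user quota may be lower than what this reports.

```python
import shutil


def free_gb(path: str) -> float:
    """Free space, in GB (10**9 bytes), on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_space(path: str, needed_gb: float) -> bool:
    """True if the filesystem holding `path` has at least `needed_gb` GB free."""
    return free_gb(path) >= needed_gb
```

For example, `has_space("/dss/dss02/working_directory", 20)` would check for 20 GB; the figure 20 is only an illustrative margin, not a documented requirement.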