Remote providers have been replaced

NBISweden / aMeta

Ancient microbiome snakemake workflow

MIT License

Remote providers have been replaced #168

Open LeandroRitter opened 3 months ago

LeandroRitter commented 3 months ago

@percyfal looks like something has changed in the latest snakemake release. The aMeta test run currently fails with a strange error, which was not the case two weeks ago:

(aMeta) nikolay@dell:~/WABI/A_Gotherstrom/aMeta/.test$ ./runtest.sh -j 4
This looks like the first test run... Installing bioconda packages...
NotImplementedError in file /home/nikolay/WABI/A_Gotherstrom/aMeta/workflow/rules/common.smk, line 12:
Remote providers have been replaced by Snakemake storage plugins. Please use the corresponding storage plugin instead (snakemake-storage-plugin-*).
Building krakenuniq data
grep: .snakemake/conda/*yaml: No such file or directory
./runtest.sh: line 30: krakenuniq-build: command not found
Building krona taxonomy
grep: .snakemake/conda/*yaml: No such file or directory
./runtest.sh: line 39: cd: /opt/krona: No such file or directory
./runtest.sh: line 40: ./updateTaxonomy.sh: No such file or directory
/home/nikolay/WABI/A_Gotherstrom/aMeta
Adjusting malt max memory usage
grep: .snakemake/conda/*yaml: No such file or directory
./runtest.sh: line 51: cd: /opt/malt-: No such file or directory
sed: can't read malt-build.vmoptions: No such file or directory
sed: can't read malt-run.vmoptions: No such file or directory
/home/nikolay/WABI/A_Gotherstrom/aMeta/.test
Running workflow...
snakemake --conda-frontend conda --use-conda --show-failed-logs --conda-cleanup-pkgs cache -s ../workflow/Snakefile -j 4
NotImplementedError in file /home/nikolay/WABI/A_Gotherstrom/aMeta/workflow/rules/common.smk, line 12:
Remote providers have been replaced by Snakemake storage plugins. Please use the corresponding storage plugin instead (snakemake-storage-plugin-*).
ERROR: Workflow test failed!
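
A quick way to see whether the active environment is affected is to look at the installed snakemake version. The sketch below is only illustrative: it assumes the switch from remote providers to storage plugins happened with major version 8, and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Assumption: remote providers were replaced by storage plugins from major version 8 on.
FIRST_STORAGE_PLUGIN_MAJOR = 8


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '8.14.0'."""
    return int(version.split(".")[0])


def expects_storage_plugins(version: str) -> bool:
    """True if this version is new enough to reject the old remote providers."""
    return major_version(version) >= FIRST_STORAGE_PLUGIN_MAJOR


def installed_snakemake_version() -> Optional[str]:
    """Return the installed snakemake version, or None if it is not installed."""
    try:
        return metadata.version("snakemake")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_snakemake_version()
    if found is None:
        print("snakemake is not installed in this environment")
    elif expects_storage_plugins(found):
        print(f"snakemake {found}: remote providers are likely unavailable")
    else:
        print(f"snakemake {found}: remote providers should still work")
```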

JediKnightChan commented 3 months ago

Does this affect only the test (so that I can proceed with real pipeline usage despite this error), or the whole pipeline? Are there any workarounds, such as using an older version of conda or of your pipeline?

LeandroRitter commented 3 months ago

@JediKnightChan I am afraid it affects the whole pipeline. For now, pinning an older snakemake version, 6.3.0, i.e. adding this line

- snakemake=6.3.0

to aMeta/workflow/envs/environment.yaml should be a quick fix, but we will probably have to modify some aMeta rules to make the workflow compatible with later versions of snakemake @percyfal
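
The pin can also be applied programmatically. This is a minimal sketch that works on the text of a conda environment file; the sample dependency list in the usage note is hypothetical and the function name is not part of aMeta.

```python
import re

# Matches a dependency line such as "  - snakemake-minimal>=5.18" or "  - snakemake",
# but not other packages whose names merely start with "snakemake-".
SNAKEMAKE_ENTRY = re.compile(
    r"^([ \t]*-[ \t]*)snakemake(?:-minimal)?(?=[\s=<>!]|$)[^\n]*$",
    re.MULTILINE,
)


def pin_snakemake(env_text: str, version: str = "6.3.0") -> str:
    """Replace the snakemake / snakemake-minimal entry with an exact pin."""
    pinned, count = SNAKEMAKE_ENTRY.subn(
        lambda match: f"{match.group(1)}snakemake={version}", env_text
    )
    if count == 0:
        raise ValueError("no snakemake entry found in the dependency list")
    return pinned


if __name__ == "__main__":
    # Hypothetical file content, used only to demonstrate the replacement.
    sample = (
        "channels:\n"
        "  - conda-forge\n"
        "  - bioconda\n"
        "dependencies:\n"
        "  - snakemake-minimal>=5.18\n"
    )
    print(pin_snakemake(sample))
```

After changing the file, the aMeta conda environment has to be recreated for the pin to take effect.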

JediKnightChan commented 3 months ago

I replaced snakemake-minimal>=5.18 with snakemake=6.3.0 and the errors above disappeared. However, I ran into an issue in jobid 16:

krakenuniq: database ("resources/KrakenUniq_DB") does not contain necessary file database.kdb

Is it related to the fix for the issue above, or should I open another issue?

UPD: I guess I have to download the databases before running the test, not after. I will try that.

LeandroRitter commented 3 months ago

@JediKnightChan before implementing this fix you will need to reinstall aMeta, first removing the old installation by running:

rm -rf aMeta
conda remove -n aMeta --all

From the error you are reporting, it looks like you did not clean up after your failed aMeta installation.

LeandroRitter commented 3 months ago

@JediKnightChan for running the test you do not need to download the databases (this was not the cause of the error); they will be built on the fly for this toy dataset. However, for future production runs on real data, you need the databases that we provide together with aMeta.

JediKnightChan commented 3 months ago

Thanks for clarifying about the databases. I have now repeated all the steps, including the snakemake fix, after running

rm -rf aMeta
conda remove -n aMeta --all

However, the test still fails, now at job 18:

Error in rule KrakenUniq2Krona:
    jobid: 18
    output: results/KRAKENUNIQ/bar/krakenuniq.output.filtered_taxIDs_kmers1000.txt, results/KRAKENUNIQ/bar/sequences.krakenuniq_kmers1000.txt, results/KRAKENUNIQ/bar/sequences.krakenuniq_kmers1000.krona, results/KRAKENUNIQ/bar/taxonomy.krona.html
    log: logs/KRAKENUNIQ2KRONA/bar.log (check log file(s) for error message)
    conda-env: /home/pchela/aMeta/.test/.snakemake/conda/ddc76ad68721b9ac497e790beee82d14
    shell:
        /home/pchela/aMeta/workflow/scripts/krakenuniq2krona.py results/KRAKENUNIQ/bar/krakenuniq.output.filtered results/KRAKENUNIQ/bar/sequences.krakenuniq &> logs/KRAKENUNIQ2KRONA/bar.log; cat results/KRAKENUNIQ/bar/sequences.krakenuniq_kmers1000.txt | cut -f 2,3 > results/KRAKENUNIQ/bar/sequences.krakenuniq_kmers1000.krona; ktImportTaxonomy results/KRAKENUNIQ/bar/sequences.krakenuniq_kmers1000.krona -o results/KRAKENUNIQ/bar/taxonomy.krona.html  &>> logs/KRAKENUNIQ2KRONA/bar.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile logs/KRAKENUNIQ2KRONA/bar.log:
Sequence data set dimensions after selecting reads corresponding to filtered KrakenUniq output: 512
Taxonomy not found in /home/pchela/aMeta/.test/.snakemake/conda/ddc76ad68721b9ac497e790beee82d14/opt/krona/taxonomy. Was updateTaxonomy.sh run? at /home/pchela/aMeta/.test/.snakemake/conda/ddc76ad68721b9ac497e790beee82d14/opt/krona/scripts/../lib/KronaTools.pm line 1540.
Loading taxonomy...

LeandroRitter commented 3 months ago

@JediKnightChan could you please double-check that you have enough RAM (at least 10 GB) and disk space (a couple of GB)? Also, could you post or attach the logs/KRAKENUNIQ2KRONA/bar.log file here?

JediKnightChan commented 3 months ago

Thanks, RAM really was the issue, though I think I had about 16 GB at the time. I have now switched to a 400 GB machine, and after that the test ran smoothly and succeeded.

However, when I started running the pipeline on real data after downloading the needed files from the links, I ran into the following error:

Activating conda environment: /home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe
[Thu Jun 13 19:03:04 2024]
Finished job 38.
75 of 156 steps (48%) done
[Thu Jun 13 19:21:57 2024]
Finished job 10.
76 of 156 steps (49%) done
Traceback (most recent call last):
  File "/home/pchela/aMeta/.snakemake/scripts/tmpajkls27q.malt-build.py", line 37, in <module>
    shell(
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/shell.py", line 231, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  grep -wFf results/KRAKENUNIQ_ABUNDANCE_MATRIX/unique_species_taxid_list.txt /home/pchela/malt/nucl_gb.accession2taxid > results/MALT_DB/seqid2taxid.project.map; cut -f1 results/MALT_DB/seqid2taxid.project.map > results/MALT_DB/seqids.project; grep -Ff results/MALT_DB/seqids.project /home/pchela/malt/library.fna.gz | sed 's/>//g' > results/MALT_DB/project.headers; seqtk subseq /home/pchela/malt/library.fna.gz results/MALT_DB/project.headers > results/MALT_DB/library.project.fna  2>> logs/BUILD_MALT_DB/BUILD_MALT_DB.log; unset DISPLAY; malt-build -i results/MALT_DB/library.project.fna -a2t /home/pchela/malt/nucl_gb.accession2taxid -s DNA -t 20 -d results/MALT_DB/maltDB.dat  2>> logs/BUILD_MALT_DB/BUILD_MALT_DB.log' returned non-zero exit status 1.
[Thu Jun 13 19:31:48 2024]
Error in rule Build_Malt_DB:
    jobid: 14
    output: results/MALT_DB/seqid2taxid.project.map, results/MALT_DB/seqids.project, results/MALT_DB/project.headers, results/MALT_DB/library.project.fna, results/MALT_DB/maltDB.dat
    log: logs/BUILD_MALT_DB/BUILD_MALT_DB.log (check log file(s) for error message)
    conda-env: /home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe

RuleException:
CalledProcessError in line 26 of /home/pchela/aMeta/workflow/rules/malt.smk:
Command 'source /home/pchela/miniforge3/envs/aMeta/bin/activate '/home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe'; set -euo pipefail;  python /home/pchela/aMeta/.snakemake/scripts/tmpajkls27q.malt-build.py' returned non-zero exit status 1.
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2274, in run_wrapper
  File "/home/pchela/aMeta/workflow/rules/malt.smk", line 26, in __rule_Build_Malt_DB
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 569, in _callback
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/concurrent/futures/thread.py", line 58, in run
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2362, in run_wrapper
Removing output files of failed job Build_Malt_DB since they might be corrupted:
results/MALT_DB/seqid2taxid.project.map, results/MALT_DB/seqids.project, results/MALT_DB/project.headers, results/MALT_DB/library.project.fna, results/MALT_DB/maltDB.dat

^C

Terminating processes on user request, this might take some time.
[Thu Jun 13 22:11:08 2024]
Error in rule Bowtie2_Index:
    jobid: 6
    output: /home/pchela/bt2/library.pathogen.fna.1.bt2l, /home/pchela/bt2/library.pathogen.fna.2.bt2l, /home/pchela/bt2/library.pathogen.fna.3.bt2l, /home/pchela/bt2/library.pathogen.fna.4.bt2l, /home/pchela/bt2/library.pathogen.fna.rev.1.bt2l, /home/pchela/bt2/library.pathogen.fna.rev.2.bt2l
    log: /home/pchela/bt2/library.pathogen.fna_BOWTIE2_BUILD.log (check log file(s) for error message)
    conda-env: /home/pchela/aMeta/.snakemake/conda/f4e84b943b849e4e26ea396ee4b88360
    shell:
        bowtie2-build --large-index --threads 1 /home/pchela/bt2/library.pathogen.fna /home/pchela/bt2/library.pathogen.fna > /home/pchela/bt2/library.pathogen.fna_BOWTIE2_BUILD.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Complete log: /home/pchela/aMeta/.snakemake/log/2024-06-13T173655.937345.snakemake.log

The first error was in job 14, rule Build_Malt_DB. After that it said it wanted to delete the files, but then it hung, and I interrupted it with CTRL+C after 3 hours. It then printed another error in job 6, rule Bowtie2_Index (I checked its log; it ends with a keyboard interrupt).

The log of build malt is:

Version   MALT (version 0.6.2, built 12 Sep 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.2; max memory: 512G
Classifications to use: Taxonomy
Reference sequence type set to: DNA
Seed shape(s): 111110111011110110111111
Number input files:            1
Loading FastA files:
10% 100% (0.0s)
Number of sequences:           0
Number of letters:             0
BUILDING table (0)...
Seeds found:              0
tableSize=                1
hashMask.length=1
maxHitsPerHash set to: 1000
Initializing arrays...
100% (0.0s)
Analysing seeds...
IllegalArgumentException: null

UPD: Do I need all the other large files from the malt database, e.g. library.fna.1.bt2l.gz, ..., along with the ones specified in the config: library.fna.gz, seqid2taxid.map.orig, nucl_gb.accession2taxid?

LeandroRitter commented 3 months ago

@JediKnightChan No, you only need the files specified in the config https://github.com/NBISweden/aMeta?tab=readme-ov-file#quick-start.

Did you unzip the library.fna.gz file after downloading? This might be the reason for the Build_Malt_DB failure. Please double check it. Please note that all the files specified in these config lines

# Helping files for building Malt database
# can be downloaded from https://doi.org/10.17044/scilifelab.21070063
malt_nt_fasta: resources/library.fna
malt_seqid2taxid_db: resources/seqid2taxid.map.orig
malt_accession2taxid: resources/nucl_gb.accession2taxid

should be unzipped.

If the library.fna.gz was unzipped, then could you please send me your config.yaml file to nikolay.oskolkov@scilifelab.se?
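
Whether the configured files are still compressed can be checked before starting a run. The sketch below assumes the three config keys quoted above and tests each file for the two-byte gzip signature; the helper names are hypothetical, and loading config.yaml into a dictionary is left to the reader.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

# Config keys quoted above; their files are expected to be uncompressed.
MALT_INPUT_KEYS = ("malt_nt_fasta", "malt_seqid2taxid_db", "malt_accession2taxid")


def is_gzip_compressed(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def compressed_inputs(config: dict, keys=MALT_INPUT_KEYS) -> list:
    """List the config keys whose files exist but are still gzip-compressed."""
    return [
        key
        for key in keys
        if key in config
        and os.path.isfile(config[key])
        and is_gzip_compressed(config[key])
    ]
```

An empty list from `compressed_inputs` means none of the existing files look gzip-compressed; missing files are skipped rather than reported.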

JediKnightChan commented 3 months ago

Did you unzip the library.fna.gz file after downloading?

Indeed, that was the issue. After unzipping the file and specifying the .fna file in the config, I got another error:

[Fri Jun 14 09:39:51 2024]
Job 14: Build_Malt_DB: BUILDING MALT DATABASE USING SPECIES DETECTED BY KRAKENUNIQ

[Fri Jun 14 09:39:51 2024]
rule Bowtie2_Index:
    input: /home/pchela/bt2/library.pathogen.fna
    output: /home/pchela/bt2/library.pathogen.fna.1.bt2l, /home/pchela/bt2/library.pathogen.fna.2.bt2l, /home/pchela/bt2/library.pathogen.fna.3.bt2l, /home/pchela/bt2/library.pathogen.fna.4.bt2l, /home/pchela/bt2/library.pathogen.fna.rev.1.bt2l, /home/pchela/bt2/library.pathogen.fna.rev.2.bt2l
    log: /home/pchela/bt2/library.pathogen.fna_BOWTIE2_BUILD.log
    jobid: 6

Activating conda environment: /home/pchela/aMeta/.snakemake/conda/f4e84b943b849e4e26ea396ee4b88360
Activating conda environment: /home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe
Activating conda environment: /home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe

/usr/bin/bash: line 1: 49367 Killed                  malt-build -i results/MALT_DB/library.project.fna -a2t /home/pchela/malt/nucl_gb.accession2taxid -s DNA -t 20 -d results/MALT_DB/maltDB.dat 2>> logs/BUILD_MALT_DB/BUILD_MALT_DB.log
Traceback (most recent call last):
  File "/home/pchela/aMeta/.snakemake/scripts/tmprumyhy2t.malt-build.py", line 37, in <module>
    shell(
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/shell.py", line 231, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  grep -wFf results/KRAKENUNIQ_ABUNDANCE_MATRIX/unique_species_taxid_list.txt /home/pchela/malt/nucl_gb.accession2taxid > results/MALT_DB/seqid2taxid.project.map; cut -f1 results/MALT_DB/seqid2taxid.project.map > results/MALT_DB/seqids.project; grep -Ff results/MALT_DB/seqids.project /home/pchela/malt/library.fna | sed 's/>//g' > results/MALT_DB/project.headers; seqtk subseq /home/pchela/malt/library.fna results/MALT_DB/project.headers > results/MALT_DB/library.project.fna  2>> logs/BUILD_MALT_DB/BUILD_MALT_DB.log; unset DISPLAY; malt-build -i results/MALT_DB/library.project.fna -a2t /home/pchela/malt/nucl_gb.accession2taxid -s DNA -t 20 -d results/MALT_DB/maltDB.dat  2>> logs/BUILD_MALT_DB/BUILD_MALT_DB.log' returned non-zero exit status 137.
[Fri Jun 14 14:24:08 2024]
Error in rule Build_Malt_DB:
    jobid: 14
    output: results/MALT_DB/seqid2taxid.project.map, results/MALT_DB/seqids.project, results/MALT_DB/project.headers, results/MALT_DB/library.project.fna, results/MALT_DB/maltDB.dat
    log: logs/BUILD_MALT_DB/BUILD_MALT_DB.log (check log file(s) for error message)
    conda-env: /home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe

RuleException:
CalledProcessError in line 26 of /home/pchela/aMeta/workflow/rules/malt.smk:
Command 'source /home/pchela/miniforge3/envs/aMeta/bin/activate '/home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe'; set -euo pipefail;  python /home/pchela/aMeta/.snakemake/scripts/tmprumyhy2t.malt-build.py' returned non-zero exit status 1.
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2274, in run_wrapper
  File "/home/pchela/aMeta/workflow/rules/malt.smk", line 26, in __rule_Build_Malt_DB
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 569, in _callback
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/concurrent/futures/thread.py", line 58, in run
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
  File "/home/pchela/miniforge3/envs/aMeta/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2362, in run_wrapper
Removing output files of failed job Build_Malt_DB since they might be corrupted:
results/MALT_DB/seqid2taxid.project.map, results/MALT_DB/seqids.project, results/MALT_DB/project.headers, results/MALT_DB/library.project.fna, results/MALT_DB/maltDB.dat

The log in logs/BUILD_MALT_DB/BUILD_MALT_DB.log is:

Version   MALT (version 0.6.2, built 12 Sep 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.2; max memory: 512G
Classifications to use: Taxonomy
Reference sequence type set to: DNA
Seed shape(s): 111110111011110110111111
Number input files:            1
Loading FastA files:
10% 100% (2,655.7s)
Number of sequences:  11,260,111
Number of letters:94,048,645,023
BUILDING table (0)...
Seeds found: 93,789,662,470
tableSize=    2,147,483,639
hashMask.length=31
maxHitsPerHash set to: 1000
Initializing arrays...
100% (0.0s)
Analysing seeds...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (2,450.2s)
Number of low-complexity seeds skipped: 3,959,259,542
Allocating hash table...
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1,171.3s)
Total keys used:     2,144,078,094
Total seeds matched:80,185,235,209
Total seeds dropped: 3,408,936,531
Opening file: results/MALT_DB/maltDB.dat/table0.db
Allocating: 605.4 GB
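
For reference, the exit status 137 in the traceback above can be decoded. Shells report 128 plus the signal number when a command is terminated by a signal, so 137 corresponds to signal 9 (SIGKILL), which matches the "Killed" line. The status alone does not say what sent the signal. A minimal sketch, with a hypothetical function name:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell exit status into a short human-readable description."""
    if status > 128:
        signum = status - 128  # shells encode "killed by signal N" as 128 + N
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
    print(describe_exit_status(1))
```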

LeandroRitter commented 3 months ago

@JediKnightChan in your case the MALT database seems to be very big, 605 GB, so your reserved 512 GB Java heap space should be increased. Please follow these instructions: https://github.com/NBISweden/aMeta?tab=readme-ov-file#i-get-java-heap-space-error-on-the-malt-step-what-should-i-do. How many samples are you analyzing? Are they deep-sequencing data? How many organisms has KrakenUniq detected (this can be checked in results/KRAKENUNIQ_ABUNDANCE_MATRIX/krakenuniq_abundance_matrix.txt)? Also, did you by any chance modify the default aMeta filters?

# Breadth and depth of coverage filters
# default thresholds are very conservative, can be tuned by users
n_unique_kmers: 1000
n_tax_reads: 200

If yes, that is ok if you made them more permissive, but you will need to allocate more resources then.

In summary, to proceed with the current size Malt database you need to reserve at least 700 GB of RAM and modify your java heap space setting as described here https://github.com/NBISweden/aMeta?tab=readme-ov-file#i-get-java-heap-space-error-on-the-malt-step-what-should-i-do
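
The comparison behind this estimate can be read straight from the MALT build log. The sketch below pulls out the "max memory" and "Allocating" values as they appear in the log posted earlier and compares them. It is a rough check only: it assumes both values are reported in gigabytes in exactly that format, and the function names are hypothetical.

```python
import re

HEAP_PATTERN = re.compile(r"max memory:\s*([\d.,]+)\s*G")
ALLOC_PATTERN = re.compile(r"Allocating:\s*([\d.,]+)\s*GB")


def _to_gb(match):
    """Convert a regex match such as '605.4' or '1,024' to a float, or None."""
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def heap_and_allocation_gb(log_text: str):
    """Return (java_heap_gb, table_allocation_gb) parsed from a MALT log."""
    heap = _to_gb(HEAP_PATTERN.search(log_text))
    alloc = _to_gb(ALLOC_PATTERN.search(log_text))
    return heap, alloc


def heap_is_sufficient(log_text: str):
    """True/False if both values are present, None if either is missing."""
    heap, alloc = heap_and_allocation_gb(log_text)
    if heap is None or alloc is None:
        return None
    return heap >= alloc


if __name__ == "__main__":
    sample = "Java version: 20.0.2; max memory: 512G\nAllocating: 605.4 GB\n"
    print(heap_and_allocation_gb(sample), heap_is_sufficient(sample))
```

For the log in this thread the heap (512 GB) is smaller than the table allocation (605.4 GB), which is consistent with the advice to raise both the reserved RAM and the heap setting.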

JediKnightChan commented 3 months ago

How many samples are you analyzing? Are they deep-sequencing data? How many organisms has KrakenUniq detected (this can be checked in results/KRAKENUNIQ_ABUNDANCE_MATRIX/krakenuniq_abundance_matrix.txt)? Also, did you by any chance modify the default aMeta filters?

I am exploring just 1 sample with fairly modest coverage, but the sample seems to contain quite a lot of organisms: there are 69 lines in the KrakenUniq abundance matrix. I didn't modify any default settings, though.

In summary, to proceed with the current size Malt database you need to reserve at least 700 GB of RAM

I managed to pass the malt build step after increasing memory to > 1024 GB and setting the malt-build and malt-run memory according to the fix in the README, e.g. malt-run.vmoptions looks like this

# Enter one VM parameter per line
# For example, to adjust the maximum memory usage to 512 MB, uncomment the following line:
# -Xmx512m
# To include another file, uncomment the following line:
# -include-options [path to other .vmoption file]
-Xmx1024G

but I got another error that may be related to a lack of memory, in jobid 13:

Activating conda environment: /home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe
/usr/bin/bash: line 1: 13913 Killed                  malt-run -at SemiGlobal -m BlastN -i results/CUTADAPT_ADAPTER_TRIMMING/rk8849.AF8B8B2EE.merged.trimmed.fastq.gz -o results/MALT/rk8849.AF8B8B2EE.merged.trimmed.rma6 -a results/MALT/rk8849.AF8B8B2EE.merged.trimmed.sam -t 20 -d results/MALT_DB/maltDB.dat -sup 1 -mq 100 -top 1 -mpi 85.0 -id 85.0 -v &> logs/MALT/rk8849.AF8B8B2EE.merged.log
[Sun Jun 16 03:54:22 2024]
Error in rule Malt:
    jobid: 13
    output: results/MALT/rk8849.AF8B8B2EE.merged.trimmed.rma6, results/MALT/rk8849.AF8B8B2EE.merged.trimmed.sam.gz
    log: logs/MALT/rk8849.AF8B8B2EE.merged.log (check log file(s) for error message)
    conda-env: /home/pchela/aMeta/.snakemake/conda/8c061af2ff5608aa242db3b3e339abbe
    shell:
        unset DISPLAY; malt-run -at SemiGlobal -m BlastN -i results/CUTADAPT_ADAPTER_TRIMMING/rk8849.AF8B8B2EE.merged.trimmed.fastq.gz -o results/MALT/rk8849.AF8B8B2EE.merged.trimmed.rma6 -a results/MALT/rk8849.AF8B8B2EE.merged.trimmed.sam -t 20 -d results/MALT_DB/maltDB.dat -sup 1 -mq 100 -top 1 -mpi 85.0 -id 85.0 -v &> logs/MALT/rk8849.AF8B8B2EE.merged.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

The log is

MaltRun - Aligns sequences using MALT (MEGAN alignment tool)
Options:
Mode:
    --mode: BlastN
    --alignmentType: SemiGlobal
Input:
    --inFile: results/CUTADAPT_ADAPTER_TRIMMING/rk8849.AF8B8B2EE.merged.trimmed.fastq.gz
    --index: results/MALT_DB/maltDB.dat
Output:
    --output: results/MALT/rk8849.AF8B8B2EE.merged.trimmed.rma6
    --includeUnaligned: false
    --alignments: results/MALT/rk8849.AF8B8B2EE.merged.trimmed.sam
    --format: SAM
    --gzipAlignments: true
    --samSoftClip: false
    --sparseSAM: false
Performance:
    --numThreads: 20
    --memoryMode: load
    --maxTables: 0
    --replicateQueryCache: false
Filter:
    --minBitScore: 50.0
    --maxExpected: 1.0
    --minPercentIdentity: 85.0
    --maxAlignmentsPerQuery: 100
    --maxAlignmentsPerRef: 1
BlastN parameters:
    --matchScore: 2
    --mismatchScore: -3
    --setLambda: 0.625
    --setK: 0.41
DNA query parameters:
    --forwardOnly: false
    --reverseOnly: false
LCA parameters:
    --topPercent: 1.0
    --minSupportPercent: 0.001
    --minSupport: 1
    (--minSupportPercent: overridden, set to 0)
    --minPercentIdentityLCA: 85.0
    --useMinPercentIdentityFilterLCA: false
    --weightedLCA: false
    --magnitudes: false
Heuristics:
    --maxSeedsPerFrame: 100
    --maxSeedsPerRef: 20
    --seedShift: 1
Banded alignment parameters:
    --gapOpen: 7
    --gapExtend: 3
    --band: 4
Other:
    --replicateQueryCacheBits: 20
    --xPart: false
    --verbose: true
Version   MALT (version 0.6.2, built 12 Sep 2023)
Author(s) Daniel H. Huson
Copyright (C) 2023 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Java version: 20.0.2; max memory: 1024G
--- LOADING ---:
Reading file: results/MALT_DB/maltDB.dat/ref.idx
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (2.1s)
Reading file: results/MALT_DB/maltDB.dat/ref.db
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1,868.8s)
Number of sequences:    11,260,111
Number of letters:  94,048,645,023
LOADING table (0) ...
Reading file: results/MALT_DB/maltDB.dat/index0.idx
Reference sequence type: DNA
100% (0.0s)
Reading file: results/MALT_DB/maltDB.dat/table0.idx
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (320.3s)
Reading file: results/MALT_DB/maltDB.dat/table0.db
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (12,368.5s)
Table size: 89,842,724,623
Loading ncbi.map: 2,396,736
Loading ncbi.tre: 2,396,740
Reading file: results/MALT_DB/maltDB.dat/taxonomy.idx
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1.0s)
--- ALIGNING ---:
+++++ Aligning file: results/CUTADAPT_ADAPTER_TRIMMING/rk8849.AF8B8B2EE.merged.trimmed.fastq.gz
Starting file: results/MALT/rk8849.AF8B8B2EE.merged.trimmed.rma6

LeandroRitter commented 3 months ago

@JediKnightChan you seem to have an unusually rich sample. I do not think I have ever seen 69 organisms in one sample. Are they all microbial, or do you have eukaryotes as well? I am afraid you have an "out of memory" issue again. Are you running the Malt job on a compute node with at least 1 TB of RAM? Could you please double-check this? If yes, I am afraid you will have to reserve a node with more RAM and perhaps increase the Java heap memory from -Xmx1024G to e.g. -Xmx2024G. If it is not possible to allocate more RAM, one thing you could try is to modify the default depth and breadth of coverage filters, making them more conservative in order to decrease the number of detected species, for example

# Breadth and depth of coverage filters, default thresholds are very conservative, can be tuned by users
n_unique_kmers: 1000
n_tax_reads: 500
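
The heap change mentioned above amounts to editing the -Xmx line in the malt-build.vmoptions and malt-run.vmoptions files. A minimal sketch of that edit on the file text, assuming one VM option per line as in the file shown earlier in this thread; the function name is hypothetical and finding the files inside the conda environment is not covered.

```python
def set_max_heap(vmoptions_text: str, heap: str = "1024G") -> str:
    """Return vmoptions text with exactly one active -Xmx line set to `heap`."""
    kept = [
        line
        for line in vmoptions_text.splitlines()
        # Drop active -Xmx settings; commented examples ("# -Xmx512m") are kept.
        if not line.strip().startswith("-Xmx")
    ]
    kept.append(f"-Xmx{heap}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    current = "# Enter one VM parameter per line\n# -Xmx512m\n-Xmx512G\n"
    print(set_max_heap(current, "1024G"))
```

The heap value should stay below the RAM actually available on the node, otherwise the job is likely to be killed again.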

JediKnightChan commented 3 months ago

# Breadth and depth of coverage filters, default thresholds are very conservative, can be tuned by users
n_unique_kmers: 1000
n_tax_reads: 500

Even with these settings I get more than 50 species in the KrakenUniq abundance matrix, and unfortunately my VM is limited to 1 TB.

I tried excluding malt from the analysis earlier, but it still led to the error in Build_Malt_DB, as that stage was still executed. Now the error is in malt-run; will excluding malt allow me to avoid it?

LeandroRitter commented 3 months ago

@JediKnightChan you should not exclude MALT from the analysis, because all the downstream stats depend on MALT. It is the major aligner in aMeta, and this step unfortunately cannot be avoided. And again: are the ~50 species all microbial, or do you have eukaryotes too? Would you mind sending your config.yaml file to nikolay.oskolkov@scilifelab.se? We could also arrange a Zoom session where I could try to assist you.

LeandroRitter commented 2 months ago

@JediKnightChan looking at the malt log file that you posted, I realized that the problem may not be a lack of memory but rather a mismatch between the number of threads specified in aMeta and the number of threads available at your HPC. The "Killed" error message is a cluster-related message which (in my experience) typically means that you reserved far fewer than the 20 CPUs that aMeta uses by default. In most cases this is not a problem and Malt + HPC can adjust the number of threads; however, I remember seeing some HPCs throw this kind of "Killed" error when there are not enough CPUs available.

Now, there are two ways to solve it. First, you can try to increase the number of CPUs at your HPC to 20, if possible. Second, if this is not possible, you can go to aMeta/workflow/rules/malt.smk and modify the "threads" field in "rule Malt". Right now it is set to "threads: 20", and you can replace it with "threads: 1" or any other number of CPUs available at your HPC / compute node. With "threads: 1", Malt will be very slow, but at least it may work.
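
To pick a value for the "threads" field, it helps to know how many CPUs the job can actually use. The sketch below is a hypothetical helper, not part of aMeta: it counts the CPUs available to the current process and caps the default of 20 accordingly.

```python
import os


def available_cpus() -> int:
    """Number of CPUs the current process may run on."""
    try:
        # Respects CPU affinity set by a scheduler or container (Linux only).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for platforms without sched_getaffinity.
        return os.cpu_count() or 1


def pick_threads(requested: int = 20, available=None) -> int:
    """Cap the requested thread count at the available CPUs, never below 1."""
    if available is None:
        available = available_cpus()
    return max(1, min(requested, available))


if __name__ == "__main__":
    print(f"CPUs available: {available_cpus()}, suggested threads: {pick_threads()}")
```

Run inside the same allocation as the Malt job, the suggested value is what would go into "threads:" in rule Malt.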

jfy133 commented 2 months ago

Just to chime in: I had the same issue with the remote providers error.

Replacing with snakemake <=6.3.0 allowed me to start the test.

Would be nice to have a new release with this bug fix incorporated :)