genomicepidemiology / ARGprofiler

A pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets
Apache License 2.0

Running ARGprofiler for the first time #4

Open bioinfogini opened 1 month ago

bioinfogini commented 1 month ago

Hello, I am opening an issue because I am trying to install and run ARGprofiler on my laptop for the first time, but I ran into an error.

Following your guidelines, I did the following:

  1. git clone https://github.com/genomicepidemiology/ARGprofiler.git
  2. snakemake installation
  3. mamba env create --name argprofiler --file ARGprofiler/rules/environment_argprofiler.yaml
  4. mamba activate argprofiler
  5. cd ARGprofiler (I have to go into the ARGprofiler folder to make it run; otherwise it obviously can't find the config.yaml)
  6. snakemake --profile profile_argprofiler

The error is shown here:

Error in rule index_db_mOTUs:
    jobid: 3
    input: prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz
    output: prerequisites/db_motus/check_file_index_db_mOTUs.txt
    shell:

    tar -xf prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz -C prerequisites/db_motus/ db_mOTU/db_mOTU_DB_CEN.fasta
    /usr/bin/time -v --output=prerequisites/db_motus/index_mOTUs.bench kma index -i prerequisites/db_motus/db_mOTU/db_mOTU_DB_CEN.fasta -o prerequisites/db_motus/db_mOTUs
    touch prerequisites/db_motus/check_file_index_db_mOTUs.txt

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-05-14T171636.380610.snakemake.log
WorkflowError: At least one job did not complete successfully.

Log content is the following:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:

    job                           count
    ----------------------------  -----
    ARG_extender_paired_reads         1
    ARG_extender_single_reads         1
    all                               1
    cleanup_paired_end_reads          1
    cleanup_single_end_reads          1
    download_paired_end_reads         1
    download_single_end_reads         1
    fetch_db_mOTUs                    1
    fetch_db_panres                   1
    index_db_mOTUs                    1
    index_db_panres                   1
    kma_paired_end_reads_mOTUs        1
    kma_paired_end_reads_panRes       1
    kma_single_end_reads_mOTUs        1
    kma_single_end_reads_panRes       1
    mash_sketch_paired_end_reads      1
    mash_sketch_single_end_reads      1
    trim_paired_end_reads             1
    trim_single_end_reads             1
    total                            19

Select jobs to execute... Execute 4 jobs...

[Tue May 14 17:16:36 2024] localrule download_single_end_reads: output: results/raw_reads/single_end/ERR262503/ERR262503.fastq.gz, results/raw_reads/single_end/ERR262503/ERR262503_check_file_raw.txt jobid: 13 reason: Missing output files: results/raw_reads/single_end/ERR262503/ERR262503.fastq.gz, results/raw_reads/single_end/ERR262503/ERR262503_check_file_raw.txt wildcards: single_reads=ERR262503 resources: tmpdir=/tmp

[Tue May 14 17:16:36 2024] localrule fetch_db_mOTUs: output: prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz jobid: 4 reason: Missing output files: prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz resources: tmpdir=/tmp

[Tue May 14 17:16:36 2024] localrule download_paired_end_reads: output: results/raw_reads/paired_end/SRR7125621/SRR7125621_1.fastq.gz, results/raw_reads/paired_end/SRR7125621/SRR7125621_2.fastq.gz, results/raw_reads/paired_end/SRR7125621/SRR7125621_check_file_raw.txt jobid: 6 reason: Missing output files: results/raw_reads/paired_end/SRR7125621/SRR7125621_1.fastq.gz, results/raw_reads/paired_end/SRR7125621/SRR7125621_check_file_raw.txt, results/raw_reads/paired_end/SRR7125621/SRR7125621_2.fastq.gz wildcards: paired_reads=SRR7125621 resources: tmpdir=/tmp

[Tue May 14 17:16:36 2024] localrule fetch_db_panres: output: prerequisites/db_panres/panres_genes.fa jobid: 2 reason: Missing output files: prerequisites/db_panres/panres_genes.fa resources: tmpdir=/tmp

[Tue May 14 17:16:38 2024] Finished job 13. 1 of 19 steps (5%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:16:38 2024] localrule trim_single_end_reads: input: results/raw_reads/single_end/ERR262503/ERR262503.fastq.gz output: results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq, results/trimmed_reads/single_end/ERR262503/ERR262503_check_file_trim.txt jobid: 14 reason: Missing output files: results/trimmed_reads/single_end/ERR262503/ERR262503_check_file_trim.txt, results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq; Input files updated by another job: results/raw_reads/single_end/ERR262503/ERR262503.fastq.gz wildcards: single_reads=ERR262503 resources: tmpdir=/tmp

[Tue May 14 17:16:39 2024] Finished job 14. 2 of 19 steps (11%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:16:39 2024] localrule mash_sketch_single_end_reads: input: results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq output: results/mash_sketch/single_end/ERR262503/ERR262503.trimmed.fastq.msh, results/mash_sketch/single_end/ERR262503/ERR262503_check_file_mash.txt jobid: 17 reason: Missing output files: results/mash_sketch/single_end/ERR262503/ERR262503_check_file_mash.txt; Input files updated by another job: results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq wildcards: single_reads=ERR262503 resources: tmpdir=/tmp

[Tue May 14 17:16:39 2024] Finished job 17. 3 of 19 steps (16%) done [Tue May 14 17:16:40 2024] Finished job 2. 4 of 19 steps (21%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:16:40 2024] localrule index_db_panres: input: prerequisites/db_panres/panres_genes.fa output: prerequisites/db_panres/check_file_index_db_panres.txt jobid: 1 reason: Missing output files: prerequisites/db_panres/check_file_index_db_panres.txt; Input files updated by another job: prerequisites/db_panres/panres_genes.fa resources: tmpdir=/tmp

[Tue May 14 17:16:41 2024] Finished job 6. 5 of 19 steps (26%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:16:41 2024] localrule trim_paired_end_reads: input: results/raw_reads/paired_end/SRR7125621/SRR7125621_1.fastq.gz, results/raw_reads/paired_end/SRR7125621/SRR7125621_2.fastq.gz output: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_check_file_trim.txt jobid: 7 reason: Missing output files: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_check_file_trim.txt, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq; Input files updated by another job: results/raw_reads/paired_end/SRR7125621/SRR7125621_1.fastq.gz, results/raw_reads/paired_end/SRR7125621/SRR7125621_2.fastq.gz wildcards: paired_reads=SRR7125621 resources: tmpdir=/tmp

[Tue May 14 17:16:44 2024] Finished job 7. 6 of 19 steps (32%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:16:44 2024] localrule mash_sketch_paired_end_reads: input: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq output: results/mash_sketch/paired_end/SRR7125621/SRR7125621.trimmed.fastq.msh, results/mash_sketch/paired_end/SRR7125621/SRR7125621_check_file_mash.txt jobid: 10 reason: Missing output files: results/mash_sketch/paired_end/SRR7125621/SRR7125621_check_file_mash.txt; Input files updated by another job: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq wildcards: paired_reads=SRR7125621 resources: tmpdir=/tmp

[Tue May 14 17:16:44 2024] Finished job 10. 7 of 19 steps (37%) done [Tue May 14 17:16:48 2024] Finished job 1. 8 of 19 steps (42%) done Select jobs to execute... Execute 2 jobs...

[Tue May 14 17:16:48 2024] localrule kma_paired_end_reads_panRes: input: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq, prerequisites/db_panres/check_file_index_db_panres.txt output: results/kma_panres/paired_end/SRR7125621/SRR7125621.res, results/kma_panres/paired_end/SRR7125621/SRR7125621.mat.gz, results/kma_panres/paired_end/SRR7125621/SRR7125621.mapstat, results/kma_panres/paired_end/SRR7125621/SRR7125621.bam, results/kma_panres/paired_end/SRR7125621/SRR7125621.mapstat.filtered, results/kma_panres/paired_end/SRR7125621/SRR7125621_check_file_kma.txt jobid: 9 reason: Missing output files: results/kma_panres/paired_end/SRR7125621/SRR7125621_check_file_kma.txt, results/kma_panres/paired_end/SRR7125621/SRR7125621.mapstat.filtered; Input files updated by another job: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq, prerequisites/db_panres/check_file_index_db_panres.txt, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq wildcards: paired_reads=SRR7125621 resources: tmpdir=/tmp

[Tue May 14 17:16:48 2024] localrule kma_single_end_reads_panRes: input: results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq, prerequisites/db_panres/check_file_index_db_panres.txt output: results/kma_panres/single_end/ERR262503/ERR262503.res, results/kma_panres/single_end/ERR262503/ERR262503.mat.gz, results/kma_panres/single_end/ERR262503/ERR262503.mapstat, results/kma_panres/single_end/ERR262503/ERR262503.bam, results/kma_panres/single_end/ERR262503/ERR262503.mapstat.filtered, results/kma_panres/single_end/ERR262503/ERR262503_check_file_kma.txt jobid: 16 reason: Missing output files: results/kma_panres/single_end/ERR262503/ERR262503_check_file_kma.txt, results/kma_panres/single_end/ERR262503/ERR262503.mapstat.filtered; Input files updated by another job: results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq, prerequisites/db_panres/check_file_index_db_panres.txt wildcards: single_reads=ERR262503 resources: tmpdir=/tmp

[Tue May 14 17:16:49 2024] Finished job 16. 9 of 19 steps (47%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:16:49 2024] localrule ARG_extender_single_reads: input: results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq, results/kma_panres/single_end/ERR262503/ERR262503.mapstat.filtered output: results/ARG_extender/single_end/ERR262503/ERR262503.fasta.gz, results/ARG_extender/single_end/ERR262503/ERR262503.gfa.gz, results/ARG_extender/single_end/ERR262503/ERR262503.frag.gz, results/ARG_extender/single_end/ERR262503/ERR262503.frag_raw.gz, results/ARG_extender/single_end/ERR262503/ERR262503_check_file_ARG.txt jobid: 18 reason: Missing output files: results/ARG_extender/single_end/ERR262503/ERR262503_check_file_ARG.txt; Input files updated by another job: results/trimmed_reads/single_end/ERR262503/ERR262503.trimmed.fastq, results/kma_panres/single_end/ERR262503/ERR262503.mapstat.filtered wildcards: single_reads=ERR262503 resources: tmpdir=/tmp

[Tue May 14 17:16:49 2024] Finished job 18. 10 of 19 steps (53%) done [Tue May 14 17:16:50 2024] Finished job 9. 11 of 19 steps (58%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:16:50 2024] localrule ARG_extender_paired_reads: input: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq, results/kma_panres/paired_end/SRR7125621/SRR7125621.mapstat.filtered output: results/ARG_extender/paired_end/SRR7125621/SRR7125621.fasta.gz, results/ARG_extender/paired_end/SRR7125621/SRR7125621.gfa.gz, results/ARG_extender/paired_end/SRR7125621/SRR7125621.frag.gz, results/ARG_extender/paired_end/SRR7125621/SRR7125621.frag_raw.gz, results/ARG_extender/paired_end/SRR7125621/SRR7125621_check_file_ARG.txt jobid: 11 reason: Missing output files: results/ARG_extender/paired_end/SRR7125621/SRR7125621_check_file_ARG.txt; Input files updated by another job: results/trimmed_reads/paired_end/SRR7125621/SRR7125621_singleton.trimmed.fastq, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_1.trimmed.fastq, results/kma_panres/paired_end/SRR7125621/SRR7125621.mapstat.filtered, results/trimmed_reads/paired_end/SRR7125621/SRR7125621_2.trimmed.fastq wildcards: paired_reads=SRR7125621 resources: tmpdir=/tmp

[Tue May 14 17:16:50 2024] Finished job 11. 12 of 19 steps (63%) done [Tue May 14 17:22:25 2024] Finished job 4. 13 of 19 steps (68%) done Select jobs to execute... Execute 1 jobs...

[Tue May 14 17:22:25 2024] localrule index_db_mOTUs: input: prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz output: prerequisites/db_motus/check_file_index_db_mOTUs.txt jobid: 3 reason: Missing output files: prerequisites/db_motus/check_file_index_db_mOTUs.txt; Input files updated by another job: prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz resources: tmpdir=/tmp

[Tue May 14 17:25:31 2024]
Error in rule index_db_mOTUs:
    jobid: 3
    input: prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz
    output: prerequisites/db_motus/check_file_index_db_mOTUs.txt
    shell:

    tar -xf prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz -C prerequisites/db_motus/ db_mOTU/db_mOTU_DB_CEN.fasta
    /usr/bin/time -v --output=prerequisites/db_motus/index_mOTUs.bench kma index -i prerequisites/db_motus/db_mOTU/db_mOTU_DB_CEN.fasta -o prerequisites/db_motus/db_mOTUs
    touch prerequisites/db_motus/check_file_index_db_mOTUs.txt

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-05-14T171636.380610.snakemake.log
WorkflowError: At least one job did not complete successfully.

Thanks in advance for your help!

blackadder901 commented 1 month ago

Hello! Sorry for the late response.

I cannot really tell what the issue is, but from what you say, you cloned the repo and then installed snakemake? You don't need to install snakemake; it is provided in the environment file.

That might be why things are getting messy.

If you have mamba or micromamba installed, then you need to do the following:

  1. mamba env create --name argprofiler --file rules/environment_argprofiler.yaml
  2. mamba activate argprofiler
  3. snakemake --profile profile_argprofiler
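
To confirm that the snakemake being picked up is the one from the environment and not a separate installation, a quick check along these lines may help. It is only a sketch: `tool_origin` is a made-up helper name, and it relies on the `CONDA_PREFIX` variable that conda, mamba, and micromamba set when an environment is activated.

```shell
# Hypothetical helper: report whether a tool on PATH lives inside the
# currently activated conda/mamba environment ($CONDA_PREFIX).
tool_origin() {
    tool_path=$(command -v "$1" 2>/dev/null) || { echo "$1: not found on PATH"; return 1; }
    case "$tool_path" in
        "${CONDA_PREFIX:-/nonexistent}"/*) echo "$1: from active environment ($tool_path)" ;;
        *) echo "$1: from outside the environment ($tool_path)" ;;
    esac
}

# Run after `mamba activate argprofiler`; both tools are used by the pipeline.
tool_origin snakemake || true
tool_origin kma || true
```

If snakemake is reported as coming from outside the environment, the separately installed copy is probably the one being used.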

bioinfogini commented 1 month ago

Hello @blackadder901, thanks for your feedback! It was not clear to me that snakemake was included; thanks for the correction.

Regarding your guidelines, however, I had to run the snakemake --profile profile_argprofiler command from inside the ARGprofiler folder. Does that seem correct to you? I ask because I then ran into another error:

Error in rule index_db_mOTUs:
    jobid: 3
    input: prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz
    output: prerequisites/db_motus/check_file_index_db_mOTUs.txt
    shell:

    tar -xf prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz -C prerequisites/db_motus/ db_mOTU/db_mOTU_DB_CEN.fasta
    /usr/bin/time -v --output=prerequisites/db_motus/index_mOTUs.bench kma index -i prerequisites/db_motus/db_mOTU/db_mOTU_DB_CEN.fasta -o prerequisites/db_motus/db_mOTUs
    touch prerequisites/db_motus/check_file_index_db_mOTUs.txt

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-05-23T164814.308111.snakemake.log
WorkflowError: At least one job did not complete successfully.

Thanks in advance!

Chiara

hmmartiny commented 1 month ago

Hi @bioinfogini ,

Can you have a look in your subdirectories to verify that the files are where they should be?

Please tell us the location of the following files:

I suspect that the issue you are experiencing is due to one of the files being in the wrong location.
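
A quick way to check this from inside the ARGprofiler folder could look like the sketch below. The two paths are the ones that appear in the shell command of the failing rule; `report_file` is just a made-up helper name, and the final `tar -tf` listing is an extra sanity check on the download, not something the pipeline itself does.

```shell
# Hypothetical helper: print whether a path exists and, if so, its size in bytes.
report_file() {
    if [ -f "$1" ]; then
        echo "found    $1 ($(wc -c < "$1" | tr -d ' ') bytes)"
    else
        echo "missing  $1"
        return 1
    fi
}

# Paths copied from the failing rule; run this from the ARGprofiler folder.
tarball=prerequisites/db_motus/db_mOTU_v3.0.1.tar.gz
fasta=prerequisites/db_motus/db_mOTU/db_mOTU_DB_CEN.fasta

report_file "$tarball" || true
report_file "$fasta" || true

# Optional: listing the archive fails if the download was cut short.
if [ -f "$tarball" ]; then
    tar -tf "$tarball" > /dev/null && echo "tarball is readable"
fi
```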

bioinfogini commented 1 month ago

Hi @hmmartiny! Since I launch the snakemake --profile profile_argprofiler command from inside the directory I created (~/ARGprofiler), db_mOTU_v3.0.1.tar.gz is found in ~/ARGprofiler/prerequisites/db_motus/ (listed with ls prerequisites/db_motus/), while db_mOTU_DB_CEN.fasta can be found at ~/ARGprofiler/prerequisites/db_motus/db_mOTU/db_mOTU_DB_CEN.fasta

Hope it helps. Thanks in advance for your support!

bioinfogini commented 1 month ago

Hello there! Just a quick update: I tried to install ARGprofiler on another laptop, and it worked smoothly. On the first laptop, however, I still get the same error. Waiting for your guidance. Thank you!

hmmartiny commented 1 month ago

If it worked on another laptop, I suggest you force snakemake to rerun the download, extraction, and indexing of the mOTUs database. You can do that with the following command:

snakemake --profile profile_argprofiler -R fetch_db_mOTUs

With this command, you will redownload the tarball from Zenodo and also rerun the indexing with kma.

Unfortunately, snakemake is not the best at reporting errors, so it's hard to see what fails. If it continues to fail, try to run each command from the command line by just copy-and-paste.
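
Running the commands one at a time could look roughly like the sketch below. The three commands are copied from the shell section of the failing rule, with the `/usr/bin/time` wrapper left out so that kma's own messages are easier to spot; `run_step` is a made-up helper name that prints the exit status of each step. It assumes you are inside the ARGprofiler folder with the argprofiler environment activated.

```shell
# Hypothetical helper: echo a command, run it, and report its exit status.
run_step() {
    echo ">>> $*"
    step_rc=0
    "$@" || step_rc=$?
    echo "<<< exit status: $step_rc"
    return "$step_rc"
}

# Commands copied from the failing rule (index_db_mOTUs).
# They are only attempted if the downloaded tarball is present.
db_dir=prerequisites/db_motus
if [ -f "$db_dir/db_mOTU_v3.0.1.tar.gz" ]; then
    run_step tar -xf "$db_dir/db_mOTU_v3.0.1.tar.gz" -C "$db_dir/" db_mOTU/db_mOTU_DB_CEN.fasta &&
    run_step kma index -i "$db_dir/db_mOTU/db_mOTU_DB_CEN.fasta" -o "$db_dir/db_mOTUs" &&
    run_step touch "$db_dir/check_file_index_db_mOTUs.txt" ||
    echo "A step failed; the messages printed just above its exit status are the real error."
fi
```

Whichever step reports a non-zero exit status is the one to look at, and the text it prints right before that is the most useful thing to paste back here.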

bioinfogini commented 1 month ago

Unfortunately, it did not work this time either. Could you please explain what you mean by "If it continues to fail, try to run each command from the command line by just copy-and-paste."?

I am actually on the command line, copying, pasting, and running the following:

  1. git clone https://github.com/genomicepidemiology/ARGprofiler.git
  2. cd ARGprofiler/
  3. mamba env create --name argprofiler --file rules/environment_argprofiler.yaml
  4. mamba activate argprofiler
  5. snakemake --profile profile_argprofiler -R fetch_db_mOTUs