PacificBiosciences / HiFi-16S-workflow

Nextflow pipeline to analyze PacBio HiFi full-length 16S data
BSD 3-Clause Clear License

Exit code 1 with Process `pb16S:qiime2_phylogeny_diversity (1)` #53

Closed by ewissel 4 months ago

ewissel commented 4 months ago

Hey again,

I was able to run this pipeline successfully with test and real data, but am running into an issue I can't solve with a new set of real data. Here is my command:

nextflow run main.nf -c /ceph/work/IBMS-PHLab/emily/scripts_general/second_ed_next.config  \
    -resume  \
    --input $input_fq \
    --metadata $input_metadat -profile singularity \
    --outdir $output_dir \
    --resume --dada2_cpu 12 --vsearch_cpu 12 --cutadapt_cpu 12 \
    --vsearch_db /ceph/work/IBMS-PHLab/emily/databases/silva-138-99-seqs.qza \
    --vsearch_tax /ceph/work/IBMS-PHLab/emily/databases/silva-138-99-tax.qza \
    --gtdb_db /ceph/work/IBMS-PHLab/emily/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz \
    --refseq_db /ceph/work/IBMS-PHLab/emily/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz \
    --silva_db  /ceph/work/IBMS-PHLab/emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz \
    --run_picrust2

And the error:

executor >  slurm (2)
[2b/4cd17e] process > pb16S:write_log                [100%] 1 of 1, cached: 1 ✔
[39/c8edf1] process > pb16S:QC_fastq (7)             [100%] 8 of 8, cached: 8 ✔
[13/8b552e] process > pb16S:cutadapt (3)             [100%] 8 of 8, cached: 8 ✔
[05/9e6746] process > pb16S:QC_fastq_post_trim (6)   [100%] 8 of 8, cached: 8 ✔
[f6/7194f9] process > pb16S:collect_QC               [100%] 1 of 1, cached: 1 ✔
[b9/59ec22] process > pb16S:prepare_qiime2_manife... [100%] 1 of 1, cached: 1 ✔
[96/93e9f4] process > pb16S:merge_sample_manifest    [100%] 1 of 1, cached: 1 ✔
[1a/5488cf] process > pb16S:import_qiime2 (1)        [100%] 1 of 1, cached: 1 ✔
[0e/462a3b] process > pb16S:demux_summarize (1)      [100%] 1 of 1, cached: 1 ✔
[4b/d19d23] process > pb16S:dada2_denoise (1)        [100%] 1 of 1, cached: 1 ✔
[b8/e09ffb] process > pb16S:mergeASV                 [100%] 1 of 1, cached: 1 ✔
[11/083c3d] process > pb16S:filter_dada2             [100%] 1 of 1, cached: 1 ✔
[73/4c1b86] process > pb16S:dada2_qc (1)             [100%] 1 of 1, cached: 1 ✔
[70/eb4d46] process > pb16S:qiime2_phylogeny_dive... [100%] 1 of 1, failed: 1 ✘
[a2/c875d3] process > pb16S:dada2_rarefaction (1)    [100%] 1 of 1, cached: 1 ✔
[d6/789daf] process > pb16S:class_tax                [100%] 1 of 1, cached: 1 ✔
[-        ] process > pb16S:dada2_assignTax          -
[-        ] process > pb16S:export_biom              -
[-        ] process > pb16S:barplot_nb               -
[5c/e71407] process > pb16S:barplot (1)              [100%] 1 of 1, cached: 1 ✔
[-        ] process > pb16S:picrust2                 -
[-        ] process > pb16S:html_rep                 -
[90/edae73] process > pb16S:krona_plot               [100%] 1 of 1, cached: 1 ✔
Error executing process > 'pb16S:qiime2_phylogeny_diversity (1)'

Caused by:
  Process `pb16S:qiime2_phylogeny_diversity (1)` terminated with an error exit status (1)

Command executed:

  qiime phylogeny align-to-tree-mafft-fasttree     --p-n-threads 1     --i-sequences dada2-ccs_rep_filtered.qza     --o-alignment mafft_alignment.qza     --o-masked-alignment mafft_alignment_masked.qza     --o-tree phylotree_mafft_unrooted.qza     --o-rooted-tree phylotree_mafft_rooted.qza

  qiime tools export --input-path phylotree_mafft_rooted.qza     --output-path ./
  mv tree.nwk phylotree_mafft_rooted.nwk

  qiime diversity core-metrics-phylogenetic     --p-n-jobs-or-threads 1     --i-phylogeny phylotree_mafft_rooted.qza     --i-table dada2-ccs_table_filtered.qza     --m-metadata-file metadat.tsv     --p-sampling-depth 12954     --output-dir ./core-metrics-diversity

  # Export various matrix for plotting later
  qiime tools export --input-path ./core-metrics-diversity/bray_curtis_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/bray_curtis_distance_matrix.tsv
  qiime tools export --input-path ./core-metrics-diversity/weighted_unifrac_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/weighted_unifrac_distance_matrix.tsv
  qiime tools export --input-path ./core-metrics-diversity/unweighted_unifrac_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/unweighted_unifrac_distance_matrix.tsv

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Environment variable SINGULARITYENV_TMPDIR will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Environment variable SINGULARITYENV_NXF_DEBUG will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  Plugin error from phylogeny:

    Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '/tmp/qiime2/ewissel/data/6ed73d42-375c-465f-9210-1ac6b45a25ba/data/dna-sequences.fasta']' returned non-zero exit status 1.

  Debug info has been saved to /tmp/qiime2-q2cli-err-ob_0x3ap.log

Work dir:
  /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/70/eb4d4662d9a652c72b543235dc0088

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
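
For reference, the work dir also contains .command.err and .command.log alongside .command.sh (Nextflow writes these files for every task), so a minimal inspection session looks like:

cd /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/70/eb4d4662d9a652c72b543235dc0088
cat .command.sh    # the exact script Nextflow executed for this task
cat .command.err   # stderr captured from the task
cat .command.log   # combined output, including container and scheduler messages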

I believe this is an issue related to the location of some files, so here is my config file as well, where I specify cache dirs (an updated version of this repo's nextflow.config with server-specific info):

//Profile config names for nf-core/configs
env {
    TMPDIR = '/ceph/work/IBMS-PHLab/emily/tmp'
}

executor {

    $slurm {
        queue = 'amd'
        queueSize = 200
        pollInterval = '2 min'
        queueStatInterval = '5 min'
        submitRateLimit = '6/1min'
        retry.maxAttempts = 2
    }
}

process {
            executor = 'slurm'
            maxRetries = 1
            queue    = { task.memory <= 250.GB ? (task.time <= 3.h ? 'amd_short' : 'amd') : 'large' }
            beforeScript = 'module load anaconda3/4.12.0'
}
// I know this is redundant with the executor scope above
params {
    max_memory = 750.GB
    max_cpus = 200
    max_time = 5.d
}

I don't get any additional info from stderr, which is empty, or from .nextflow.log, which has the same content as the stdout shown above.

ewissel commented 4 months ago

I tried following the solution posted in https://github.com/PacificBiosciences/HiFi-16S-workflow/issues/44#issuecomment-1865979941 by editing my config file, but I still get the same error message on my run.

proteinosome commented 4 months ago

Hi @ewissel , while the error could be due to tmp space similar to #44 , let's confirm that. The error is from the MAFFT command in QIIME 2, which performs the multiple sequence alignment used to construct the phylogenetic tree.

Since you are using the Conda profile, can you activate the QIIME 2 environment? By default the environment is stored in $HOME/nf_conda and looks something like

qiime2-2023.2-py38-linux-conda-XXXX (XXXX can differ depending on the version)

You can activate the environment by doing:

conda activate $HOME/nf_conda/qiime2-2023.2-py38-linux-conda-XXXX (Replace XXXX with what's on your HPC).

Then, go into the workdir /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/70/eb4d4662d9a652c72b543235dc0088 and type:

bash .command.sh

This should produce an error that says something like Debug info has been saved to /tmp/qiime2-q2cli-err-ob_0x3ap.log

Find the log in the /tmp folder and share it here so I can take a look. Thanks.

ewissel commented 4 months ago

Hey @proteinosome, thanks for the help! You may have found the issue. In /ceph/work/IBMS-PHLab/emily/nf_conda/ (I changed the conda cache to make sure there was no issue with the cache being on a different node than the compute environment) I only see the following:

pb-16S-pbtools-79aaa66f28337ea289e1eec2f1ea2739  singularity

However, in my project directory I see PROJDIR/env/ contains the following files:

pb-16s-pbtools.yml  pb-16s-vis-conda.yml  qiime2-2022.2-py38-linux-conda.yml  qiime2-2023.2-py38-linux-conda.yml

For your second suggestion, this confirms the issue with the conda QIIME build:

$ cd /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/70/eb4d4662d9a652c72b543235dc0088
$ bash .command.run
.command.sh: line 2: qiime: command not found
.command.sh: line 4: qiime: command not found
mv: cannot stat 'tree.nwk': No such file or directory
.command.sh: line 7: qiime: command not found
.command.sh: line 10: qiime: command not found
mv: cannot stat './core-metrics-diversity/distance-matrix.tsv': No such file or directory
.command.sh: line 12: qiime: command not found
mv: cannot stat './core-metrics-diversity/distance-matrix.tsv': No such file or directory
.command.sh: line 14: qiime: command not found
mv: cannot stat './core-metrics-diversity/distance-matrix.tsv': No such file or directory

Should I manually build the conda install of qiime from the 2023 yaml file in the env/ folder?

proteinosome commented 4 months ago

@ewissel a couple of other steps should be using the same QIIME container, so I don't understand how those steps can pass. E.g. pb16S:dada2_denoise needs the QIIME environment, too.

I just took a look again and it looks like you're using the singularity profile. The singularity containers should be in /ceph/work/IBMS-PHLab/emily/nf_conda/singularity.

Can you try to start an interactive shell with the container and run the command? This should work:

cd /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/70/eb4d4662d9a652c72b543235dc0088

singularity shell -e --bind /ceph/sharedfs/work/IBMS-PHLab/emily/ /ceph/work/IBMS-PHLab/emily/nf_conda/singularity/PB16SIMG

bash .command.sh

Replace PB16SIMG with the correct image in your singularity folder.

ewissel commented 4 months ago

in /ceph/work/IBMS-PHLab/emily/nf_conda/singularity/ there are two files:

kpinpb-pb-16s-nf-qiime-v0.7.img  kpinpb-pb-16s-nf-tools-latest.img

I ran the following commands from your reply with both files:

$ cd /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/70/eb4d4662d9a652c72b543235dc0088
$ singularity shell -e --bind /ceph/sharedfs/work/IBMS-PHLab/emily/ /ceph/work/IBMS-PHLab/emily/nf_conda/singularity/kpinpb-pb-16s-nf-tools-latest.img

Apptainer>   # opens an interactive apptainer instance
Apptainer> $ bash .command.sh

This gave the same qiime error as the stdout from before:

.command.sh: line 2: qiime: command not found
.command.sh: line 4: qiime: command not found
mv: cannot stat 'tree.nwk': No such file or directory
.command.sh: line 7: qiime: command not found
.command.sh: line 10: qiime: command not found
mv: cannot stat './core-metrics-diversity/distance-matrix.tsv': No such file or directory
.command.sh: line 12: qiime: command not found
mv: cannot stat './core-metrics-diversity/distance-matrix.tsv': No such file or directory
.command.sh: line 14: qiime: command not found
mv: cannot stat './core-metrics-diversity/distance-matrix.tsv': No such file or directory

With the other img file (the QIIME one), it runs successfully:

$ singularity shell -e --bind /ceph/sharedfs/work/IBMS-PHLab/emily/ /ceph/work/IBMS-PHLab/emily/nf_conda/singularity/kpinpb-pb-16s-nf-qiime-v0.7.img
Apptainer> $ bash .command.sh

Saved FeatureData[AlignedSequence] to: mafft_alignment.qza
Saved FeatureData[AlignedSequence] to: mafft_alignment_masked.qza
Saved Phylogeny[Unrooted] to: phylotree_mafft_unrooted.qza
Saved Phylogeny[Rooted] to: phylotree_mafft_rooted.qza
Exported phylotree_mafft_rooted.qza as NewickDirectoryFormat to directory ./
Saved FeatureTable[Frequency] to: ./core-metrics-diversity/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-diversity/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-diversity/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-diversity/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-diversity/evenness_vector.qza
Saved DistanceMatrix to: ./core-metrics-diversity/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: ./core-metrics-diversity/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: ./core-metrics-diversity/jaccard_distance_matrix.qza
Saved DistanceMatrix to: ./core-metrics-diversity/bray_curtis_distance_matrix.qza
Saved PCoAResults to: ./core-metrics-diversity/unweighted_unifrac_pcoa_results.qza
Saved PCoAResults to: ./core-metrics-diversity/weighted_unifrac_pcoa_results.qza
Saved PCoAResults to: ./core-metrics-diversity/jaccard_pcoa_results.qza
Saved PCoAResults to: ./core-metrics-diversity/bray_curtis_pcoa_results.qza
Saved Visualization to: ./core-metrics-diversity/unweighted_unifrac_emperor.qzv
Saved Visualization to: ./core-metrics-diversity/weighted_unifrac_emperor.qzv
Saved Visualization to: ./core-metrics-diversity/jaccard_emperor.qzv
Saved Visualization to: ./core-metrics-diversity/bray_curtis_emperor.qzv
Exported ./core-metrics-diversity/bray_curtis_distance_matrix.qza as DistanceMatrixDirectoryFormat to directory ./core-metrics-diversity
Exported ./core-metrics-diversity/weighted_unifrac_distance_matrix.qza as DistanceMatrixDirectoryFormat to directory ./core-metrics-diversity
Exported ./core-metrics-diversity/unweighted_unifrac_distance_matrix.qza as DistanceMatrixDirectoryFormat to directory ./core-metrics-diversity

This one is a bit out of my depth, so I am not sure how to merge these img files / singularity containers, or if that is the best path forward.

proteinosome commented 4 months ago

The QIIME img is the correct one to use, and the command succeeding means there's nothing wrong with the data. There's no need to merge the images; Nextflow knows which one to use for each step.
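
For context, per-process containers are a standard Nextflow feature; here is a minimal sketch of how a config can map a different image to each step (the selectors are process names from this run, the tags are inferred from the img filenames above, and this is not this pipeline's actual config):

process {
    // sketch: each process can declare its own container, which is how the
    // tools image and the QIIME image coexist within a single run
    withName: qiime2_phylogeny_diversity {
        container = 'kpinpb/pb-16s-nf-qiime:v0.7'
    }
    withName: QC_fastq {
        container = 'kpinpb/pb-16s-nf-tools:latest'
    }
}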

This suggests that it's highly likely related to #44. You said you've tried editing your config file; can you show me the full edited config file?

Thanks.

ewissel commented 4 months ago

Yes! Thank you for all the help. Here is the full config. I've tried making some adjustments to this as I've been troubleshooting, so this is the version at the time I opened the issue:

//Profile config names for nf-core/configs

env {
    TMPDIR = '/ceph/work/IBMS-PHLab/emily/tmp'
}

executor {

    $slurm {
        queue = 'amd'
        queueSize = 200
        pollInterval = '2 min'
        queueStatInterval = '5 min'
        submitRateLimit = '6/1min'
        retry.maxAttempts = 2
    }
}

process {
            executor = 'slurm'
            maxRetries = 1
            queue    = { task.memory <= 250.GB ? (task.time <= 3.h ? 'amd_short' : 'amd') : 'large' }
            beforeScript = 'module load anaconda3/4.12.0'
}

singularity {
    enabled = true
    cacheDir = '/ceph/work/IBMS-PHLab/emily/singularity-images/'
    autoMounts = true
    enable_container=true
    docker.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
}

params {
    max_memory = 750.GB
    max_cpus = 200
    max_time = 5.d
}
proteinosome commented 4 months ago

You're welcome. I don't see any edit there that could help with increasing memory for the diversity step. Did you follow the suggestion in #44 that uses an extra config file?

process {
  // more RAM for the diversity job
  withName: qiime2_phylogeny_diversity {
    cpus = 8
    memory = 240.GB
  }
  // more RAM for the report building
  withName: html_rep {
    cpus = 8
    memory = 128.GB
  }
}
ewissel commented 4 months ago

I tried that and I also tried going into main.nf and changing the cpu argument from 8 to 32, and both resulted in the same error as initially described in the post.

I guess something weird is going on with my install, so I'll try a fresh install again. I'm not too hopeful this will fix the issue because I was able to run some data last week with this nf module without issue, but I'll try nonetheless.

proteinosome commented 4 months ago

Did you specify the extra config file with -c extra.config when you ran the command? How much memory did you allocate?

Do not increase the number of CPUs; that will only make the step use more memory, and you are likely running out of memory already.

I also do not think there's anything wrong with the installation, as you were able to manually complete the step following my suggestion above.

ewissel commented 4 months ago

I amended my initial config file with the content in #44 and kept my nextflow command the same: nextflow run main.nf -c ammended.config . . .

OK, so no going in and editing code that shouldn't be touched. Then is something weird going on with the config file? Should I do something other than amend my initial config file?

proteinosome commented 4 months ago

Can you show me the full ammended.config? The way Nextflow works is that it'll first look for parameters in the default nextflow.config, then override/add any parameters defined in the config file you supply via -c.
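
As a minimal sketch of that precedence (the parameter name is taken from this pipeline's CLI; the values are hypothetical):

// nextflow.config shipped with the pipeline sets a default ...
params.dada2_cpu = 8

// ... and ammended.config, supplied via -c, overrides it
params.dada2_cpu = 32

Running nextflow run main.nf -c ammended.config would then resolve dada2_cpu to 32.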

ewissel commented 4 months ago

FYI, I'm still working on this. I made some edits to the config file while we were troubleshooting, so I am rerunning to confirm that the version I share produces the error discussed above (of course I'm now stuck somewhere else in the pipeline, but we will get there).

ewissel commented 4 months ago

OK I was able to recreate this issue. Here is second_ed_next.config :

//Profile config names for nf-core/configs
params {
    config_profile_description = 'Emily\'s Nextflow Config File for Academia Sinica Compute Grid'
    config_profile_contact = 'Emily Wissel (@ewissel)'
}

env {
    TMPDIR = '/ceph/work/IBMS-PHLab/emily/tmp'
}

executor {

    $slurm {
        queue = 'amd'
        queueSize = 200
        pollInterval = '2 min'
        queueStatInterval = '5 min'
        submitRateLimit = '6/1min'
        retry.maxAttempts = 2
    }
    conda {
        useMamba = false
        conda.enabled = true
        // Allow longer conda creation timeout
        createTimeout = '2 h'
        cacheDir = "/ceph/work/IBMS-PHLab/emily/nf_conda/"
      }

}

process {
            executor = 'slurm'
            maxRetries = 1
            queue    = { task.memory <= 250.GB ? (task.time <= 4.h ? 'amd_short' : 'amd') : 'amd' }
            beforeScript = 'module load anaconda3/4.12.0'
}

params {
    max_memory = 750.GB
    max_cpus = 200
}

// Correct bug in path for reports: generate report
report {
  enabled = true
  overwrite = true
  file = "$params.outdir/report/report.html"
  }

// Timeline
timeline {
  enabled = true
  overwrite = true
  file = "$params.outdir/report/timeline.html"
}

// DAG
dag {
  enabled = true
  file = "$params.outdir/report/dag.html"
  overwrite = true
}

My script for launching this nextflow module is:

### nextflow HiFi run
#SBATCH --job-name=nf_hifi_workflow   # shows up in the output of 'squeue'
#SBATCH --time=12:00:00       # specify the requested wall-time
#SBATCH --nodes=1              # -n number of nodes allocated for this job
#SBATCH --cpus-per-task=1       # -c total number of cores for nextflow job
#SBATCH --error=slurm.hifi.%J.err      # job error. By default, both files are directed to a file of the name slurm-%j.err
#SBATCH --output=slurm.hifi.%J.out     # job output. By default, both files are directed to a file of the name slurm-%j.out
#SBATCH --partition=amd

## expecting user to sbatch run_hifi_nf.sh -i <FASTQS> -m <METADATA> -o <OUTPUT_DIR>
######################
### Run the Command ##
######################
nextflow run main.nf -c /ceph/work/IBMS-PHLab/emily/scripts_general/second_ed_next.config \
    -resume \
    --input $input_fq \
    --metadata $input_metadat -profile singularity \
    --outdir $output_dir \
    --resume --dada2_cpu 12 --vsearch_cpu 12 --cutadapt_cpu 12 \
    --vsearch_db /ceph/work/IBMS-PHLab/emily/databases/silva-138-99-seqs.qza \
    --vsearch_tax /ceph/work/IBMS-PHLab/emily/databases/silva-138-99-tax.qza \
    --gtdb_db /ceph/work/IBMS-PHLab/emily/databases/GTDB_bac120_arc53_ssu_r207_fullTaxo.fa.gz \
    --refseq_db /ceph/work/IBMS-PHLab/emily/databases/RefSeq_16S_6-11-20_RDPv16_fullTaxo.fa.gz \
    --silva_db /ceph/work/IBMS-PHLab/emily/databases/silva_nr99_v138.1_wSpecies_train_set.fa.gz \
    --run_picrust2

The .nextflow.log file doesn't appear to give new important info, but here is an excerpt nonetheless:

May-15 00:40:59.702 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 5011688; id: 36; name: pb16S:dada2_qc (1); status: COMPLETED; exit: 0; error: -; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/9a/6b09bc88fd5b84975cafc3caf95762 started: 1715733659700; exited: 2024-05-15T00:39:45.616046767Z; ]
May-15 00:40:59.736 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process pb16S:dada2_rarefaction (1) > jobId: 5011691; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/1c/a05311804446312d205155d56cd43d
May-15 00:40:59.736 [Task submitter] INFO  nextflow.Session - [1c/a05311] Submitted process > pb16S:dada2_rarefaction (1)
May-15 00:40:59.738 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 5011690; id: 35; name: pb16S:class_tax; status: COMPLETED; exit: 0; error: -; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/a4/a3cbf10e05ba9ca925d9f639e10d00 started: 1715733659738; exited: 2024-05-15T00:40:46.843045585Z; ]
May-15 00:41:08.737 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process pb16S:qiime2_phylogeny_diversity (1) > jobId: 5011692; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/46/948b90aac84f67178f8633939d6ee5
May-15 00:41:08.737 [Task submitter] INFO  nextflow.Session - [46/948b90] Submitted process > pb16S:qiime2_phylogeny_diversity (1)
May-15 00:41:18.759 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process pb16S:barplot (1) > jobId: 5011693; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/b9/e9421280a417c49ef9a88636e2666f
May-15 00:41:18.759 [Task submitter] INFO  nextflow.Session - [b9/e94212] Submitted process > pb16S:barplot (1)
May-15 00:41:28.738 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process pb16S:krona_plot > jobId: 5011694; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/e7/8630e6169aad3cbd5d51dc0153f18a
May-15 00:41:28.738 [Task submitter] INFO  nextflow.Session - [e7/8630e6] Submitted process > pb16S:krona_plot
May-15 00:42:59.703 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 5011691; id: 38; name: pb16S:dada2_rarefaction (1); status: COMPLETED; exit: 0; error: -; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/1c/a05311804446312d205155d56cd43d started: 1715733779701; exited: 2024-05-15T00:41:44.249090275Z; ]
May-15 00:42:59.712 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 5011692; id: 37; name: pb16S:qiime2_phylogeny_diversity (1); status: COMPLETED; exit: 1; error: -; workDir: /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/46/948b90aac84f67178f8633939d6ee5 started: 1715733779711; exited: 2024-05-15T00:41:29.323342245Z; ]
May-15 00:42:59.822 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'pb16S:qiime2_phylogeny_diversity (1)'

Caused by:
  Process `pb16S:qiime2_phylogeny_diversity (1)` terminated with an error exit status (1)

Command executed:

  qiime phylogeny align-to-tree-mafft-fasttree     --p-n-threads 8     --i-sequences dada2-ccs_rep_filtered.qza     --o-alignment mafft_alignment.qza     --o-masked-alignment mafft_alignment_masked.qza     --o-tree phylotree_mafft_unrooted.qza     --o-rooted-tree phylotree_mafft_rooted.qza

  qiime tools export --input-path phylotree_mafft_rooted.qza     --output-path ./
  mv tree.nwk phylotree_mafft_rooted.nwk

  qiime diversity core-metrics-phylogenetic     --p-n-jobs-or-threads 8     --i-phylogeny phylotree_mafft_rooted.qza     --i-table dada2-ccs_table_filtered.qza     --m-metadata-file metadat.tsv     --p-sampling-depth 12954     --output-dir ./core-metrics-diversity

  # Export various matrix for plotting later
  qiime tools export --input-path ./core-metrics-diversity/bray_curtis_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/bray_curtis_distance_matrix.tsv
  qiime tools export --input-path ./core-metrics-diversity/weighted_unifrac_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/weighted_unifrac_distance_matrix.tsv
  qiime tools export --input-path ./core-metrics-diversity/unweighted_unifrac_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/unweighted_unifrac_distance_matrix.tsv

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Environment variable SINGULARITYENV_TMPDIR will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Environment variable SINGULARITYENV_NXF_DEBUG will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  Plugin error from phylogeny:

    Command '['mafft', '--preservecase', '--inputorder', '--thread', '8', '/tmp/qiime2/ewissel/data/3444f9ec-e77e-4e59-a3c9-086c7146d788/data/dna-sequences.fasta']' returned non-zero exit status 1.

  Debug info has been saved to /tmp/qiime2-q2cli-err-zef47iwo.log

Work dir:
  /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/46/948b90aac84f67178f8633939d6ee5

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
May-15 00:42:59.826 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `pb16S:qiime2_phylogeny_diversity (1)` terminated with an error exit status (1)
May-15 00:42:59.845 [Task monitor] DEBUG nextflow.Session - The following nodes are still active:
[process] pb16S:export_biom

Let me know if I have forgotten any other information to include here.

proteinosome commented 4 months ago

@ewissel as mentioned, I do not see any section in your Nextflow config that incorporates what's suggested in #44. You incorporated max_memory and max_cpus, but those do not have any effect on individual steps unless you specify it for those steps, i.e. increasing the memory requirement for the phylogeny step.

Can you add these lines to your config file and rerun?

process {
  // more RAM for the diversity job
  withName: qiime2_phylogeny_diversity {
    cpus = 8
    memory = 240.GB
  }
  // more RAM for the report building
  withName: html_rep {
    cpus = 8
    memory = 128.GB
  }
}
ewissel commented 4 months ago

Apologies~ I've updated the config and launched the jobs, will update soon.

ewissel commented 4 months ago

OK, the updated config results in the same error (no change); reposting the error message for reference:

WARN: Killing running tasks (1)

executor >  slurm (2)
[93/248653] process > pb16S:write_log                [100%] 1 of 1, cached: 1 ✔
[b1/838c24] process > pb16S:QC_fastq (8)             [100%] 8 of 8, cached: 8 ✔
[10/b74ff4] process > pb16S:cutadapt (1)             [100%] 8 of 8, cached: 8 ✔
[f4/26865e] process > pb16S:QC_fastq_post_trim (8)   [100%] 8 of 8, cached: 8 ✔
[11/5bec2e] process > pb16S:collect_QC               [100%] 1 of 1, cached: 1 ✔
[8e/4dd9ff] process > pb16S:prepare_qiime2_manife... [100%] 1 of 1, cached: 1 ✔
[98/bcf411] process > pb16S:merge_sample_manifest    [100%] 1 of 1, cached: 1 ✔
[5d/721f3d] process > pb16S:import_qiime2 (1)        [100%] 1 of 1, cached: 1 ✔
[00/4d7e32] process > pb16S:demux_summarize (1)      [100%] 1 of 1, cached: 1 ✔
[60/335c1b] process > pb16S:dada2_denoise (1)        [100%] 1 of 1, cached: 1 ✔
[2f/60cfee] process > pb16S:mergeASV                 [100%] 1 of 1, cached: 1 ✔
[f4/b5a9b3] process > pb16S:filter_dada2             [100%] 1 of 1, cached: 1 ✔
[9a/6b09bc] process > pb16S:dada2_qc (1)             [100%] 1 of 1, cached: 1 ✔
[18/fede82] process > pb16S:qiime2_phylogeny_dive... [100%] 1 of 1, failed: 1 ✘
[1c/a05311] process > pb16S:dada2_rarefaction (1)    [100%] 1 of 1, cached: 1 ✔
[a4/a3cbf1] process > pb16S:class_tax                [100%] 1 of 1, cached: 1 ✔
[-        ] process > pb16S:dada2_assignTax          -
[-        ] process > pb16S:export_biom              -
[-        ] process > pb16S:barplot_nb               -
[b9/e94212] process > pb16S:barplot (1)              [100%] 1 of 1, cached: 1 ✔
[-        ] process > pb16S:picrust2                 -
[-        ] process > pb16S:html_rep                 -
[e7/8630e6] process > pb16S:krona_plot               [100%] 1 of 1, cached: 1 ✔
Error executing process > 'pb16S:qiime2_phylogeny_diversity (1)'

Caused by:
  Process `pb16S:qiime2_phylogeny_diversity (1)` terminated with an error exit status (1)

Command executed:

  qiime phylogeny align-to-tree-mafft-fasttree     --p-n-threads 8     --i-sequences dada2-ccs_rep_filtered.qza     --o-alignment mafft_alignment.qza     --o-masked-alignment mafft_alignment_masked.qza     --o-tree phylotree_mafft_unrooted.qza     --o-rooted-tree phylotree_mafft_rooted.qza

  qiime tools export --input-path phylotree_mafft_rooted.qza     --output-path ./
  mv tree.nwk phylotree_mafft_rooted.nwk

  qiime diversity core-metrics-phylogenetic     --p-n-jobs-or-threads 8     --i-phylogeny phylotree_mafft_rooted.qza     --i-table dada2-ccs_table_filtered.qza     --m-metadata-file metadat.tsv     --p-sampling-depth 12954     --output-dir ./core-metrics-diversity

  # Export various matrix for plotting later
  qiime tools export --input-path ./core-metrics-diversity/bray_curtis_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/bray_curtis_distance_matrix.tsv
  qiime tools export --input-path ./core-metrics-diversity/weighted_unifrac_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/weighted_unifrac_distance_matrix.tsv
  qiime tools export --input-path ./core-metrics-diversity/unweighted_unifrac_distance_matrix.qza     --output-path ./core-metrics-diversity
  mv ./core-metrics-diversity/distance-matrix.tsv     ./core-metrics-diversity/unweighted_unifrac_distance_matrix.tsv

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Environment variable SINGULARITYENV_TMPDIR will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Environment variable SINGULARITYENV_NXF_DEBUG will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  Plugin error from phylogeny:

    Command '['mafft', '--preservecase', '--inputorder', '--thread', '8', '/tmp/qiime2/ewissel/data/3444f9ec-e77e-4e59-a3c9-086c7146d788/data/dna-sequences.fasta']' returned non-zero exit status 1.

  Debug info has been saved to /tmp/qiime2-q2cli-err-zbr0zab9.log

Work dir:
  /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work/18/fede8245004adc52a8d57e0fb2ed2c

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
proteinosome commented 4 months ago

Thanks. Can you check how many ASVs you are getting?

In the results folder you should have a file called dada2_ASV.fasta. Count how many sequences are in it (grep for the sequence header >).

ewissel commented 4 months ago

Yes, here you go:

$ grep ">"  hifi_processed_may7/dada2/dada2_ASV.fasta  | wc -l
264
proteinosome commented 4 months ago

That's not many sequences, so the step should not need that much memory. Does the tmp file happen to still exist? /tmp/qiime2-q2cli-err-zbr0zab9.log

ewissel commented 4 months ago

I specify my tmp dir in my config file as /ceph/sharedfs/work/IBMS-PHLab/emily/tmp, so when I look there, I don't see that file, but I do see:

$ ls tmp 
qiime2    tmp    tmpb9767yz5

tmp is empty, tmpb9767yz5 has an index.html file and sampID.txt files with taxa assignment info in them. The qiime dir looks like the following, but all folders are empty:

tmp/qiime2/$USER/
    data/
    keys/
    pools/
    processes/ 
    VERSION

The version file contains the following: QIIME 2 cache: 1 framework: 2023.2.0.

Is it possible the tmp/qiime2....log is somewhere buried in the work/ directory?
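
(A quick way to check both locations, assuming a standard find:)

find /ceph/sharedfs/work/IBMS-PHLab/emily/BI-ANA-7514/work -name 'qiime2-q2cli-err-*.log' 2>/dev/null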

proteinosome commented 4 months ago

The annoying thing is that QIIME doesn't respect the TMPDIR directive and we have to force the matter here. In your nextflow.config file there should be a section for singularity. Modify it and add runOptions in the block, i.e.:

  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
    singularity.cacheDir = "$HOME/nf_conda/singularity"
    singularity.runOptions = "--bind /ceph/sharedfs/work/IBMS-PHLab/emily/tmp:/tmp"
    params.enable_container=true
    docker.enabled = false
    podman.enabled = false
    shifter.enabled = false
    charliecloud.enabled = false
  }

And rerun it. When it fails, you should hopefully have the error file in /ceph/sharedfs/work/IBMS-PHLab/emily/tmp.

I apologize that this all seems very convoluted, but QIIME 2 has always had this issue where it's hard to debug... Thanks for being patient.

ewissel commented 4 months ago

Thank you so much for the help!! I couldn't figure this one out on my own.

Should I add these to ammended.config?

proteinosome commented 4 months ago

In the folder where your main.nf is (the repository), you should see a nextflow.config file. In that file there is already a singularity section.

ewissel commented 4 months ago

OK, I updated the nextflow.config with the singularity.runOptions bit, and it still produced the same error.

proteinosome commented 4 months ago

The config change is not supposed to fix the error; it merely changes the tmpdir so that you can find the error messages. I'm looking for the error messages produced by QIIME 2.

Are you now seeing any error log files in /ceph/sharedfs/work/IBMS-PHLab/emily/tmp?

ewissel commented 4 months ago

My bad, that's right.

$ ls /ceph/sharedfs/work/IBMS-PHLab/emily/tmp/
qiime2  qiime2-q2cli-err-frdo44nj.log  qiime2-q2cli-err-jbv760k0.log  Rtmp4dWatk  RtmpugcAzJ  tmp  tmpb9767yz5

File qiime2-q2cli-err-frdo44nj.log contains the following:

$ cat /ceph/sharedfs/work/IBMS-PHLab/emily/tmp/qiime2-q2cli-err-frdo44nj.log
mktemp: failed to create directory via template ‘/ceph/work/IBMS-PHLab/emily/tmp/mafft.XXXXXXXXXX’: No such file or directory
mktemp seems to be obsolete. Re-trying without -t
mkdir: cannot create directory ‘/ceph/work’: Read-only file system
mktemp: failed to create directory via template ‘/ceph/work/IBMS-PHLab/emily/tmp/tmp/mafft.XXXXXXXXXX’: No such file or directory
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1121: /infile: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1122: /infile: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1123: /_addfile: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1131: /infile: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1133: /_aamtx: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1134: /_subalignmentstable: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1135: /_guidetree: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1136: /_codonpos: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1137: /_codonscore: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1138: /_seedtablefile: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1139: /_lara.params: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1140: /pdblist: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1141: /ownlist: Read-only file system
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1142: /_externalanchors: Read-only file system
grep: /infile: No such file or directory
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1817: [: -gt: unary operator expected
grep: /infile: No such file or directory
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1826: [: -eq: unary operator expected
/opt/conda/envs/qiime2-2023.2/bin/mafft: line 1833: [: too many arguments
mv: cannot stat 'infile': No such file or directory
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: mafft --preservecase --inputorder --thread 8 /tmp/qiime2/ewissel/data/3444f9ec-e77e-4e59-a3c9-086c7146d788/data/dna-sequences.fasta

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
    results = action(**arguments)
  File "<decorator-gen-427>", line 2, in align_to_tree_mafft_fasttree
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 475, in _callable_executor_
    outputs = self._callable(scope.ctx, **view_args)
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_phylogeny/_align_to_tree_mafft_fasttree.py", line 19, in align_to_tree_mafft_fasttree
    aligned_seq, = mafft(sequences=sequences, n_threads=n_threads,
  File "<decorator-gen-488>", line 2, in mafft
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 128, in mafft
    return _mafft(sequences_fp, None, n_threads, parttree, False)
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 100, in _mafft
    run_command(cmd, result_fp)
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 26, in run_command
    subprocess.run(cmd, stdout=output_f, check=True)
  File "/opt/conda/envs/qiime2-2023.2/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '8', '/tmp/qiime2/ewissel/data/3444f9ec-e77e-4e59-a3c9-086c7146d788/data/dna-sequences.fasta']' returned non-zero exit status 1.

The other log file, /ceph/sharedfs/work/IBMS-PHLab/emily/tmp/qiime2-q2cli-err-jbv760k0.log, contains identical content.

proteinosome commented 4 months ago

Sorry I forgot that you set TMPDIR in your nextflow config. Can you change TMPDIR = '/ceph/work/IBMS-PHLab/emily/tmp' to TMPDIR='/tmp' in your config file? By letting singularity bind the TMPDIR directly you don't need to set it manually.
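
i.e., the env block becomes (sketch of the suggested edit):

env {
    // /tmp inside the container is now bind-mounted to the ceph scratch dir
    // by singularity.runOptions, so plain /tmp is enough here
    TMPDIR = '/tmp'
}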

ewissel commented 4 months ago

OK, this solves the pb16S:qiime2_phylogeny_diversity error! So the problem the whole time was the tmp dir not being /tmp?

proteinosome commented 4 months ago

Phew, that was a tough one. The issue came from two problems:

  1. I believe your cluster has a very limited (in size) /tmp folder by default, which is why you were exporting TMPDIR manually, right?
  2. Exporting TMPDIR doesn't work with QIIME, at least not straightforwardly. Even when you export TMPDIR, QIIME is still stubbornly trying to use /tmp, and this causes failure.

By using the singularity.runOptions option, we are forcing the /tmp directory inside the container to be /ceph/work/IBMS-PHLab/emily/tmp, so when QIIME writes to /tmp (see problem 2 above), it's really writing to /ceph/work/IBMS-PHLab/emily/tmp.
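
A quick way to convince yourself of the bind behaviour (a sketch; the probe filename is arbitrary):

singularity exec --bind /ceph/work/IBMS-PHLab/emily/tmp:/tmp \
    /ceph/work/IBMS-PHLab/emily/nf_conda/singularity/kpinpb-pb-16s-nf-qiime-v0.7.img \
    touch /tmp/bind_probe
# the file appears on the host path, not in the node's real /tmp
ls /ceph/work/IBMS-PHLab/emily/tmp/bind_probe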

You can also safely remove the options added based on #44, as those are really only likely needed when you have a huge number of ASVs (environmental samples).

If the pipeline runs through fine please let me know so I can close this issue.

ewissel commented 4 months ago

You're right that there are some issues with per-user space allocations on our HPC.

Only one more issue, though not directly related: the final HTML won't compile due to a disk usage error, because the workflow is trying to pull the image into the wrong directory (something on dicos, which is $HOME, while I need everything to run on /ceph/). I thought I specified everything in my config needed for this to build on /ceph/. Any recommendations?

[e7/8630e6] process > pb16S:krona_plot               [100%] 1 of 1, cached: 1 ✔
Error executing process > 'pb16S:html_rep (1)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name kpinpb-pb-16s-vis-latest.img.pulling.1715759304337 docker://kpinpb/pb-16s-vis:latest > /dev/null
  status : 255
  message:
    INFO:    Converting OCI blobs to SIF format
    INFO:    Starting build...
    Getting image source signatures
    Copying blob sha256:9dd3a2bb9cdfb0be8969ffb6a9aa14729f2d65bcad26227bff9fabdf97a0944c
    Copying blob sha256:a6236801494d5ca9acfae6569427398c2942c031375b96cac887cafe1de5a09b
    Copying blob sha256:852e50cd189dfeb54d97680d9fa6bed21a6d7d18cfb56d6abfe2de9d7f173795
    Copying blob sha256:6540475d41a8f3ad22478707f2f4a43433665bd0e3fadbb06386a48b39ab0d2e
    Copying blob sha256:aa36e8c7bbae27fdc6eec2d0b8aa52faa9312f1e911bc41de7a72cf5b884cd4e
    Copying blob sha256:679c171d6942954a759f2d3a1dff911321940f23b0cdbe1d186f36bef0025124
    Copying blob sha256:97b7285328a32a05ef82fcbfff8377fec5b00adf6df07ec8cf62083661e5ea25
    Copying blob sha256:f45fe9aee3c8fca81cc3160de68d546317a82f755a49359d6ef18529bbc0bf4e
    Copying blob sha256:1d894546a9b7f84aa65073bff4c9182d6f8620d2582913e3147a94a29e1907d6
    Copying blob sha256:1417ad815d72958552e047548cb5a746af5a2bc2f0fefcf21999eea00305505e
    Copying blob sha256:9c6f3f0a2dda45ee2ceb5ffc45dd0f2293db48fe11d7aabd76ce35cf28d8580f
    Copying blob sha256:f3d818348caae61fb25dc1b08448e14faa5f339e8d8e1bda530d19ab83bcb7c1
    Copying config sha256:9bdd96c291ffa09458631b1b4f0639d54fef9cdf249412d4955303b1edf57d0d
    Writing manifest to image destination
    Storing signatures
    2024/05/15 07:48:41  info unpack layer: sha256:852e50cd189dfeb54d97680d9fa6bed21a6d7d18cfb56d6abfe2de9d7f173795
    2024/05/15 07:48:43  info unpack layer: sha256:a6236801494d5ca9acfae6569427398c2942c031375b96cac887cafe1de5a09b
    2024/05/15 07:48:47  info unpack layer: sha256:679c171d6942954a759f2d3a1dff911321940f23b0cdbe1d186f36bef0025124
    2024/05/15 07:48:50  info unpack layer: sha256:6540475d41a8f3ad22478707f2f4a43433665bd0e3fadbb06386a48b39ab0d2e
    2024/05/15 07:48:50  info unpack layer: sha256:aa36e8c7bbae27fdc6eec2d0b8aa52faa9312f1e911bc41de7a72cf5b884cd4e
    2024/05/15 07:48:50  info unpack layer: sha256:9dd3a2bb9cdfb0be8969ffb6a9aa14729f2d65bcad26227bff9fabdf97a0944c
    2024/05/15 07:48:50  info unpack layer: sha256:97b7285328a32a05ef82fcbfff8377fec5b00adf6df07ec8cf62083661e5ea25
    2024/05/15 07:48:50  info unpack layer: sha256:f45fe9aee3c8fca81cc3160de68d546317a82f755a49359d6ef18529bbc0bf4e
    2024/05/15 07:48:50  info unpack layer: sha256:1d894546a9b7f84aa65073bff4c9182d6f8620d2582913e3147a94a29e1907d6
    2024/05/15 07:48:54  info unpack layer: sha256:1417ad815d72958552e047548cb5a746af5a2bc2f0fefcf21999eea00305505e
    2024/05/15 07:49:24  info unpack layer: sha256:9c6f3f0a2dda45ee2ceb5ffc45dd0f2293db48fe11d7aabd76ce35cf28d8580f
    2024/05/15 07:49:24  info unpack layer: sha256:f3d818348caae61fb25dc1b08448e14faa5f339e8d8e1bda530d19ab83bcb7c1
    INFO:    Creating SIF file...
    FATAL:   While making image from oci registry: error fetching image to cache: while building SIF from layers: while creating SIF: while unloading container: close /dicos_ui_home/ewissel/.apptainer/cache/oci-tmp/tmp_3980167945: disk quota exceeded
proteinosome commented 4 months ago

That error is from singularity (or apptainer in your case) not having enough space to pull down the container image required to run the step. Singularity usually pulls to a tmp space, then compiles the image to its final destination (your nf_conda folder). I don't think there's any way to change that tmp space within the Nextflow config itself, so you will have to ask your IT to give you more quota in the apptainer cache space.
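
One possible workaround to try before asking for more quota (untested here; APPTAINER_CACHEDIR and APPTAINER_TMPDIR are standard apptainer environment variables, not pipeline options):

# relocate apptainer's image cache and build tmp off $HOME onto ceph;
# check with your IT that this location is writable from compute nodes
export APPTAINER_CACHEDIR=/ceph/work/IBMS-PHLab/emily/apptainer_cache
export APPTAINER_TMPDIR=/ceph/work/IBMS-PHLab/emily/tmp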

ewissel commented 4 months ago

Then I think we are good to go here. Thank you so much for the help in troubleshooting this.