fastcat: unrecognized option '--histograms' and csvtk: command not found #79

hgingras opened 3 months ago

hgingras commented 3 months ago

Operating System

Other Linux (please specify below)

Other Linux

NAME="Rocky Linux"

Workflow Version


Workflow Execution

Other (please describe)

EPI2ME Version

No response

CLI command run


#SBATCH --account=def-user
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=0-01:00

module load StdEnv/2023 apptainer/1.2.4 nextflow/23.10.0

export NXF_SINGULARITY_CACHEDIR="/scratch/$USER/apptainer/cache"
export APPTAINER_TMPDIR="/scratch/$USER/apptainer/tmp"
export APPTAINER_BIND="/lustre05,/lustre06,/lustre07,/scratch,/project"

nextflow run main.nf \
    --fastq /home/helene/scratch/Ticket/wf-transcriptomes/wf-transcriptomes-1.1.1-no-1/differential_expression/differential_expression_fastq \
    --de_analysis --ref_genome /home/helene/scratch/Ticket/wf-transcriptomes/wf-transcriptomes-1.1.1-no-1/differential_expression/hg38_chr20.fa \
    --transcriptome-source reference-guided \
    --ref_annotation /home/helene/scratch/Ticket/wf-transcriptomes/wf-transcriptomes-1.1.1-no-1/differential_expression/gencode.v22.annotation.chr20.gtf \
    --direct_rna --minimap2_index_opts '-k 15' --sample_sheet /home/helene/scratch/Ticket/wf-transcriptomes/wf-transcriptomes-1.1.1-no-1/differential_expression/sample_sheet.csv \
    --jaffal_refBase /home/helene/scratch/Ticket/wf-transcriptomes/wf-transcriptomes-1.1.1-no-1/differential_expression/chr20/ --jaffal_genome hg38_chr20 --jaffal_annotation genCode22 \
    --out_dir Test-7 \
    -profile singularity

Workflow Execution - CLI Execution Profile


What happened?

I have 2 errors to report.

1st error: fastcat: unrecognized option '--histograms' see output log

fastcat v0.16.0 has the --histograms option. In the wf-transcriptomes_latest.sif image that I am using, the installed fastcat reports version 0.10.2:

apptainer run wf-transcriptomes_latest.sif
Apptainer> pwd
/home/epi2melabs/conda/bin
Apptainer> fastcat -V
0.10.2

The output of "fastcat --help" does not list the --histograms option, and neither does fastcat 0.10.2 when I install it from source. The option is present in version 0.16.0 at least.

Is it possible to update the image with version 0.16.0 for fastcat?

The error is in lib/ingress.nf file:

process fastcat {
    label "ingress"
    label "wf_common"
    cpus 3
    memory "2 GB"
    input:
        tuple val(meta), path("input")
        val extra_args
    output:
        tuple val(meta), path("seqs.fastq.gz"), path("fastcat_stats")
    script:
        String out = "seqs.fastq.gz"
        String fastcat_stats_outdir = "fastcat_stats"
        """
        mkdir $fastcat_stats_outdir
        fastcat \
            -s ${meta["alias"]} \
            -r >(bgzip -c > $fastcat_stats_outdir/per-read-stats.tsv.gz) \
            -f $fastcat_stats_outdir/per-file-stats.tsv \
            --histograms histograms \
            $extra_args \
            input \
            | bgzip > $out
        mv histograms/* $fastcat_stats_outdir

        # extract the run IDs from the per-read stats
        csvtk cut -tf runid $fastcat_stats_outdir/per-read-stats.tsv.gz \
        | csvtk del-header | sort | uniq > $fastcat_stats_outdir/run_ids

2nd error: Not in output log.

This error appeared when I removed the --histograms option from lib/ingress.nf to see what would happen: csvtk: command not found

In image wf-transcriptomes_latest.sif :

apptainer run wf-transcriptomes_latest.sif
Apptainer> pwd
/home/epi2melabs/conda/bin
Apptainer> ls

csvtk (https://github.com/shenwei356/csvtk) is not in the listing, so it does not appear to be installed in the image.
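The two checks described in this report (does a tool's help text list a given option, and is a tool on the PATH inside the image) could be scripted roughly as follows. This is only a sketch: the function names tool_has_option and tool_exists and the RUNNER variable are hypothetical, and the default RUNNER assumes the image file name used in this report.

```shell
#!/usr/bin/env bash
# Sketch: probe an environment for a tool and for an option in its help text.
# RUNNER is the command prefix used to enter the container (an assumption
# based on the image name in this report). Set RUNNER="" to probe the
# current environment instead.
RUNNER=${RUNNER-"apptainer exec wf-transcriptomes_latest.sif"}

# Succeeds if "<tool> --help" mentions the given option.
# $1 = tool name, $2 = option to look for.
tool_has_option() {
    $RUNNER "$1" --help 2>&1 | grep -q -e "$2"
}

# Succeeds if the tool is found on the PATH.
# $1 = tool name.
tool_exists() {
    $RUNNER sh -c 'command -v "$1" >/dev/null 2>&1' sh "$1"
}
```

For example, "tool_has_option fastcat --histograms || echo missing" and "tool_exists csvtk || echo missing" would report the two problems described here without opening an interactive shell in the image.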

Thanks for your help.

Relevant log output

Some of the pipeline processes did complete. This is the end of the Slurm output:


Were you able to successfully run the latest version of the workflow with the demo data?


Other demo data information

This is what I am trying to do.

sarahjeeeze commented 3 months ago

Hi, you can run the command as nextflow run epi2me-labs/wf-transcriptomes instead of nextflow run main.nf, which ensures it runs with the correct version of the container. Let me know if you still get the same errors.

hgingras commented 3 months ago

Hi sarahjeeeze,

When running : nextflow run epi2me-labs/wf-transcriptomes --help

I get this error:

There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (malloc) failed to allocate 24 bytes for AllocateHeap

Our environment is an HPC system where memory on the login node is limited. Internet access is also only available on the login node, so I cannot fetch the workflow from a compute node.

So I am limited to downloading the workflow this way:

wget https://github.com/epi2me-labs/wf-transcriptomes/archive/refs/tags/v1.1.1.tar.gz
tar -xvf v1.1.1.tar.gz

In this version (the last one), in the nextflow.config there is this specification: container_sha = "shae7c9f184996a384e99be68e790f0612f0c732867"

This is the image I loaded doing so:

module load StdEnv/2023 apptainer/1.2.4 nextflow/23.10.0
mkdir -p /scratch/$USER/apptainer/{cache,tmp}
export APPTAINER_CACHEDIR="/scratch/$USER/apptainer/cache"
export APPTAINER_TMPDIR="/scratch/$USER/apptainer/tmp"

apptainer pull docker://ontresearch/wf-transcriptomes:shae7c9f184996a384e99be68e790f0612f0c732867

As mentioned in my previous message, when I look inside this .sif image with the command below, I see that the fastcat version is an old one that does not have the --histograms option used by the fastcat process in lib/ingress.nf.

apptainer run wf-transcriptomes_shae7c9f184996a384e99be68e790f0612f0c732867.sif
Apptainer> pwd
/home/epi2melabs/conda/bin
Apptainer> fastcat -V
0.10.2

I get the same result with the wf-transcriptomes_latest.sif image.

I also tried the older wf-transcriptomes-1.0.0, where lib/ingress.nf does not use the --histograms option, but then I got csvtk: command not found. I do not see csvtk in the conda environment of the image.

Could you check on your side which version of fastcat is available in the latest .sif image that you provide?

It should be upgraded to 0.16.0.

Could you also add csvtk in the conda environment?

Best regards,


apaul7 commented 2 months ago

Hi, I ran into a similar issue where fastcat did not have the --histograms option. I also copied the git repo and ran main.nf directly. I use Docker instead of Apptainer, and I run the pipeline on a local LSF cluster instead of Slurm.

For running Nextflow workflows I create a profile that can use the LSF cluster. I set the default Docker image to ontresearch/wf-transcriptomes:${params.wf.container_sha} and then ran into the --histograms issue you faced. I was able to solve this problem by using the labels for each step.

I added this to the nextflow.config:

profiles {
    // the "standard" profile is used implicitely by nextflow
    // if no other profile is given on the CLI
    compute1_lsf {
        process.executor = 'lsf'
        process.queue = 'general'
        process {
            withLabel:isoforms {
                clusterOptions =  "-G compute-mylab -a 'docker(ontresearch/wf-transcriptomes:${params.wf.container_sha})'"
            }
            withLabel:wf_common {
                clusterOptions =  "-G compute-mylab -a 'docker(ontresearch/wf-common:${params.wf.common_sha})'"
            }
        }
    }
    // remaining profiles unchanged
}
I then added -profile compute1_lsf to the nextflow run command. I am not sure how to do the same on your Slurm cluster or with Apptainer, but I wanted to offer a possible solution!

hgingras commented 2 months ago

Thanks for sharing. I ended up running in local mode, setting up the requirements in a Python virtual environment and installing the other modules myself. The only module I could not set up was jaffal. I wish I had received a reply explaining more about the Docker image and the versions of the different modules. Have a good one!

Yoon90 commented 1 month ago

@apaul7, thank you for sharing your case. Could you tell me what the difference is between the default configuration and your profile? I could not find any particular difference between them.

apaul7 commented 1 month ago

Hi, I've added the git diff from 999fb4e using git diff nextflow.config:

index 4eb4c73..8e5874b 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -140,6 +140,18 @@ process {
 profiles {
     // the "standard" profile is used implicitely by nextflow
     // if no other profile is given on the CLI
+    compute1_lsf {
+        process.executor = 'lsf'
+        process.queue = 'general'
+        process {
+            withLabel:isoforms {
+                clusterOptions =  "-G compute-mylab -a 'docker(ontresearch/wf-transcriptomes:${params.wf.container_sha})'"
+            }
+            withLabel:wf_common {
+                clusterOptions =  "-G compute-mylab -a 'docker(ontresearch/wf-common:${params.wf.common_sha})'"
+            }
+        }
+    }
     standard {
         docker {
             enabled = true

When submitting jobs to my cluster via bsub, you need to provide a Docker image using the application (-a) option. This compute1_lsf profile lets Nextflow use a different Docker image depending on the label of each individual step.
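To see which labels a workflow actually uses, and so which steps would fall under each image, one could count the label directives in the workflow sources. The sketch below is only illustrative: the function name list_process_labels is hypothetical, and it assumes labels are written with double quotes, as in the lib/ingress.nf excerpt earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: count how often each process label appears in Nextflow source files.
# Arguments: one or more files or directories to search (searched recursively).
list_process_labels() {
    # Match directives such as:  label "wf_common"
    grep -rhoE --include='*.nf' 'label[[:space:]]+"[^"]+"' "$@" \
        | sed -E 's/.*"([^"]+)"/\1/' \
        | sort | uniq -c
}
```

Running "list_process_labels main.nf lib" from the workflow directory would print one line per label with its number of occurrences.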

Hope this helps! -Alex

cjw85 commented 1 month ago


When you were doing this:

In this version (the last one), in the nextflow.config there is this specification: container_sha = "shae7c9f184996a384e99be68e790f0612f0c732867"

This is the image I loaded doing so:

you would have needed to pull another container image as well. The workflow uses two images (see nextflow.config); it is the wf-common image that runs the steps involving fastcat. (I'm not entirely sure why fastcat is also installed in the wf-transcriptomes image; it might be historical.)
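As a rough sketch of what pulling both images could look like, the tags can be read from the downloaded nextflow.config and turned into pull commands. Several things here are assumptions: the function names config_value and print_pull_commands are hypothetical, the config is assumed to contain lines of the form container_sha = "..." and common_sha = "...", and the image repositories are the ones named in the LSF profile earlier in this thread. The script only prints the commands; it does not run them.

```shell
#!/usr/bin/env bash
# Sketch: list the container images a workflow's nextflow.config refers to,
# so that each can be pulled on a node with internet access.

# Print the value of the first `key = "value"` line in a config file.
# $1 = key name, $2 = path to the config file.
config_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Print one `apptainer pull` command per image (nothing is executed).
# $1 = path to nextflow.config.
print_pull_commands() {
    transcriptomes_tag=$(config_value container_sha "$1")
    common_tag=$(config_value common_sha "$1")
    echo "apptainer pull docker://ontresearch/wf-transcriptomes:${transcriptomes_tag}"
    echo "apptainer pull docker://ontresearch/wf-common:${common_tag}"
}
```

Calling "print_pull_commands nextflow.config" on the login node would show both commands for review before running them. Where Nextflow then expects to find the pulled files is not covered here and should be checked against the Nextflow documentation.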