NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

Requested node configuration is not available in transcript_assembly #81

Closed: fraca closed this issue 1 year ago

fraca commented 1 year ago

Hi,

I'm trying to run the subworkflow transcript_assembly on Rackham (UPPMAX). Here is the command that I run:

screen -S anno_tra_ass
module load bioinfo-tools Nextflow
export NXF_HOME=/proj/uppstore2019057/private/program_MF/nextflow_home
export NXF_LAUNCHER=$SNIC_TMP
export NXF_TEMP=$SNIC_TMP
export NXF_SINGULARITY_CACHEDIR=/proj/uppstore2019057/nobackup/pro_next/w_tra_ass
nextflow run -profile uppmax -params-file /proj/uppstore2019057/nobackup/pro_next/para_tra_ass.yml /proj/uppstore2019057/private/program_MF/pipelines-nextflow/main.nf

This is the yml file:

subworkflow: 'transcript_assembly'
reads: '/proj/uppstore2019057/nobackup/pro_next/reads_Ltri/TI*_{R1,R2}_001.fastq.gz'
genome: '/proj/uppstore2019057/private/Linum_ref/L_trigynum_pilon2.fasta'
single_end: false
outdir: '/proj/uppstore2019057/nobackup/pro_next/ris_tra_ass/'
project: 'snic2022-22-696'

This is the error that I got:

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: CPU count per node can not be satisfied
  sbatch: error: Batch job submission failed: Requested node configuration is not available

Work dir:
  /crex/proj/uppstore2019057/nobackup/pro_next/w_tra_ass/work/e3/dbf852ba17bf670d7ce2474e4e02b5

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

and this is the above-mentioned .command.sh:

#!/bin/bash -ue
mkdir hisat2
hisat2-build \
    -p 12 \
     \
    L_trigynum_pilon2.fasta \
    hisat2/L_trigynum_pilon2

cat <<-END_VERSIONS > versions.yml
"TRANSCRIPT_ASSEMBLY:HISAT2_BUILD":
    hisat2: 2.2.0
END_VERSIONS

I don't understand why it complains about the CPUs per node. If I understand correctly, hisat2 requires 12 cores, which should not be a problem for a Rackham node. Best,

Marco

mahesh-panchal commented 1 year ago

Hi Marco, could you go to the work directory listed there and copy here the SBATCH header from the hidden file .command.run?

fraca commented 1 year ago

Hi Mahesh,

here is the header:

#!/bin/bash
#SBATCH -D /crex/proj/uppstore2019057/nobackup/pro_next/w_tra_ass/work/e3/dbf852ba17bf670d7ce2474e4e02b5
#SBATCH -J nf-TRANSCRIPT_ASSEMBLY_HISAT2_BUILD_(L_trigynum_pilon2.fasta)
#SBATCH -o /crex/proj/uppstore2019057/nobackup/pro_next/w_tra_ass/work/e3/dbf852ba17bf670d7ce2474e4e02b5/.command.log
#SBATCH --no-requeue
#SBATCH --signal B:USR2@30
#SBATCH -c 12
#SBATCH -t 16:00:00
#SBATCH --mem 204800M
#SBATCH -A snic2022-22-696
# NEXTFLOW TASK: TRANSCRIPT_ASSEMBLY:HISAT2_BUILD (L_trigynum_pilon2.fasta)

Maybe the problem is the memory requested.

mahesh-panchal commented 1 year ago

I think the memory is the issue here: the header requests --mem 204800M (200 GiB), which is more than a standard Rackham node offers. If you're able to, you can reduce the memory with a custom config:

process {
    withName: 'HISAT2_BUILD' {
        memory = 120.GB
    }
}
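
If the snippet above is saved as, for example, custom.config (the filename here is just an example), you can add it to your existing command with -c:

nextflow run -profile uppmax \
    -params-file /proj/uppstore2019057/nobackup/pro_next/para_tra_ass.yml \
    -c custom.config \
    /proj/uppstore2019057/private/program_MF/pipelines-nextflow/main.nf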

In the meantime, I'll reduce the memory value of the high-memory label.
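
For reference, the kind of label-based default being referred to looks roughly like this; the label name and value are assumptions based on common nf-core-style base configs, not copied from this pipeline:

process {
    withLabel: 'process_high_memory' {
        memory = 200.GB
    }
}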

fraca commented 1 year ago

Thanks Mahesh, I added the config file but there is another error:

  cat <<-END_VERSIONS > versions.yml
  "TRANSCRIPT_ASSEMBLY:FASTP":
      fastp: $(fastp --version 2>&1 | sed -e "s/fastp //g")
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  WARNING: skipping mount of /lutra: stat /lutra: transport endpoint is not connected
  FATAL:   container creation failed: mount /lutra->/lutra error: while mounting /lutra: while getting stat for /lutra: stat /lutra: transport endpoint is not connected

Work dir:
  /proj/uppstore2019057/nobackup/pro_next/w_tra_ass/91/44c3a73bad3943d5ebc32df31732ea

The problem is related to Lutra. In our group we have off-load archive storage there, but I don't use any data from it. Currently there is an UPPMAX issue related to it (https://status.uppmax.uu.se/2022-10-10/lutra-problem/). I don't understand why Nextflow is trying to access it.

mahesh-panchal commented 1 year ago

I think it's Singularity that's trying to mount it. Send a support request to UPPMAX support. You may need to pass containerOptions to Nextflow to prevent Singularity from mounting that volume.
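
A possible sketch in the same kind of custom config, assuming the Singularity version on Rackham supports the --no-mount flag (note that --no-mount bind-paths disables all bind paths defined in singularity.conf, which may be broader than you want):

process {
    containerOptions = '--no-mount bind-paths'
}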

mahesh-panchal commented 1 year ago

Response from my ticket to Uppmax:

The problem is not the Singularity "bind path" config in itself, but that the /lutra path on some nodes was still not working, due to a hung mount. We tried to unmount Lutra everywhere yesterday, but apparently a subset of Rackham nodes was missed.

I have unmounted the broken filesystem on these nodes now. With Lutra unmounted, /lutra is just an empty directory and does not cause problems starting Singularity.

Please retry the workflow and let me know how it goes.

fraca commented 1 year ago

Hi Mahesh, I restarted it now and it seems to work. I'll let you know if I get any other errors.