SystemsGenetics / GEMmaker

A workflow for construction of Gene Expression count Matrices (GEMs). Useful for Differential Gene Expression (DGE) analysis and Gene Co-Expression Network (GCN) construction
https://gemmaker.readthedocs.io/en/latest/
MIT License

GEMmaker not running #281

Open ldunwoodie opened 10 months ago

ldunwoodie commented 10 months ago

Update 2023.08.23

I worked with our Linux team here to add a proxy to my batch job executable, which seemed to remove the "WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config" message, but I'm still getting the ".command.sh: line 5: fastqc: command not found" error. Any help would be much appreciated! Thanks.

Description of the bug

Hey y'all! I'm an alum of the Feltus lab using GEMmaker for the first time during my residency in pediatrics. Launching GEMmaker, I received an error and I was hoping you all could help. I'm using GEMmaker on an LSF cluster. Any help would be much appreciated! Thanks! -Leland

Command used and terminal output

Here are my errors:
WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config
/users/dunjh6/Documents/Lautz-Research/Sepsis-Genomics/work/5a/fd2f1c7a0756b4c5311604c4213c05/.command.sh: line 5: fastqc: command not found

Relevant files

Here is my batch job file:

#BSUB -L /bin/bash
#BSUB -W 336:00
#BSUB -n 16
#BSUB -M 128000
#BSUB -e /users/dunjh6/Documents/Lautz-Research/Sepsis-Genomics/GEMmaker.%J.err
#BSUB -J GEMmaker

module load java singularity nextflow
cd /users/dunjh6/Documents/Lautz-Research/Sepsis-Genomics

nextflow run systemsgenetics/gemmaker -profile singularity \
  --pipeline kallisto \
  --kallisto_index_path Homo_sapiens.GRCh38.cdna.all.kallisto.indexed \
  --input "/data/atreya-lab/Data Share EMORY/CCHMC Transcriptomics/Bulk mRNA seq/Day1FinalFastqFiles/*{1,2}.fastq"

And here is my nextflow.config file

profiles {
  my_cluster {
    process {
      executor = "LSF"
      queue = "normal"
      clusterOptions = "-L /bin/bash -W 336:00 -n 16 -M 128000 -e /users/dunjh6/Documents/Lautz-Research/Sepsis-Genomics/GEMmaker.%J.err -J GEMmaker"
    }
    executor {
      queueSize = 120
    }
  }
}

System information

LSF executor
Using HPC
Using Singularity
Using Linux

ldunwoodie commented 10 months ago

It looks like my jobs are running for the moment! I needed to load fastqc and kallisto as modules as well as add proxy statements to my batch job script so I could access the custom nf config files using the HPC:

export http_proxy=http://username:password@proxy.com:port
export https_proxy=http://username:password@proxy.com:port

module load java singularity nextflow fastqc/0.11.7 kallisto
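
In case it helps anyone else behind a proxy, here is a minimal sketch of a check that could go near the top of the batch script to confirm the proxy variables actually made it into the job environment. The function name `unset_proxy_vars` is hypothetical (it is not part of GEMmaker or Nextflow), and it only inspects the two lowercase variables shown above; it does not test whether the proxy itself works.

```shell
# Hypothetical helper (not part of GEMmaker or Nextflow): print the name of
# each proxy variable that is unset or empty in the current shell.
unset_proxy_vars() {
  for _name in http_proxy https_proxy; do
    # Indirect lookup: read the variable whose name is stored in $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      printf '%s\n' "$_name"
    fi
  done
}
```

Calling `unset_proxy_vars` after the two `export` lines should print nothing; any name it prints is a variable the job did not inherit.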

spficklin commented 10 months ago

Hi @ldunwoodie! I've never seen that error before about not being able to retrieve nf-core files. If you need a proxy it sounds like you must have some sort of firewall that prevents the node the job is running on from accessing the wider internet?

You shouldn't need to add modules for fastqc or kallisto if you are using singularity. All of the necessary software is contained in Docker images, and GEMmaker will automatically download those images and use them via singularity. Also, I noticed that even though you have a nextflow.config file, you aren't telling nextflow to use the my_cluster profile. I think you should use the following in your script:

#BSUB -L /bin/bash
#BSUB -W 336:00
#BSUB -n 16
#BSUB -M 128000
#BSUB -e /users/dunjh6/Documents/Lautz-Research/Sepsis-Genomics/GEMmaker.%J.err
#BSUB -J GEMmaker

module load java singularity nextflow
cd /users/dunjh6/Documents/Lautz-Research/Sepsis-Genomics

nextflow run systemsgenetics/gemmaker -profile my_cluster,singularity \
  --pipeline kallisto \
  --kallisto_index_path Homo_sapiens.GRCh38.cdna.all.kallisto.indexed \
  --input "/data/atreya-lab/Data Share EMORY/CCHMC Transcriptomics/Bulk mRNA seq/Day1FinalFastqFiles/*{1,2}.fastq"
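
Since only java, singularity and nextflow need to come from modules on the host, a quick sanity check before the `nextflow run` line might look like the sketch below. The function name `missing_tools` is made up for illustration; it just reports which of the commands you pass it are not on the PATH.

```shell
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
  for _tool in "$@"; do
    # command -v succeeds only if the shell can resolve the command name.
    if ! command -v "$_tool" >/dev/null 2>&1; then
      printf '%s\n' "$_tool"
    fi
  done
}
```

Running `missing_tools java singularity nextflow` right after the `module load` line should print nothing if the modules loaded correctly. fastqc and kallisto are deliberately left out of that list because they are expected to come from the container images rather than from the host.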