epi2me-labs / wf-transcriptomes


running wf-transcriptomes on HPC using sbatch #43

Closed cea295933 closed 7 months ago

cea295933 commented 7 months ago

Ask away!

Hi, I'm trying to run wf-transcriptomes on our HPC and have questions about the best way to take advantage of multiple CPUs and multiple threads. I had been using the --ntasks-per-node option but have switched to --cpus-per-task. Does this make a meaningful difference, and should I use them together? We also have multiple nodes. Would it be helpful to request more than one node?

For reference, one node has 2 sockets, 32 cores per socket, and 2 threads per core, which gives 128 logical CPUs per node. Can wf-transcriptomes take advantage of all of this, and if so, what is the best way to do so? I am attaching two separate sbatch scripts below. One requests 1 node and 64 cpus-per-task, whereas the other simply requests --exclusive and --mem=MaxMemPerNode.
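The per-node figure quoted above is the product of sockets, cores per socket, and hardware threads per core. A minimal sketch of that arithmetic follows; the function name `logical_cpus` is hypothetical and only serves the illustration.

```shell
#!/bin/bash
# Logical CPUs per node = sockets x cores per socket x threads per core.
# logical_cpus is a hypothetical helper, not part of Slurm or the workflow.
logical_cpus() {
    local sockets=$1 cores_per_socket=$2 threads_per_core=$3
    echo $(( sockets * cores_per_socket * threads_per_core ))
}

# Values quoted in the question: 2 sockets, 32 cores each, 2 threads per core.
logical_cpus 2 32 2   # prints 128
```

Note that 64 of these 128 logical CPUs are what the first script requests with --cpus-per-task=64.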

Thanks!

sbatch script one

#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_poly
#SBATCH -o Aitken_epi2me_20231110_poly.out
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH -p emc

cd /work/caitken/epi2me-labs

./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
    --fastq /work/caitken/data/DegronNanoporeSequencing/Poly \
    --ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
    --ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
    --transcriptome_source reference-guided \
    --ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
    --de_analysis \
    --sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesPoly.csv \
    --out_dir /work/caitken/data/DegronNanoporeSequencing/outputPoly \
    -c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg
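One way to see what a given combination of sbatch options actually grants is to print the allocation from inside the job before launching Nextflow. The sketch below reads standard Slurm environment variables; the function name `report_allocation` is hypothetical. It only reports the allocation and says nothing about how wf-transcriptomes distributes work. The salmon step reported later in this thread ran with -p "4", which suggests that per-process thread counts are set by the workflow rather than by the sbatch request.

```shell
#!/bin/bash
# Print the CPU allocation Slurm exposes to the job through its environment.
# Some variables are only set when the matching option was requested
# (for example SLURM_CPUS_PER_TASK with --cpus-per-task), so each one
# falls back to the placeholder "unset" when absent.
report_allocation() {
    echo "nodes=${SLURM_JOB_NUM_NODES:-unset}"
    echo "tasks=${SLURM_NTASKS:-unset}"
    echo "cpus_per_task=${SLURM_CPUS_PER_TASK:-unset}"
    echo "cpus_on_node=${SLURM_CPUS_ON_NODE:-unset}"
}

report_allocation
```

Placing the call just before the `./nextflow run` line would write the allocation to the job's output file alongside the Nextflow log.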

sbatch script two

#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_total
#SBATCH -o Aitken_epi2me_20231110_total.out
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --mem=MaxMemPerNode
#SBATCH -p emc

cd /work/caitken/epi2me-labs

./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
    --fastq /work/caitken/data/DegronNanoporeSequencing/Total \
    --ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
    --ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
    --transcriptome_source reference-guided \
    --ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
    --de_analysis \
    --sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesTotal.csv \
    --out_dir /work/caitken/data/DegronNanoporeSequencing/outputTotal \
    -c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg

cea295933 commented 7 months ago

Circling back: I can get this to run, but it crashes during the DE analysis. I receive an error saying that Salmon needs to be upgraded:

ERROR ~ Error executing process > 'pipeline:differential_expression:count_transcripts (1)'

Caused by: Process pipeline:differential_expression:count_transcripts (1) terminated with an error exit status (1)

Command executed:

salmon quant --noErrorModel -p "4" -t "ammended.ref_transcriptome" -l SF -a "WTpoly1_reads_aln_sorted.bam" -o counts
mv counts/quant.sf "WTpoly1.transcript_counts.tsv"
seqkit bam "WTpoly1_reads_aln_sorted.bam" 2> "WTpoly1.seqkit.stats"

Command exit status: 1

Command output: (empty)

Command error:

Version Info: ### PLEASE UPGRADE SALMON ###

A newer version of salmon with important bug fixes and improvements is available.

The newest version, available at https://github.com/COMBINE-lab/salmon/releases contains new features, improvements, and bug fixes; please upgrade at your earliest convenience.

Sign up for the salmon mailing list to hear about new versions, features and updates at: https://oceangenomics.com/subscribe

salmon (alignment-based) v1.9.0

[ program ] => salmon

[ command ] => quant

[ noErrorModel ] => { }

[ threads ] => { 4 }

[ targets ] => { ammended.ref_transcriptome }

[ libType ] => { SF }

[ alignments ] => { WTpoly1_reads_aln_sorted.bam }

[ output ] => { counts }

Logs will be written to counts/logs
[2023-11-10 20:38:24.291] [jointLog] [info] setting maxHashResizeThreads to 4
[2023-11-10 20:38:24.291] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
Library format { type:single end, relative orientation:none, strandedness:sense }
[2023-11-10 20:38:24.293] [jointLog] [info] numQuantThreads = 2
parseThreads = 2
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "WTpoly1_reads_aln_sorted.bam", fasta = "ammended.ref_transcriptome" . . .done
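In the output above, the upgrade notice is followed by ordinary [info] lines, so it may be only a version banner rather than the cause of exit status 1. A rough way to look for the real failure is to filter the captured stderr for higher-severity log lines. The sketch below assumes salmon tags such lines with [warning], [error], or [critical] in the same style as the [info] lines shown here; the function name `salmon_problems` is hypothetical.

```shell
#!/bin/bash
# Keep only lines tagged as warning, error, or critical.
# The level tags are an assumption based on the "[info]" lines in the log above.
# salmon_problems is a hypothetical helper name.
salmon_problems() {
    grep -E '\[(warning|error|critical)\]' "$1"
}

# Hypothetical usage, run from the failed task's Nextflow work directory,
# where Nextflow stores the task's stderr as .command.err:
# salmon_problems .command.err
```

If the filter prints nothing, the log was probably cut off before the failure was written, and the files under counts/logs may hold more detail.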

cea295933 commented 7 months ago

Update: I can now get this to run if I (1) skip the DE analysis and generate a reference-guided transcriptome, and (2) run the DE analysis separately using a precomputed transcriptome. So the issue appears to be with using the reference-guided transcriptome in the DE analysis. I would really appreciate some guidance on getting this to work. There are no great S. cerevisiae reference transcriptomes, so I would love to use the one I generate via the workflow (or am I misunderstanding how the pipeline works?). The underlying goal of this analysis is to (1) compare the transcriptome I observe in these samples to existing ones and (2) generate read counts for each isoform and mRNA to perform DE analysis (either via the workflow here or using DESeq2 on my own in R).

cea295933 commented 7 months ago

I think this is resolved: I was supplying a reference transcriptome while asking to run the reference-guided version. Removing the reference transcriptome seems to resolve this issue (though I am now encountering another), but I will post a separate issue for that.
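The combination described here can be caught before submitting the job. The sketch below is a hypothetical pre-flight check, based only on the report in this thread: it flags an argument list that contains both `--transcriptome_source reference-guided` and `--ref_transcriptome`. Whether the workflow itself rejects this combination is not stated here, and `check_transcriptome_args` is an illustrative name.

```shell
#!/bin/bash
# Return 1 (with a warning on stderr) when the arguments request a
# reference-guided run and also supply --ref_transcriptome.
# check_transcriptome_args is a hypothetical helper, not part of the workflow.
check_transcriptome_args() {
    local guided=0 has_ref=0 prev="" arg
    for arg in "$@"; do
        if [ "$prev" = "--transcriptome_source" ] && [ "$arg" = "reference-guided" ]; then
            guided=1
        fi
        if [ "$arg" = "--ref_transcriptome" ]; then
            has_ref=1
        fi
        prev=$arg
    done
    if [ "$guided" -eq 1 ] && [ "$has_ref" -eq 1 ]; then
        echo "warning: --ref_transcriptome supplied with --transcriptome_source reference-guided" >&2
        return 1
    fi
    return 0
}

# The arguments from the scripts above, minus --ref_transcriptome, pass the check.
check_transcriptome_args --transcriptome_source reference-guided --de_analysis && echo "arguments look consistent"
```

Applied to the original scripts in this thread, the check would fail, because both of them pass --ref_transcriptome together with --transcriptome_source reference-guided.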

cjw85 commented 7 months ago

Hi @cea295933,

Please do open a new issue.

cea295933 commented 7 months ago

Just did … (#45) … thanks!

Colin Echeverría Aitken

Assistant Professor, Biology Department, Biochemistry Program, Vassar College, 845.437.7430
