Closed by cea295933 7 months ago
Circling back: I can get this to run, but it crashes during the DE analysis. I receive an error saying that Salmon needs to be upgraded:
ERROR ~ Error executing process > 'pipeline:differential_expression:count_transcripts (1)'
Caused by:
Process pipeline:differential_expression:count_transcripts (1)
terminated with an error exit status (1)
Command executed:
salmon quant --noErrorModel -p "4" -t "ammended.ref_transcriptome" -l SF -a "WTpoly1_reads_aln_sorted.bam" -o counts
mv counts/quant.sf "WTpoly1.transcript_counts.tsv"
seqkit bam "WTpoly1_reads_aln_sorted.bam" 2> "WTpoly1.seqkit.stats"
Command exit status: 1
Command output: (empty)
Command error:
Version Info: ### PLEASE UPGRADE SALMON ###
The newest version, available at https://github.com/COMBINE-lab/salmon/releases contains new features, improvements, and bug fixes; please upgrade at your earliest convenience.
Sign up for the salmon mailing list to hear about new versions, features and updates at: https://oceangenomics.com/subscribe
Logs will be written to counts/logs
[2023-11-10 20:38:24.291] [jointLog] [info] setting maxHashResizeThreads to 4
[2023-11-10 20:38:24.291] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
Library format { type:single end, relative orientation:none, strandedness:sense }
[2023-11-10 20:38:24.293] [jointLog] [info] numQuantThreads = 2
parseThreads = 2
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "WTpoly1_reads_aln_sorted.bam", fasta = "ammended.ref_transcriptome" . . .done
Update: I can now get this to run if I (1) skip the DE analysis and generate a reference-guided transcriptome, and then (2) run the DE analysis separately using a precomputed transcriptome. So the issue appears to be using the reference-guided transcriptome in the DE analysis. I would really appreciate some guidance on getting this to work. There are no great S. cerevisiae reference transcriptomes, so I would love to use the one I generate via the workflow (or am I misunderstanding how the pipeline works?). The underlying goal of this analysis is to (1) compare the transcriptome I observe in these samples to existing ones and (2) generate read counts for each isoform and mRNA to perform DE analysis (either via the workflow here or on my own in R using DESeq2).
I think this is resolved: I was supplying a reference transcriptome while asking to run the reference-guided version. Removing the reference transcriptome seems to resolve this issue (though I am now encountering another), but I will post a separate issue for that.
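For anyone hitting the same error, here is a minimal dry-run sketch of the adjusted invocation, assuming the same paths and options as the Poly sbatch script in this thread with only --ref_transcriptome dropped. The `data` variable is just a shorthand introduced for this sketch, and the command is printed rather than launched.

```shell
#!/bin/sh
# Shorthand for the data directory used in this thread (illustrative variable).
data=/work/caitken/data/DegronNanoporeSequencing

# Build the argument list. --ref_transcriptome is intentionally left out,
# because --transcriptome_source reference-guided builds the transcriptome itself.
set -- run epi2me-labs/wf-transcriptomes \
    --fastq "${data}/Poly" \
    --ref_genome "${data}/sacCer3/20110902_sacCer3.fa" \
    --ref_annotation "${data}/sacCer3/20110902_sacCer3.gff" \
    --transcriptome_source reference-guided \
    --de_analysis \
    --sample_sheet "${data}/BarcodesPoly.csv" \
    --out_dir "${data}/outputPoly"

# Dry run: print the command instead of executing it.
printf '%s ' ./nextflow "$@"
echo
```

To launch for real, replace the `printf` line with `./nextflow "$@"` and add back any extras such as -with-trace or -c.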
Hi @cea295933,
Please do open a new issue.
Just did … (#45) … thanks!
Colin Echeverría Aitken
Assistant Professor, Biology Department, Biochemistry Program, Vassar College, 845.437.7430
Ask away!
Hi, I'm trying to run wf-transcriptomes on our HPC and have questions about the best way to take advantage of multiple CPUs and multiple threads. I had been using the --ntasks-per-node option but have switched to the --cpus-per-task option. Does this make a meaningful difference, and/or should I use them together? We also have multiple nodes. Would it be helpful to request more than one node? For reference, one node has 2 sockets, 32 cores per socket, and 2 threads per core, which gives 128 CPUs per node. Can wf-transcriptomes take advantage of all of this, and if so, what is the best way to do so? Below are two separate sbatch scripts. One requests 1 node and 64 cpus-per-task, whereas the other simply requests --exclusive and --mem=MaxMemPerNode.
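To make the node arithmetic above concrete, here is a small sketch assuming the layout described (2 sockets, 32 cores per socket, 2 threads per core); the variable names are illustrative. In SLURM, --cpus-per-task sets how many CPUs a single task may use, whereas --ntasks-per-node sets how many separate tasks are started on each node. SLURM exports SLURM_CPUS_PER_TASK inside a job only when --cpus-per-task was given, so the second line is a quick way to see what a job actually received.

```shell
#!/bin/sh
# Node layout as described in this post (assumed values).
sockets=2
cores_per_socket=32
threads_per_core=2

# Logical CPUs per node = sockets x cores per socket x threads per core.
logical_cpus=$(( sockets * cores_per_socket * threads_per_core ))
echo "Logical CPUs per node: ${logical_cpus}"

# Inside a job submitted with --cpus-per-task, SLURM sets this variable;
# outside a job (or without that option) it falls back to "unset".
echo "CPUs granted to this task: ${SLURM_CPUS_PER_TASK:-unset}"
```

With the values above, the first line prints 128, which matches the per-node count quoted in the post.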
Thanks!
sbatch script 1
#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_poly
#SBATCH -o Aitken_epi2me_20231110_poly.out
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH -p emc
cd /work/caitken/epi2me-labs
./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
    --fastq /work/caitken/data/DegronNanoporeSequencing/Poly \
    --ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
    --ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
    --transcriptome_source reference-guided \
    --ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
    --de_analysis \
    --sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesPoly.csv \
    --out_dir /work/caitken/data/DegronNanoporeSequencing/outputPoly \
    -c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg
sbatch script 2
#!/bin/bash
#SBATCH -J Aitken_epi2me_20231110_total
#SBATCH -o Aitken_epi2me_20231110_total.out
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --mem=MaxMemPerNode
#SBATCH -p emc
cd /work/caitken/epi2me-labs
./nextflow run epi2me-labs/wf-transcriptomes -with-trace \
    --fastq /work/caitken/data/DegronNanoporeSequencing/Total \
    --ref_genome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.fa \
    --ref_annotation /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3.gff \
    --transcriptome_source reference-guided \
    --ref_transcriptome /work/caitken/data/DegronNanoporeSequencing/sacCer3/20110902_sacCer3_coding.fa \
    --de_analysis \
    --sample_sheet /work/caitken/data/DegronNanoporeSequencing/BarcodesTotal.csv \
    --out_dir /work/caitken/data/DegronNanoporeSequencing/outputTotal \
    -c /work/caitken/data/DegronNanoporeSequencing/my_config.cfg