Open jainronit opened 1 month ago
Hi, can you successfully run the workflow with the demo data provided with singularity?
Hi, no, the demo data does not work with Singularity. I get the following error:
FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: while creating squashfs: create command failed
And can you normally run a programme such as hello world with Singularity on your cluster? It looks like it could be an issue with your Singularity setup, so you might want to contact your cluster administrator.
Hi @sarahjeeeze, I managed to successfully run the workflow with the demo data with Singularity. However, when running the workflow on my actual samples, I am running into an issue where there seems to be a problem with parsing a bam file.
The full output is attached here for your reference: slurm-50705236.txt
And here is the relevant error message:
ERROR ~ Error executing process > 'pipeline:differential_expression:map_transcriptome (2)'
Caused by:
Process pipeline:differential_expression:map_transcriptome (2)
terminated with an error exit status (1)
Command executed:
minimap2 -t 4 -ax splice -uf -p 1.0 "genome_index.mmi" "seqs.fastq.gz" | samtools view -Sb > "output.bam"
samtools sort -@ 4 "output.bam" -o "PUS7KDrep2_reads_aln_sorted.bam"
Command exit status: 1
Command output: (empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
[M::main::27.574*0.23] loaded/built the index for 130702 target sequence(s)
[M::mm_mapopt_update::28.594*0.26] mid_occ = 3254
[M::mm_idx_stat] kmer size: 14; skip: 10; is_hpc: 0; #seq: 130702
[M::mm_idx_stat::28.937*0.27] distinct minimizers: 35748550 (8.85% are singletons); average occurrences: 30.060; average spacing: 5.357; total length: 5756717400
[E::sam_hdr_create] Invalid header line: must start with @HD/@SQ/@RG/@PG/@CO
[main_samview] fail to read the header from "-".
To help me recreate your error, where did you get the reference genome and annotation files from? Could you point me at the exact files? Also, are your reads definitely direct RNA (--direct_rna) and not cDNA?
I got the reference genome and annotation files from GENCODE (https://www.gencodegenes.org/human/release_46.html). Specifically, I am using the GRCh38.p14 genome assembly and the basic gene annotation gtf file. And yes, my reads are direct RNA, not cDNA.
Here is the exact command I am running:
nextflow run epi2me-labs/wf-transcriptomes \
    --de_analysis \
    --direct_rna \
    --fastq 'differential_expression/reads/' \
    --ref_annotation 'differential_expression/GRCh38.gtf' \
    --minimum_mapping_quality 20 \
    --ref_genome 'differential_expression/GRCh38.fa' \
    --sample_sheet 'differential_expression/sample_sheet.csv' \
    -profile singularity
Hmm, the inputs seem fine. To rule out a memory error, could you add this at the end of your nextflow.config file:
process {
    withName: 'map_transcriptome' {
        memory = 32.GB
    }
}
Hi @sarahjeeeze, I am still getting the same error so it doesn't seem like it's a memory error? Could you let me know if there's any test I can run to see why samtools view seems to be having issues reading the header of the sam file? Is there a particular file that I should go look at to see if it might be malformed? I could also send you the input data to see if you can recreate the error on your end.
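One way to approach the question above is to run the alignment step on its own, write the SAM output to a file instead of piping it, and look at the first line. The sketch below is only an illustration: `check_sam_header` is a hypothetical helper, not part of wf-transcriptomes or samtools, and it does nothing more than test whether the first line starts with one of the record types named in the error message (@HD, @SQ, @RG, @PG, @CO).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wf-transcriptomes or samtools):
# report whether the first line of a file starts with one of the
# header record types listed in the samtools error message.
check_sam_header() {
    local sam_file="$1"
    local first_line
    # A missing or empty file yields an empty first line.
    first_line="$(head -n 1 "$sam_file" 2>/dev/null || true)"
    case "$first_line" in
        @HD*|@SQ*|@RG*|@PG*|@CO*) echo "ok" ;;
        "") echo "empty" ;;
        *) echo "bad: $first_line" ;;
    esac
}

# Usage sketch (requires minimap2; file names are the ones from the
# failing command in this thread, output.sam is an assumed name):
#   minimap2 -t 4 -ax splice -uf -p 1.0 genome_index.mmi seqs.fastq.gz > output.sam
#   echo "minimap2 exit status: $?"
#   check_sam_header output.sam
```

If the helper prints "empty" or "bad: ...", the problem is likely upstream of samtools, and the minimap2 exit status is the next thing to look at.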
Hi, can you try adding --minimap2_index_opts '-k 15' to your input command? If that doesn't work, please do share your data or a subsample with me, as I am struggling to recreate your error with my reads and your input references.
For ref_genome, are you using Genome sequence (GRCh38.p14) - GRCh38.p14.genome.fa.gz?
Hi Sarah, just to update you, I ended up resolving this issue by increasing the memory allocated to minimap2 to 48 GB in the nextflow.config file. Thank you so much for all your help, I really appreciate it!
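A plausible reading of this fix (an inference, not something confirmed in the thread) is that minimap2 was being stopped for lack of memory, so samtools view received an incomplete stream and reported the header error. The bash sketch below shows how `PIPESTATUS` can reveal which stage of a pipe failed. `failing_producer`, `consumer`, and `report_pipe_statuses` are stand-in names invented for this illustration, and 137 is used only as an example exit code.

```shell
#!/usr/bin/env bash
# Stand-in for an upstream tool that dies before writing any output.
failing_producer() { return 137; }

# Stand-in for a downstream tool that exits 0 even on empty input.
consumer() { cat > /dev/null; }

# Run the pipe and print the exit status of each stage.
# PIPESTATUS is bash-specific and must be read right after the pipeline.
report_pipe_statuses() {
    failing_producer | consumer
    echo "producer=${PIPESTATUS[0]} consumer=${PIPESTATUS[1]}"
}

# Example call: report_pipe_statuses
```

The same pattern can be applied by hand to the minimap2 and samtools pipe from the error report to see whether the first stage exited non-zero.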
Great, thanks. We are working on some memory improvements for the minimap2 steps.
Operating System
macOS
Other Linux
No response
Workflow Version
v23.04.3
Workflow Execution
Command line (Cluster)
Other workflow execution
No response
EPI2ME Version
No response
CLI command run
nextflow run epi2me-labs/wf-transcriptomes --de_analysis --direct_rna --fastq differential_expression/reads/ --minimum_mapping_quality 20 --ref_annotation differential_expression/GRCh38.gtf --ref_genome differential_expression/GRCh38.fa --sample_sheet differential_expression/sample_sheet.csv -profile singularity
Workflow Execution - CLI Execution Profile
singularity
What happened?
When trying to run the workflow through Singularity, I got an error where the image could not be fetched since there was no descriptor found for the reference (see log output below). I also looked for a log file output from Nextflow but it seems that there was none generated in the directory from which I ran the command. I'd appreciate any help troubleshooting!
Relevant log output
Application activity log entry
No response
Were you able to successfully run the latest version of the workflow with the demo data?
yes
Other demo data information
No response