epi2me-labs / wf-single-cell


`pipeline:process_bams:stringtie (1)` terminated with an error exit status (1) #125

Closed sme229 closed 1 month ago

sme229 commented 3 months ago

Operating System

Other Linux (please specify below)

Other Linux

Linux 5.14.21-150400.24.28-default

Workflow Version

wf-single-cell v2.1.0-ga25ec6c

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-single-cell --expected_cells 3000 -profile singularity --fastq 'Elena-cDNA-2/20240624_1312_1D_PAW58377_76308a2c/fastq_pass' --kit_name '3prime' --kit_version 'v3' --ref_genome_dir 'OryCun_genome'

Workflow Execution - CLI Execution Profile

singularity

What happened?

I get this error after ~5 h of execution:

Caused by: Process pipeline:process_bams:stringtie (1) terminated with an error exit status (1)

Command executed:
  # Add chromosome label (-l) to generated transcripts
  # so we don't get name collisions during file merge later
  samtools view -h align.bam NC_001913.1 | tee >( stringtie -L -c 2 -p 8 -G chr.gtf -l "NC_001913.1.stringtie" -o "stringtie.gff" - ) | samtools fastq | bgzip --threads 2 -c > reads.fastq.gz
  # Get transcriptome sequence
  gffread -g ref_genome.fa -w "transcriptome.fa" "stringtie.gff"

Command exit status: 1

Command output: (empty)

Command error:
  [M::bam2fq_mainloop] discarded 0 singletons
  [M::bam2fq_mainloop] processed 333717 reads
  GffObj::getSpliced() error: improper genomic coordinate 1026 on NC_001913.1 for NC_001913.1.stringtie.1.1

I generated the reference and GTF file with 10x as suggested, then renamed the headers in the reference file to keep only the IDs, with no description. I'm not sure why it reports 'improper genomic coordinate 1026 on NC_001913.1 for NC_001913.1.stringtie.1.1'. How can I fix this?
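For readers who want to reproduce the header renaming described here, the following is a minimal sketch of one way to do it. It is an assumption about the procedure, not the exact command used in this report: it keeps the first whitespace-delimited word of each FASTA header and leaves sequence lines untouched. The function name `trim_fasta_headers` is hypothetical.

```python
def trim_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to its first word (the ID).

    Assumption: the ID is everything between '>' and the first whitespace,
    which is the usual FASTA convention but is not checked here.
    """
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            # An empty header stays empty rather than raising an error.
            yield ">" + (words[0] if words else "") + "\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line
```

Applied to an open file handle, the generator can be written straight to a new file with `out.writelines(trim_fasta_headers(handle))`. After trimming, the IDs in the FASTA need to match the sequence names in the first column of the GTF exactly.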

Relevant log output

Jul-22 14:36:04.390 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 134; name: pipeline:process_bams:stringtie (1); status: COMPLETED; exit: 1; error: -; workDir: /datasets/work/hb-rabbit-gbc/work/Maria_Jenkel_ONT_Data_04-07-2024/work/f2/e1eeb8d9fe3e7f883fe255830c3c27]
Jul-22 14:36:04.393 [TaskFinalizer-10] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=pipeline:process_bams:stringtie (1); work-dir=/datasets/work/hb-rabbit-gbc/work/Maria_Jenkel_ONT_Data_04-07-2024/work/f2/e1eeb8d9fe3e7f883fe255830c3c27
  error [nextflow.exception.ProcessFailedException]: Process `pipeline:process_bams:stringtie (1)` terminated with an error exit status (1)
Jul-22 14:36:04.411 [TaskFinalizer-10] ERROR nextflow.processor.TaskProcessor - Error executing process > 'pipeline:process_bams:stringtie (1)'

Caused by:
  Process `pipeline:process_bams:stringtie (1)` terminated with an error exit status (1)

Command executed:

  # Add chromosome label (-l) to generated transcripts
  # so we don't get name collisions during file merge later
  samtools view -h align.bam NC_001913.1          | tee >(
          stringtie -L -c 2 -p 8                 -G chr.gtf -l "NC_001913.1.stringtie" -o "stringtie.gff" - )         | samtools fastq         | bgzip --threads 2 -c > reads.fastq.gz
  # Get transcriptome sequence
  gffread -g ref_genome.fa -w "transcriptome.fa" "stringtie.gff"

Command exit status:
  1

Command output:
  (empty)

Command error:
  [M::bam2fq_mainloop] discarded 0 singletons
  [M::bam2fq_mainloop] processed 333717 reads
  GffObj::getSpliced() error: improper genomic coordinate 1026 on NC_001913.1 for NC_001913.1.stringtie.1.1

Work dir:
  /datasets/work/hb-rabbit-gbc/work/Maria_Jenkel_ONT_Data_04-07-2024/work/f2/e1eeb8d9fe3e7f883fe255830c3c27

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
Jul-22 14:36:04.412 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Jul-22 14:36:04.412 [Task submitter] INFO  nextflow.Session - [db/4937d4] Submitted process > pipeline:process_bams:stringtie (15)
Jul-22 14:36:04.421 [TaskFinalizer-10] DEBUG nextflow.Session - Session aborted -- Cause: Process `pipeline:process_bams:stringtie (1)` terminated with an error exit status (1)
Jul-22 14:36:04.438 [TaskFinalizer-10] DEBUG nextflow.Session - The following nodes are still active:
[process] pipeline:process_bams:align_to_transcriptome
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:assign_features
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:create_matrix
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:process_matrix
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:merge_transcriptome
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:combine_final_tag_files
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:tag_bam
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:umi_gene_saturation
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:process_bams:pack_images
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:prepare_report_data
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] pipeline:makeReport
  status=ACTIVE
  port 0: (value) bound ; channel: metadata
  port 1: (value) bound ; channel: versions
  port 2: (value) bound ; channel: params.csv
  port 3: (value) bound ; channel: stats
  port 4: (queue) OPEN  ; channel: survival.tsv
  port 5: (value) OPEN  ; channel: umap_dirs
  port 6: (value) OPEN  ; channel: images
  port 7: (value) bound ; channel: umap_genes
  port 8: (value) bound ; channel: wf_version
  port 9: (cntrl) -     ; channel: $

Jul-22 14:36:04.470 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
Jul-22 14:36:05.640 [main] DEBUG nextflow.Session - Session await > all processes finished
Jul-22 14:36:05.640 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jul-22 14:36:05.649 [Actor Thread 20] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=pipeline:process_bams:tag_bam; work-dir=null
  error [java.lang.InterruptedException]: java.lang.InterruptedException
Jul-22 14:36:05.649 [Actor Thread 33] DEBUG nextflow.file.SortFileCollector - FileCollector temp dir not removed: null
Jul-22 14:36:05.958 [main] WARN  n.processor.TaskPollingMonitor - Killing running tasks (1)
Jul-22 14:36:05.978 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=149; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=1685; submittedCount=1; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=1d 20h 31m 38s; failedDuration=31.4s; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=8; peakCpus=12; peakMemory=18 GB; ]
Jul-22 14:36:05.978 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
Jul-22 14:36:05.981 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Jul-22 14:36:06.515 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Jul-22 14:36:06.620 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jul-22 14:36:06.671 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

nrhorner commented 3 months ago

Hi @sme229 Is it possible that a coordinate in your reference gtf file refers to a location that does not exist in your reference sequences?

The size of the mitochondrial genome NC_001913.1 is 17,245 bp. Are there any end coordinates for NC_001913.1 in your GTF file that extend further than this?
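The check suggested above can be scripted. The following is a minimal sketch and not part of the workflow: it assumes an uncompressed FASTA and a tab-separated GTF with 1-based inclusive coordinates, and the function names `fasta_lengths` and `out_of_range_features` are made up for illustration.

```python
def fasta_lengths(fasta_lines):
    """Return {sequence ID: length}, taking the first word of each header as the ID."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def out_of_range_features(gtf_lines, lengths):
    """Yield (seqname, start, end, reason) for GTF records that do not fit their sequence."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        seqname, start, end = fields[0], int(fields[3]), int(fields[4])
        if seqname not in lengths:
            yield (seqname, start, end, "sequence missing from FASTA")
        elif start < 1 or end > lengths[seqname] or start > end:
            yield (seqname, start, end, f"outside 1..{lengths[seqname]}")
```

With both files open, `list(out_of_range_features(gtf_handle, fasta_lengths(fasta_handle)))` returns an empty list when every annotated feature lies within its reference sequence. An empty list only rules out this one cause of the error.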

sme229 commented 3 months ago

Hi @nrhorner Thanks for your response. No, there are no coordinates that extend beyond 17,245 in my gtf file:

image

sme229 commented 3 months ago

@nrhorner I removed the NC_001913.1 entry from the GTF file to see if that helps.

Update: I get the same type of error:

gffread -g ref_genome.fa -w "transcriptome.fa" "stringtie.gff"

Command exit status: 1

Command output: (empty)

Command error:
  [M::bam2fq_mainloop] discarded 0 singletons
  [M::bam2fq_mainloop] processed 3 reads
  GffObj::getSpliced() error: improper genomic coordinate 61536 on NW_026260071.1 for NW_026260071.1.stringtie.1.1

sme229 commented 3 months ago

Hi @nrhorner Let me provide more information please. When I get this error:

GffObj::getSpliced() error: improper genomic coordinate 37565 on NW_026260069.1 for NW_026260069.1.stringtie.1.1

I can see that there is a duplication in the resulting stringtie.gff file:

image

I'm not sure if that's how it should be. Update: I can see that the two entries are not exactly the same, sorry.
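To compare such near-identical entries without relying on a screenshot, the records belonging to the transcript named in the error message can be pulled out of stringtie.gff and listed side by side. This is a minimal sketch under stated assumptions: the file is tab-separated with nine columns, and the transcript is identified either by a GTF-style `transcript_id "..."` attribute or by GFF3-style `ID=`/`Parent=` attributes. The function name `records_for_transcript` is hypothetical.

```python
def records_for_transcript(gff_lines, transcript_id):
    """Return (feature, start, end, strand) for every record of one transcript."""
    wanted = {
        f'transcript_id "{transcript_id}"',  # GTF-style attribute
        f"ID={transcript_id}",               # GFF3 transcript record
        f"Parent={transcript_id}",           # GFF3 exon record
    }
    records = []
    for line in gff_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        # Compare whole attributes so that "...1.1" does not also match "...1.10".
        attributes = {item.strip() for item in fields[8].split(";")}
        if attributes & wanted:
            records.append((fields[2], int(fields[3]), int(fields[4]), fields[6]))
    return records
```

Calling it with the ID from the error, for example `records_for_transcript(handle, "NW_026260069.1.stringtie.1.1")`, gives the transcript and exon coordinates that can then be compared with the length of the contig in the reference FASTA.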

nrhorner commented 2 months ago

Hi @sme229

Sorry for the late reply. Would it be possible to share the work directory with me? /datasets/work/hb-rabbit-gbc/work/Maria_Jenkel_ONT_Data_04-07-2024/work/f2/e1eeb8d9fe3e7f883fe255830c3c27

I can send a link for you to drop that into.

sme229 commented 2 months ago

Hi @nrhorner

Sure, happy to do that. I have since re-run the pipeline but got the same error:

Command error:
  [M::bam2fq_mainloop] discarded 0 singletons
  [M::bam2fq_mainloop] processed 1448 reads
  GffObj::getSpliced() error: improper genomic coordinate 37565 on NW_026260069.1 for NW_026260069.1.stringtie.1.1

Work dir: /datasets/work/hb-rabbit-gbc/work/Maria_Jenkel_ONT_Data_04-07-2024/work/ed/563d7d63a056ec563b717e48060582

So the directory has a different name.

nrhorner commented 2 months ago

Hi @sme229

I need an email address in order to grant access to a shared folder. Do you have a LinkedIn profile, or somewhere else I can message you, so that I can get your email address?

sme229 commented 2 months ago

Hi @nrhorner Sure, here is my email address Elena.Smertina@csiro.au