epi2me-labs / wf-somatic-variation

Problem with data containing reads basecalled with more than one basecaller model #25

Closed alexcoppe closed 5 months ago

alexcoppe commented 5 months ago

Operating System

Other Linux (please specify below)

Other Linux

Red Hat Enterprise Linux release 8.6

Workflow Version

v.1.2.1

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

/hpcshare/genomics/ASL_ONC/NextFlow_RunningDir/nextflow-23.10.0-all run epi2me-labs/wf-somatic-variation -profile singularity -resume -process.executor 'pbspro' -process.memory 256.GB -work-dir '/archive/s2/genomics/onco_nanopore/test/test' -with-timeline --snv --sv --sample_name 'OHU0002HI' --bam_normal '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002HTNDN/OHU0002HTNDN.bam' --bam_tumor '/archive/s2/genomics/onco_nanopore/HUM_OHU_OHU0002ITTDN/OHU0002ITTDN.bam' --ref '/archive/s1/sconsRequirements/databases/reference/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta' --out_dir '/archive/s2/genomics/onco_nanopore/test' --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v4.2.0' --phase_normal --classify_insert --force_strand --normal_min_coverage 0 --tumor_min_coverage 0 --haplotype_filter_threads 32 --severus_threads 32 --dss_threads 4 --modkit_threads 32 -process.cpus 32 --mod

Workflow Execution - CLI Execution Profile

singularity

What happened?

The workflow stopped while reading the BAM generated by Dorado, because the file contains reads basecalled with more than one basecaller model.

Relevant log output

ERROR ~ Error executing process > 'ingress_tumor:checkBamHeaders (1)'

Caused by:
  Process `ingress_tumor:checkBamHeaders (1)` terminated with an error exit status (65)

Command executed:

  workflow-glue check_bam_headers_in_dir input_dir > env.vars
  source env.vars
  DS_RUNIDS=$(workflow-glue get_ds_records --xam input_dir --key runid --cardinality zero-or-more --sep ',')
  DS_BASECALL_MODELS=$(workflow-glue get_ds_records --xam input_dir --key basecall_model --cardinality zero-or-one --sep ',' --explode_obviously)

Command exit status:
  65

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  [15:43:52 - workflow_glue] Bootstrapping CLI.
  [15:43:52 - workflow_glue] Starting entrypoint.
  [15:43:52 - workflow_glue.checkBamHd] Checked (u)BAM headers in 'input_dir'.
  [15:43:52 - workflow_glue] Bootstrapping CLI.
  [15:43:52 - workflow_glue] Starting entrypoint.
  [15:43:53 - workflow_glue] Bootstrapping CLI.
  [15:43:53 - workflow_glue] Starting entrypoint.  

################################################################################
  # INPUT DATA PROBLEM
  Your input data contains reads basecalled with more than one basecaller model.

  Our workflows automatically select appropriate configuration and models for
  downstream tools for a given basecaller model. This cannot be done reliably when
  reads with different basecaller models are mixed in the same data set.

  ## Next steps
  To use this workflow you must separate your input files, making sure all reads
  are have been basecalled with the same basecaller model.
  ################################################################################

Work dir:
  /archive/s2/genomics/onco_nanopore/damntest/test/dc/b8e93e32a7578b44db5bab6c2fe318

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

other (please describe below)

Other demo data information

No response

cjw85 commented 5 months ago

You will need to basecall your data with a consistent basecaller model. The variant calling models are specific to each basecaller model, so it is not possible to perform variant calling on a mixture of data from different basecaller models.
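To see which basecaller models a BAM declares before launching the workflow, a minimal Python sketch along these lines may help. It assumes the header text has already been dumped (for example with `samtools view -H`) and that each `@RG` line carries a `DS` field containing `basecall_model=...`, which is the key the `get_ds_records` call in the log above queries. The function name `models_by_read_group` and the example header are hypothetical, written for illustration only; this is not the workflow's own check.

```python
import re
from collections import defaultdict


def models_by_read_group(header_text):
    """Map each basecaller model to the read-group IDs that declare it."""
    groups = defaultdict(list)
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs.
        fields = dict(
            f.split(":", 1) for f in line.split("\t")[1:] if ":" in f
        )
        # Assumption: the model is recorded as basecall_model=<name> in DS.
        match = re.search(r"basecall_model=(\S+)", fields.get("DS", ""))
        if match:
            groups[match.group(1)].append(fields.get("ID", "?"))
    return dict(groups)


# Made-up header for illustration; real read-group IDs will differ.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:run1_sup\tPL:ONT\t"
    "DS:basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.2.0 runid=run1",
    "@RG\tID:run2_hac\tPL:ONT\t"
    "DS:basecall_model=dna_r10.4.1_e8.2_400bps_hac@v4.2.0 runid=run2",
])

found = models_by_read_group(example_header)
for model, rg_ids in found.items():
    print(model, rg_ids)
if len(found) > 1:
    print("mixed basecaller models")
```

If more than one model is reported, the read-group IDs listed for each model show which reads would need to be re-basecalled or separated before the data can be used as input.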

alexcoppe commented 5 months ago

Ok, but in the previous version, it worked with more than one variant calling model. I'll try using just one! Thank you.

cjw85 commented 5 months ago

Picked up again in https://github.com/epi2me-labs/wf-somatic-variation/issues/26