epi2me-labs / wf-human-variation

Which reference to use? #36

Closed biomobot closed 11 months ago

biomobot commented 1 year ago

What happened?

Thank you for updating the workflow. I am running epi2me-labs/wf-human-variation 1.4.0 in EPI2ME Labs. I first tried the UCSC human reference, downloaded with:

rsync -a -P rsync://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

and got an error.

I also tried the Ensembl reference, Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz, and got a different error.

I am not sure which reference I should be using. Could you point me to the correct one? Thank you in advance.

Operating System

ubuntu 20.04

Workflow Execution

EPI2ME Labs desktop application

Workflow Execution - EPI2ME Labs Versions

EPI2ME Labs V4.1.3

Workflow Execution - CLI Execution Profile

None

Workflow Version

1.4.0

Relevant log output

This is epi2me-labs/wf-human-variation v1.4.0.
--------------------------------------------------------------------------------
[f2/eb5058] Submitted process > sv:runReport:getParams
[39/e0ac32] Submitted process > sv:runReport:getVersions
[9e/0b93fe] Submitted process > snp:getParams
[18/eb71f9] Submitted process > str:getVersions
[17/c0f1a3] Submitted process > snp:getVersions
[2f/c6e68d] Submitted process > getVersions
[2e/feba6f] Submitted process > basecalling:wf_dorado:make_mmi
[8b/a97d8c] Submitted process > getParams
[52/84ed33] Submitted process > str:getParams
[6c/102c20] Submitted process > cram_cache (1)
[0a/0bd5a7] Submitted process > lookup_clair3_model (1)
[52/ca2aa9] Submitted process > basecalling:wf_dorado:dorado (1)
[09/110e02] Submitted process > basecalling:wf_dorado:dorado (2)
[bc/e6971d] Submitted process > getAllChromosomesBed (1)
[d1/74920e] Submitted process > publish_artifact (1)
[7e/3c13b3] Submitted process > publish_artifact (2)
[c8/daccf4] Submitted process > publish_artifact (3)
[d6/fdff7c] Submitted process > sv:runBenchmark:intersectBedWithTruthset (1)
Error executing process > 'sv:runBenchmark:intersectBedWithTruthset (1)'
Caused by:
  Process `sv:runBenchmark:intersectBedWithTruthset (1)` terminated with an error exit status (1)
Command executed:
  bedtools intersect         -a ${WFSV_EVAL_DATA_PATH}/benchmark.bed         -b allChromosomes.bed         > target_truthset.bed
  if [ ! -s target_truthset.bed ]
  then
      echo "No overlaps found between truth and target"
      echo "Chr names in your target or reference and truthset may differ"
      exit 1
  fi
Command exit status:
  1
Command output:
  No overlaps found between truth and target
  Chr names in your target or reference and truthset may differ
Command error:
  ***** WARNING: File /data/wf_human_sv_benchmark/NIST_SVs_Integration_v0.6/benchmark.bed has inconsistent naming convention for record:
  1 834131  843115

  ***** WARNING: File /data/wf_human_sv_benchmark/NIST_SVs_Integration_v0.6/benchmark.bed has inconsistent naming convention for record:
  1 834131  843115
Work dir:
  /home/mbio/epi2melabs/instances/wf-human-variation_7500048c-1ca0-403d-bacb-cf985335bf64/work/d6/fdff7ce467bcbfc5f2d1fc836cc048
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
WARN: Killing running tasks (2)

This is epi2me-labs/wf-human-variation v1.4.0.
--------------------------------------------------------------------------------
[ab/f993bd] Cached process > snp:getVersions
[b7/b32b78] Cached process > getParams
[79/50fcd7] Cached process > lookup_clair3_model (1)
[a8/7db979] Cached process > str:getParams
[7c/3b9996] Cached process > snp:getParams
[99/e39274] Cached process > sv:runReport:getParams
[c7/69110d] Cached process > getVersions
[33/661fd6] Submitted process > basecalling:wf_dorado:dorado (1)
[ab/9c7c94] Submitted process > str:getVersions
[f4/433e3a] Submitted process > cram_cache (1)
[b4/6e236f] Submitted process > index_ref_fai (1)
[8d/7a6921] Submitted process > basecalling:wf_dorado:make_mmi
[e4/4dde59] Submitted process > sv:runReport:getVersions
[2d/955e55] Submitted process > basecalling:wf_dorado:dorado (2)
[cf/40ab4d] Submitted process > getAllChromosomesBed (1)
[8d/5a5e09] Submitted process > publish_artifact (1)
[4f/49712b] Submitted process > publish_artifact (2)
[58/e1f1f2] Submitted process > publish_artifact (3)
[d8/e279ac] Submitted process > sv:runBenchmark:intersectBedWithTruthset (1)
[94/bc0e3b] Submitted process > basecalling:wf_dorado:dorado_align (1)
[23/09e7b1] Submitted process > basecalling:wf_dorado:dorado_align (2)
[83/660ae6] Submitted process > basecalling:wf_dorado:merge_fail_calls
[57/a7aefc] Submitted process > basecalling:wf_dorado:merge_pass_calls
[69/c73e0b] Submitted process > publish_artifact (4)
[ea/044bca] Submitted process > publish_artifact (5)
[a5/b8f9d9] Submitted process > publish_artifact (6)
[67/8b0132] Submitted process > publish_artifact (7)
[e8/9d915f] Submitted process > readStats (1)
[5c/94f94f] Submitted process > getGenome
[bd/c20167] Submitted process > mosdepth (1)
[04/404e9d] Submitted process > configure_jbrowse (1)
Error executing process > 'getGenome'
Caused by:
  Process `getGenome` terminated with an error exit status (65)
Command executed:
  samtools idxstats 20.pass.cram > 20.pass.cram_genome.txt
  get_genome.py --chr_counts 20.pass.cram_genome.txt -o output.txt -w str
  genome_build=`cat output.txt`
Command exit status:
  65
Command output:
  (empty)
Command error:
  The genome build detected in the BAM is not compatible with this workflow.
Work dir:
  /home/mbio/epi2melabs/instances/wf-human-variation_3d74a8fa-c7f3-427a-8bb1-df6b8a19f200/work/5c/94f94f504e0b60c0584831c9690ec1
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
WARN: Killing running tasks (3)
cjw85 commented 1 year ago

We recommend using:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
biomobot commented 1 year ago

Thanks @cjw85. Unfortunately, I am still getting the same error with the reference you recommended.

N E X T F L O W ~ version 22.04.5
Launching /home/mbio/epi2melabs/workflows/epi2me-labs/wf-human-variation/main.nf [affectionate_kepler] DSL2 - revision: 8546042309
wf-human-variation v1.4.0

Core Nextflow options
  runName : affectionate_kepler
  containerEngine : docker
  container : ontresearch/wf-human-variation:shaa6d218582d6056ea970b73e61f138ebb0ce6c5b1
  launchDir : /home/mbio/epi2melabs/instances/wf-human-variation_5fb641df-8bd2-4f43-bf27-fd5fd8beb852
  workDir : /home/mbio/epi2melabs/instances/wf-human-variation_5fb641df-8bd2-4f43-bf27-fd5fd8beb852/work
  projectDir : /home/mbio/epi2melabs/workflows/epi2me-labs/wf-human-variation
  userName : mbio
  profile : standard
  configFiles : /home/mbio/epi2melabs/workflows/epi2me-labs/wf-human-variation/nextflow.config

Workflow Options
  sv : true
  snp : true
  methyl : true
  str : true

Input Options
  fast5_dir : /media/mbio/SATA_SSD/AP/C9_DNA/221107_NSC_FTD26_Enrich/no_sample/20221110_1819_MN36618_FAU71626_a0987114/fast5_all
  ref : /media/mbio/SATA_SSD/AP/C9_DNA/221107_NSC_FTD26_Enrich/221107_Analysis/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
  basecaller_cfg : dna_r9.4.1_e8_sup@v3.3

Small variant calling options
  phase_vcf : true
  include_all_ctgs : true
  GVCF : true

Modified base calling options
  phase_methyl : true

Short tandem repeat expansion genotyping options
  sex : male

Basecalling options
  remora_cfg : dna_r9.4.1_e8_sup@v3.4_5mCG@v0
  basecaller_basemod_threads: 20

Advanced basecalling options
  qscore_filter : 8
  basecaller_args : --batchsize 160

Output Options
  sample_name : 20
  out_dir : /home/mbio/epi2melabs/instances/wf-human-variation_5fb641df-8bd2-4f43-bf27-fd5fd8beb852/output
  depth_intervals : true

Structural variant benchmarking options
  sv_benchmark : true

Multiprocessing Options
  threads : 20
  ubam_sort_threads : 4

Other parameters
  process_label : wfdefault

!! Only displaying parameters that differ from the pipeline defaults !!

If you use epi2me-labs/wf-human-variation for your analysis please cite:

SamStudio8 commented 1 year ago

The reference that @cjw85 pointed you to (GCA_000001405.15_GRCh38_no_alt_analysis_set) is the one we recommend for running the human variation pipeline. However, the NIST SV benchmark data we use specifically requires alignments to human_g1k_v37, which can be downloaded from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/.

Please note that the SV benchmarks are only valid if you have sequenced HG002. If you are not sequencing that genome and do not need the benchmark information, you can remove the --sv_benchmark option and use the recommended reference (GCA_000001405.15_GRCh38_no_alt_analysis_set) linked in the previous comment.
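
The first log above fails with "Chr names in your target or reference and truthset may differ", and the bedtools warning shows a benchmark.bed record on contig `1`. A rough way to check for this before launching the workflow is to compare the contig names in a reference FASTA with those in a BED file. The sketch below is illustrative only and is not part of the workflow; the helper names (`fasta_contigs`, `bed_contigs`, `shared_contigs`) are hypothetical.

```python
import gzip


def fasta_contigs(path):
    """Return contig names (first token of each '>' header) from a FASTA file.

    Plain and gzip-compressed files are both accepted (assumption: a '.gz'
    suffix means gzip).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def bed_contigs(path):
    """Return the set of contig names found in column 1 of a BED file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and common BED header lines.
            if line.strip() and not line.startswith(("#", "track", "browser")):
                names.add(line.split()[0])
    return names


def shared_contigs(fasta_path, bed_path):
    """Return the sorted contig names present in both files."""
    return sorted(set(fasta_contigs(fasta_path)) & bed_contigs(bed_path))
```

If `shared_contigs` returns an empty list for your reference and a truthset BED, the two files have no contig names in common (for example `chr1` in one and `1` in the other). That would be consistent with the "No overlaps found between truth and target" message in the log.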