bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bcbio CWL generator fails with: ValueError: Did not find variable config__algorithm__fusion_caller in #3471

Closed: TomGardner closed this issue 2 years ago

TomGardner commented 3 years ago

Version info

To Reproduce

Exact bcbio command you have used:

/usr/local/share/bcbio-nextgen/anaconda/envs/bcbiovm/bin/bcbio_vm.py cwl --systemconfig /bcbio-workdir/test-tomg/bcbio_system.yaml /bcbio-workdir/test-tomg/samples.yaml

Your sample configuration file:

details:
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: Parental_rep1
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192655_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192655_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: Parental
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: Parental_rep2
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192656_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192656_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: Parental
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: Parental_rep3
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192657_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192657_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: Parental
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: BRG1_SMARCA4_KO_rep1
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192658_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192658_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: BRG1_SMARCA4_KO
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: BRG1_SMARCA4_KO_rep2
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192659_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192659_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: BRG1_SMARCA4_KO
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: BRG1_SMARCA4_KO_rep3
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192660_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192660_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: BRG1_SMARCA4_KO
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: BRM_SMARCA2_KO_rep1
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192661_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192661_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: BRM_SMARCA2_KO
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: BRM_SMARCA2_KO_rep2
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192662_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192662_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: BRM_SMARCA2_KO
- algorithm:
    adapters:
    - truseq
    - polya
    aligner: hisat2
    expression_caller:
    - stringtie
    transcript_assembler:
    - stringtie
    trim_reads: read_through
  analysis: RNA-seq
  description: BRM_SMARCA2_KO_rep3
  files:
  - /bcbio-workdir/test-tomg/fastqs/SRR13192663_1_subsampled.fastq.gz
  - /bcbio-workdir/test-tomg/fastqs/SRR13192663_2_subsampled.fastq.gz
  genome_build: GRCh37
  metadata:
    category: BRM_SMARCA2_KO
fc_name: samples
upload:
  dir: /home/eichinger_felix/docker_run1_star

Observed behavior

Error message or bcbio output:

[2021-04-05T20:14Z] INFO: Using input YAML configuration: /bcbio-workdir/test-tomg/samples.yaml
[2021-04-05T20:14Z] INFO: Checking sample YAML configuration: /bcbio-workdir/test-tomg/samples.yaml
Preparing CWL input tarball: /mnt/biodata/genomes/Hsapiens/GRCh37/star-wf.tar.gz
Traceback (most recent call last):
  File "/usr/local/share/bcbio-nextgen/anaconda/envs/bcbiovm/bin/bcbio_vm.py", line 354, in <module>
    args.func(args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/cwl/main.py", line 12, in run
    create.from_world(world, run_info_yaml, integrations=integrations, add_container_tag=args.add_container_tag)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/cwl/create.py", line 40, in from_world
    prep_cwl(samples, workflow_fn, out_dir, out_file, integrations, add_container_tag=add_container_tag)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/cwl/create.py", line 430, in prep_cwl
    for cur in workflow.generate(variables, steps, wfoutputs):
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/cwl/workflow.py", line 65, in generate
    inputs, parallel_ids, nested_inputs = _get_step_inputs(step, file_vs, std_vs, parallel_ids)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/cwl/workflow.py", line 171, in _get_step_inputs
    for orig_input in [_get_variable(x, file_vs) for x in _handle_special_inputs(step.inputs, file_vs)]:
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/cwl/workflow.py", line 171, in <listcomp>
    for orig_input in [_get_variable(x, file_vs) for x in _handle_special_inputs(step.inputs, file_vs)]:
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python3.7/site-packages/bcbio/cwl/workflow.py", line 290, in _get_variable
    raise ValueError("Did not find variable %s in \n%s" % (vid, pprint.pformat(variables)))
ValueError: Did not find variable config__algorithm__fusion_caller in
[{'id': 'analysis', 'type': 'string'},
{'id': 'config__algorithm__adapters',
'type': {'items': 'string', 'type': 'array'}},
{'id': 'config__algorithm__align_split_size', 'type': ['null', 'string']},
{'id': 'config__algorithm__aligner', 'type': 'string'},
{'id': 'config__algorithm__archive', 'type': ['null', 'string']},
{'id': 'config__algorithm__bam_clean', 'type': ['string', 'null', 'boolean']},
{'id': 'config__algorithm__coverage_interval', 'type': ['null', 'string']},
{'id': 'config__algorithm__effects', 'type': 'string'},
{'id': 'config__algorithm__ensemble', 'type': ['null', 'string']},
{'id': 'config__algorithm__exclude_regions',
'type': ['null', {'items': 'null', 'type': 'array'}]},
{'id': 'config__algorithm__expression_caller',
'type': {'items': 'File', 'type': 'array'}},
{'id': 'config__algorithm__mark_duplicates',
'type': ['string', 'null', 'boolean']},
{'id': 'config__algorithm__min_allele_fraction', 'type': 'double'},
{'id': 'config__algorithm__nomap_split_size', 'type': 'long'},
{'id': 'config__algorithm__nomap_split_targets', 'type': 'long'},
{'id': 'config__algorithm__qc', 'type': {'items': 'string', 'type': 'array'}},
{'id': 'config__algorithm__quality_format', 'type': 'string'},
{'id': 'config__algorithm__realign', 'type': ['string', 'null', 'boolean']},
{'id': 'config__algorithm__recalibrate',
'type': ['string', 'null', 'boolean']},
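
For readers unfamiliar with this failure mode, the traceback suggests that a workflow step requests an input by id and the lookup raises when no variable with that id was generated. The following is a minimal sketch of that pattern, not bcbio's actual code: `get_variable` is a hypothetical stand-in for `_get_variable` in `bcbio/cwl/workflow.py`, and the assumption that the lookup matches on the `id` key is inferred only from the error message and the printed variable list.

```python
import pprint


def get_variable(vid, variables):
    """Hypothetical stand-in: return the variable whose 'id' equals vid.

    Assumption: the real bcbio lookup matches on the 'id' field; only the
    error message format below is taken from the traceback.
    """
    for variable in variables:
        if variable["id"] == vid:
            return variable
    raise ValueError("Did not find variable %s in \n%s" % (vid, pprint.pformat(variables)))


# A small subset of the variables printed in the error output above.
variables = [
    {"id": "analysis", "type": "string"},
    {"id": "config__algorithm__aligner", "type": "string"},
    {"id": "config__algorithm__expression_caller",
     "type": {"items": "File", "type": "array"}},
]

# An id that is present resolves normally.
found = get_variable("config__algorithm__aligner", variables)

# An id that was never generated (here fusion_caller) raises ValueError.
try:
    get_variable("config__algorithm__fusion_caller", variables)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In this sketch the failure depends only on whether the requested id is in the list, which is consistent with the sample configuration above not setting `fusion_caller`. Whether adding that key would avoid the error is not confirmed in this thread.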

Expected behavior

The CWL generation should not fail.


naumenko-sa commented 3 years ago

Hi @TomGardner !

Thanks for reporting, and sorry about the issues! Currently, bcbio_vm and CWL functionality is not available; there are numerous issues. We are planning to put more work into this aspect of bcbio this summer; see the release planning at https://github.com/bcbio/bcbio-nextgen/issues/3242

If you need this ASAP, take a look at https://github.com/nextflow-io/awesome-nextflow