bihealth / snappy-pipeline

SNAPPY Nucleic Acid Processing in Python
MIT License

CNV input conflict when two library kits in cohort #115

Open eudesbarbosa opened 2 years ago

eudesbarbosa commented 2 years ago

**Issue**
The second execution of the workflow `targeted_seq_cnv_calling` leads to an input conflict in step `targeted_seq_cnv_calling_gcnv_scatter_intervals`.

**To Reproduce**
Steps to reproduce the behavior:

  1. Execute the `targeted_seq_cnv_calling` workflow for a cohort with two library kits, for example Agilent SureSelect Human v6 and v8.
  2. Add more samples to either one of the library kits.
  3. Execute `targeted_seq_cnv_calling` again.
  4. See error:
    
    gatk IntervalListTools \
    --INPUT .../targeted_seq_cnv_calling/work/bwa.gcnv_filter_intervals.Agilent_SureSelect_Human_All_Exon_V6/out/bwa.gcnv_filter_intervals.Agilent_SureSelect_Human_All_Exon_V6.interval_list .../targeted_seq_cnv_calling/work/bwa.gcnv_filter_intervals.Agilent_SureSelect_Human_All_Exon_V8/out/bwa.gcnv_filter_intervals.Agilent_SureSelect_Human_All_Exon_V8.interval_list \
    --SUBDIVISION_MODE INTERVAL_COUNT \
    --SCATTER_CONTENT 5000 \
    --OUTPUT work/bwa.gcnv_scatter_intervals.Agilent_SureSelect_Human_All_Exon_V6/out/bwa.gcnv_scatter_intervals.Agilent_SureSelect_Human_All_Exon_V6

... Invalid argument '...targeted_seq_cnv_calling/work/bwa.gcnv_filter_intervals.Agilent_SureSelect_Human_All_Exon_V8/out/bwa.gcnv_filter_intervals.Agilent_SureSelect_Human_All_Exon_V8.interval_list'.



**Expected behavior**
The input to step `targeted_seq_cnv_calling_gcnv_scatter_intervals` should be filtered based on `library_kit` wildcard.
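
As a rough illustration of the expected behavior, here is a minimal sketch of an input function that only returns the interval list for the requested kit. The function names, the path template, and the `mapper` wildcard are assumptions modelled on the paths in the error message; this is not the actual code in `__init__.py`.

```python
from types import SimpleNamespace

# Hypothetical path template, modelled on the paths shown in the error message.
FILTER_TPL = (
    "work/{mapper}.gcnv_filter_intervals.{library_kit}/out/"
    "{mapper}.gcnv_filter_intervals.{library_kit}.interval_list"
)


def unfiltered_input(mapper, library_kits):
    """Mimics the reported problem: one path per kit, ignoring the wildcard."""
    return [FILTER_TPL.format(mapper=mapper, library_kit=kit) for kit in library_kits]


def scatter_intervals_input(wildcards):
    """Return only the interval list matching the `library_kit` wildcard."""
    return [
        FILTER_TPL.format(mapper=wildcards.mapper, library_kit=wildcards.library_kit)
    ]


# Stand-in for the wildcards object that Snakemake passes to input functions.
example_wildcards = SimpleNamespace(
    mapper="bwa", library_kit="Agilent_SureSelect_Human_All_Exon_V6"
)
filtered_paths = scatter_intervals_input(example_wildcards)
```

With two kits in the cohort, `unfiltered_input` yields two paths (the situation that breaks `--INPUT`), while `scatter_intervals_input` yields exactly one path for the kit named in the output.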

**Additional context**
Issue with unfiltered call in Snakemake file: https://github.com/bihealth/snappy-pipeline/blob/master/snappy_pipeline/workflows/targeted_seq_cnv_calling/gcnv_cohort_mode.rules#L23
Workflow method: https://github.com/bihealth/snappy-pipeline/blob/master/snappy_pipeline/workflows/targeted_seq_cnv_calling/__init__.py#L1366

eudesbarbosa commented 2 years ago

**Change of plans**
It is more elegant to simply remove the model subworkflow; it was only there for possible future developments. There is still a concern that problems in the config file could cause a silent error. Specifically, the correct library kit wildcard won't be used for a sample if the library is poorly defined in the config.
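
One possible guard against that silent error, sketched under assumptions: fail loudly when a sample's library kit cannot be resolved, instead of falling back to some default. The helper name and the two mappings are hypothetical and do not reflect how snappy-pipeline stores this information.

```python
def resolve_library_kit(sample, sample_to_kit, known_kits):
    """Return the library kit for `sample`, raising if it is missing or unknown.

    `sample_to_kit` (sample name -> kit name) and `known_kits` (kits defined
    in the config) are hypothetical inputs used only for this sketch.
    """
    kit = sample_to_kit.get(sample)
    if kit is None:
        raise ValueError(f"No library kit defined for sample {sample!r}")
    if kit not in known_kits:
        raise ValueError(
            f"Sample {sample!r} uses library kit {kit!r}, "
            f"which is not defined in the config"
        )
    return kit
```

Raising here would turn a poorly defined library into an immediate error at workflow setup rather than a wrong wildcard further down the line.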