bihealth / snappy-pipeline

SNAPPY Nucleic Acid Processing in Python
MIT License

Unclear error messages when GCNV model paths don't resolve #534

Open Nicolai-vKuegelgen opened 4 months ago

Nicolai-vKuegelgen commented 4 months ago

Describe the bug

In order to run GCNV with the sv_calling_{targeted|wgs} steps, a precomputed model needs to be defined in the config. This model consists of a 'library' name (matching the ...) as well as file path patterns for contig_ploidy and model_pattern. If the file path pattern for contig_ploidy or model_pattern (probably either one) does not resolve to any actual files, the snappy workflow does not run and the error message states that no model matching the library name could be found.

To Reproduce

Steps to reproduce the behavior:

  1. Set up the snappy sv_calling_wgs (or targeted) step for GCNV
  2. Have a model_pattern or contig_ploidy entry in the config that will not properly resolve
  3. Run snappy
  4. See the error, which is not helpful.

Expected behavior

If a model is defined but the file paths cannot be resolved, the error message should clearly state so.

Additional context

I see 2 possible solutions for this:

  1. The config validation models for GCNV could be adapted to check whether the file path patterns can be resolved to actual files.
  2. The functions in the GCNV workflow need to throw an error when no files are found (specifically: get_model_dir_list in snappy_pipeline/workflows/common/gcnv/gcnv_run.py should fail if its output would be empty).
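To illustrate the second option, here is a minimal sketch of a lookup that fails loudly when a pattern matches nothing. It is a standalone illustration, not the actual get_model_dir_list: the names resolve_pattern and ModelPathError, the arguments, and the use of glob are all assumptions made for the example.

```python
import glob


class ModelPathError(ValueError):
    """Raised when a configured path pattern matches nothing (hypothetical name)."""


def resolve_pattern(pattern: str, library: str, key: str) -> list[str]:
    """Expand a glob pattern and raise a specific error if nothing matches.

    Hypothetical helper; the real get_model_dir_list may look quite different.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        # Name the library, the config key and the pattern, so the user
        # can see which entry is wrong instead of "no model found".
        raise ModelPathError(
            f"GCNV model for library {library!r}: {key} pattern {pattern!r} "
            "did not match any files or directories."
        )
    return paths
```

With something like this, a typo in model_pattern would surface as an error naming the offending pattern, rather than as a missing model for the library.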

tedil commented 4 months ago

While 1 would be nice, the problem is that, in general, the files may not yet exist, because they may get produced by some other rule/step/workflow upstream. This is not the case for the GCNV models at the moment, but it would be nice to have a general solution introducing some type that allows us to distinguish between static paths ("these files must exist prior to running the workflow") and dynamic paths (URLs, SRA downloads, SODAR retrieval, simple strings for requesting upstream snakemake output). So as a short-term fix, I think 2 is a good option, especially since the error can be very specific and helpful.
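A rough sketch of what such a static/dynamic distinction could look like, purely as an illustration. The class names StaticPath and DynamicPath and the validate method are hypothetical and do not exist in snappy-pipeline.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StaticPath:
    """A path that must exist before the workflow starts (hypothetical type)."""

    value: str

    def validate(self) -> None:
        # Static paths can be checked at config validation time.
        if not Path(self.value).exists():
            raise FileNotFoundError(f"Static path does not exist: {self.value}")


@dataclass(frozen=True)
class DynamicPath:
    """A location resolved later, e.g. a URL or upstream output (hypothetical type)."""

    value: str

    def validate(self) -> None:
        # Nothing to check up front; the target may not exist yet.
        return None
```

Config validation could then call validate() on every path-like field and only reject the ones declared as static.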