core-unit-bioinformatics / reference-container

Build repository for reference container
MIT License
0 stars 0 forks source link

added information when base container is missing #4

Closed svenwillger closed 2 years ago

svenwillger commented 2 years ago

Please review changes

svenwillger commented 2 years ago

All the changes I made before to the functions.smkmodule and we discussed didn't solve the problem, that's why I'm not replying to the raised points in the change request. I went thoroughly through the workflow and the reason for the misleading error message ('No "metadata_" keys found in config. Did you forget to load a reference container config?') was that the functions.smk loops several times through the function "_get_container_metadata" and creates a "new" 'refcon_md' every time another function refers to "_get_container_metadata" and at some point it arrives with the name of the desired container (e.g. broad-hg38...) in "refcon_name", which triggers the first if condition "if refcon_name is not None" but during that loop in line 12 'select names' only expects "base" and "base_v0" resulting in an error that leaves 'refcon_md' empty and therefore triggers the error message and then stops the script.

I was able to find an easy solution to the whole issue!

The function "collect_base_images" starting in line 78 already deals with the case what to do when there is no file 'container/base_v0.sif' and displays the instructions what to do, BUT it was missing a stop sign in the function if that problem occurs.

I just added the last line to the section with the instructions:

info_msg = f'\nACTION REQUIRED --- missing base container detected: {base_image}.sif\n'
info_msg += f'Please run the following build command in the "container" folder:\n'
info_msg += f'$ cd {workflow_dir}/container\n'
info_msg += f'$ sudo singularity build {base_image}.sif {base_image}.def\n\n'
info_msg += '(Restart the pipeline afterwards to continue building the reference container)\n\n'
sys.stderr.write(info_msg)
raise ValueError('Container is missing. Please follow the instructions above!!')

This stops the script right there and you can see the instructions directly above the error message and includes now the line that the pipeline needs to be started again to build the actual reference container.

The same is true for the actual reference container image. Here the problem is in packaging.smkin the rule "notify user". The error that was originally thrown was a missing input and after 5 sec waiting it stops. I now also added the ValueError so that the script stops immediately and displays that the instructions what to do are just above the error message.

I tested the new workflow several times, with the broad and ncbi configs and with several possible cases (all correct input provided, no sif file(s), no "metadata_" in the yaml, no yaml) and the workflow behaves in all cases as expected.

I pushed the new commit to all and rebased feature onto dev.

The latest commit (hash ID 20751) was submitted with the message "fix when base container is missing" and should fix the issue. Minor changes we discussed, like adding the TODOregarding the parent folder and changing singularity: to container: will be submitted separately.

ptrebert commented 2 years ago

I have changed the funtion collect_base_images again to restore the desired intent: 1) note the comment: iterate through configs and collect all base images; throwing an error immediately could lead to the annoying behavior of reporting errors one by one after each restart 2) however, starting the build process for all available base images, and failing for the missing ones may also not be perfect, as base images should be quick to build and add. Consequently, the behavior of the pipeline is now changed to collect all errors, and stops if there is at least one missing base image.

ptrebert commented 2 years ago

Rebasing onto dev and merging done