bailey-lab / MIPTools

A suite of computational tools used for molecular inversion probe design, data processing, and analysis.
https://miptools.readthedocs.io
MIT License

Arguments with spaces cause errors #26

Closed: arisp99 closed this issue 2 years ago

arisp99 commented 2 years ago

Bug Description

When an argument contains a space, our code crashes. The variable is expanded unquoted on the command line, so the shell splits its value into separate words, and argparse rejects the extra word. I encountered this error while working through the analysis test run. In the first section of the tutorial, the user is asked to specify the probe sets used before running the wrangler app. In the example, we write

probe_sets_used="DR1,VAR4"

However, if the user adds a space in between these two probe sets:

probe_sets_used="DR1, VAR4"

and tries to run the wrangler app using the following:

singularity run --app wrangler \
    -B $project_resources:/opt/project_resources \
    -B $fastq_dir:/opt/data \
    -B $wrangler_dir:/opt/analysis \
    $miptools -p $probe_sets_used -s $sample_sets_used -e $experiment_id \
    -l  $sample_list -c $cpu_number -m $min_capture_length

the app will fail with the following error: generate_wrangler_scripts.py: error: unrecognized arguments: VAR4.

Our code does aim to deal with these cases by cleaning up spaces in the following lines:

https://github.com/bailey-lab/MIPTools/blob/09f52eff8059661f04962e24715b3a1ad83d47cc/src/generate_wrangler_scripts.py#L79-L96
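For illustration, clean-up of this kind works fine as long as the whole list reaches Python as a single string. The helper below is a hypothetical sketch, not the code in the linked lines: `normalize_sets` is an invented name, and it assumes the value is a comma-separated list of set names.

```python
def normalize_sets(value):
    # Hypothetical helper (not the linked MIPTools code): split a
    # comma-separated list, trim whitespace around each name and drop
    # empty entries such as the one left by a trailing comma.
    return [name.strip() for name in value.split(",") if name.strip()]


print(normalize_sets("DR1, VAR4"))             # ['DR1', 'VAR4']
print(normalize_sets(" DR1 ,VAR4,"))           # ['DR1', 'VAR4']
print(",".join(normalize_sets("DR1, VAR4")))   # DR1,VAR4
```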

However, the code fails before we are able to reach this stage. It seems that the error is triggered by argparse here: https://github.com/bailey-lab/MIPTools/blob/09f52eff8059661f04962e24715b3a1ad83d47cc/src/generate_wrangler_scripts.py#L68
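The failure can be reproduced outside Singularity. This sketch is a simplification: it models the shell's default word splitting of an unquoted variable with Python's `str.split()` (ignoring globbing), and `build_parser` is a hypothetical stand-in with only `-p` and `-s`, not the real option set of `generate_wrangler_scripts.py`. The sample set name `example_set` is also made up. It shows that argparse only ever sees the pieces the shell hands it.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the real parser: only -p and -s are modelled.
    parser = argparse.ArgumentParser(prog="generate_wrangler_scripts.py")
    parser.add_argument("-p", "--probe-sets", required=True)
    parser.add_argument("-s", "--sample-sets", required=True)
    return parser


def shell_argv(template, **values):
    # Mimic an *unquoted* expansion: substitute the values, then split on
    # whitespace as bash does with the default IFS (globbing ignored).
    return template.format(**values).split()


parser = build_parser()
argv = shell_argv("-p {p} -s {s}", p="DR1, VAR4", s="example_set")
print(argv)  # ['-p', 'DR1,', 'VAR4', '-s', 'example_set']

# parse_known_args returns the stray word instead of exiting; parse_args
# turns the same leftover into "error: unrecognized arguments: VAR4".
args, extra = parser.parse_known_args(argv)
print(args.probe_sets, extra)  # DR1, ['VAR4']

# If the value were quoted, the shell would pass one word and parsing succeeds.
args = parser.parse_args(["-p", "DR1, VAR4", "-s", "example_set"])
print(args.probe_sets)  # DR1, VAR4
```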

Suggested Implementation

Since the Python script never receives the value as a single argument, it cannot clean up the spaces itself. I suggest we handle spaces and other malformed arguments within the wrangler app, before passing the arguments to the Python script. Simple string manipulation in bash (parameter expansion) is enough for this. For instance, to remove spaces:

probe_sets_used="DR1, VAR4"
echo ${probe_sets_used// /}
#> DR1,VAR4
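To check that stripping spaces before the call is enough, the sketch below models bash's `${var// /}` (replace every space with nothing) with `str.replace`, then models the unquoted expansion with `str.split()` as before. The two-option parser and the sample set name `example_set` are hypothetical stand-ins, not the real script's options or data.

```python
import argparse


def strip_spaces(value):
    # Python analogue of bash's ${value// /}: delete every space character.
    # Like the bash form, this leaves tabs and newlines untouched.
    return value.replace(" ", "")


# Hypothetical two-option stand-in for the real parser.
parser = argparse.ArgumentParser(prog="generate_wrangler_scripts.py")
parser.add_argument("-p", "--probe-sets", required=True)
parser.add_argument("-s", "--sample-sets", required=True)

probe_sets_used = strip_spaces("DR1, VAR4")
sample_sets_used = strip_spaces("example_set")  # hypothetical sample set

# Model the unquoted expansion: after cleaning, each value is one word.
argv = f"-p {probe_sets_used} -s {sample_sets_used}".split()
print(argv)  # ['-p', 'DR1,VAR4', '-s', 'example_set']

args = parser.parse_args(argv)
print(args.probe_sets.split(","))  # ['DR1', 'VAR4']
```

Because `${var// /}` only removes the space character, a tab typed between probe set names would still split the argument.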
aydemiro commented 2 years ago

Yes, we can change the app code to:

    # remove accidental spaces
    probe_sets=${probe_sets// /}
    sample_sets=${sample_sets// /}

    # Create wrangler bash scripts using python
    python /opt/src/generate_wrangler_scripts.py \
    -c ${cpu_count} -e ${experiment_id} ${keep_files} \
    -l /opt/analysis/${sample_list} -m ${min_capture_length} \
    -n ${server_number} -p ${probe_sets} -s ${sample_sets} \
    -w ${cluster_script} -x ${stitch_options}

Edit: the app code uses the probe_sets and sample_sets parameters rather than probe_sets_used.