bailey-lab / MIPTools

A suite of computational tools used for molecular inversion probe design, data processing, and analysis.
https://miptools.readthedocs.io
MIT License

Wrangler: `--stitch-options` clarification #27

Closed: arisp99 closed this issue 2 years ago

arisp99 commented 2 years ago

The help text for the --stitch-options (-x) argument says that it defines the "probe set to be processed." I suspect this is incorrect, since there is already a --probe-sets (-p) argument.

Experimenting with the generate wrangler script seems to indicate that --stitch-options allows the user to specify additional arguments when extracting the sequences and stitching forward and reverse reads using MIPWrangler mipSetupAndExtractByArm. @aydemiro can you clarify what --stitch-options is used for?

Possible Bug

If --stitch-options is indeed used to pass additional arguments to MIPWrangler mipSetupAndExtractByArm, there may be a bug in the code. I would expect additional arguments to be passed as a comma-separated list, for example: "--overWriteDirs,--overWriteLog". However, if we attempt to pass that to the wrangler app:

stitch_options="--overWriteDirs,--overWriteLog"

singularity run --app wrangler \
    -B $project_resources:/opt/project_resources \
    -B $fastq_dir:/opt/data \
    -B $wrangler_dir:/opt/analysis \
    $miptools -p $probe_sets_used -s $sample_sets_used -e $experiment_id \
    -l  $sample_list -c $cpu_number -m $min_capture_length -x $stitch_options

The code crashes with the following error: generate_wrangler_scripts.py: error: argument -x/--stitch-options: expected one argument. This comes from argparse: it is a known limitation that argparse treats a value beginning with a dash as another option flag rather than as the value of -x. To address this, we could ask users to enter the flag names without the dashes and then add the dashes within the Python script.
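The behaviour described above can be reproduced outside of MIPTools. The following is a minimal sketch, not the actual wrangler script: the parser defines only a stand-in -x/--stitch-options option, and add_dashes is a hypothetical helper illustrating the proposed workaround of accepting flag names without dashes.

```python
import argparse


def build_parser():
    # Stand-in parser with only the option discussed here (an assumption;
    # the real script defines many more arguments).
    parser = argparse.ArgumentParser(prog="demo_wrangler_args")
    parser.add_argument("-x", "--stitch-options", default="")
    return parser


def try_parse(argv):
    """Return the parsed -x value, or None if argparse rejects the input."""
    try:
        return build_parser().parse_args(argv).stitch_options
    except SystemExit:
        # argparse prints its usage and error message to stderr, then exits.
        return None


def add_dashes(names):
    """Hypothetical helper: turn 'a,b' into ['--a', '--b']."""
    return ["--" + name.strip() for name in names.split(",") if name.strip()]


if __name__ == "__main__":
    # The value starts with dashes, so argparse reads it as another option.
    print(try_parse(["-x", "--overWriteDirs,--overWriteLog"]))
    # Without the dashes the value is accepted as-is.
    print(try_parse(["-x", "overWriteDirs,overWriteLog"]))
    # The dashes can then be restored inside the script.
    print(add_dashes("overWriteDirs,overWriteLog"))
```

With this approach the user never types a value that begins with a dash, so argparse has nothing to misread.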

aydemiro commented 2 years ago

These are additional options to pass to the first step of MIPWrangler, where it extracts sequences and stitches paired-end reads into single sequences. More specifically, they are any additional options to pass to the mipSetupAndExtractByArm subcommand of MIPWrangler.

Since these arguments are themselves command-line arguments starting with dashes, they must be specified in the format -x ',--long-option-name'

That is, with a leading comma and within single quotes.
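As an illustration of why the leading comma is harmless, here is a minimal sketch of how such a string could be split back into individual flags. The thread does not show how MIPTools does this internally, so split_stitch_options is a hypothetical helper, and the command list is shortened to the two names mentioned above.

```python
def split_stitch_options(raw):
    """Split ',--a,--b=1' into ['--a', '--b=1'].

    The leading comma yields an empty first element, which is dropped.
    """
    return [option for option in raw.split(",") if option]


if __name__ == "__main__":
    extra = split_stitch_options(",--stitchGapExtend=1,--stitchGapOpen=10")
    # Illustrative only: the real invocation takes further arguments.
    command = ["MIPWrangler", "mipSetupAndExtractByArm"] + extra
    print(command)
```

Because the value starts with a comma rather than a dash, the argument parser accepts it as an ordinary string, and the empty first element is simply discarded when splitting.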

aydemiro commented 2 years ago

Some of the options that can be passed with this argument (with their default values) are:

--stitchGapExtend=1
--stitchGapOpen=10
--stitchMatchScore=2
--stitchMismatchScore=-2

So, for example, if you wanted to set the gap open and gap extend parameters of the stitch operation, you could pass -x ',--stitchGapExtend=1,--stitchGapOpen=10' (shown here with the default values; substitute your own).
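For anyone assembling this value in a script, the string can be built from a mapping of parameter names to values. This is only a sketch: format_stitch_options is a hypothetical helper that is not part of MIPTools, and the parameter names are the ones listed above.

```python
def format_stitch_options(params):
    """Build the -x value, e.g. {'stitchGapOpen': 10} -> ',--stitchGapOpen=10'."""
    if not params:
        return ""
    # Each option gets its own leading comma, so the string never starts with a dash.
    return "".join(",--{}={}".format(name, value) for name, value in params.items())


if __name__ == "__main__":
    value = format_stitch_options({"stitchGapExtend": 1, "stitchGapOpen": 10})
    # Quote the result when placing it on the command line, e.g. -x '<value>'
    print(value)
```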

It is doubtful anyone will dare use these settings, though.

arisp99 commented 2 years ago

Thank you! I just tested this out and was able to pass in additional options. The key was the leading comma. Just for future reference, you can use either single or double quotes.

I will work on making the documentation for this feature clearer and close the issue when that has been fixed!