bokulich-lab / q2-assembly

QIIME 2 plugin for (meta)genome assembly.
BSD 3-Clause "New" or "Revised" License
4 stars 12 forks source link

specify default values in function definition #31

Closed gregcaporaso closed 1 year ago

gregcaporaso commented 1 year ago

At present, it looks like you tend to define default values implicitly, and it would be better to do this explicitly (i.e., when you define the function the action is mapped to).

I suspect that you are doing this so that the defaults are actually set by the underlying code if not overridden by the user, which makes sense intuitively but is not ideal for a few reasons. First, it could result in your help text becoming outdated and misleading (e.g., if you specify the default is 1, and the underlying code is changed so it's 16, the help text your user is referring to will be wrong). Next, and probably most importantly, if not specified explicitly the parameter values won't be stored in data provenance. And finally, if you define explicitly on function definition, the default value will autopopulate in the help text for your action.

As an example of how I recommend doing this, take a look at this snippet of the help text from dada2 denoise-single:

  --p-n-threads INTEGER  The number of threads to use for multithreaded
                         processing. If 0 is provided, all available cores
                         will be used.                            [default: 1]

That default value is specified here.

I recommend ultimately making this change across the whole code base.

misialq commented 1 year ago

Thanks for bringing that up, @gregcaporaso. Yeah, I was not sure how to go about this initially and your guess is correct: I felt like leaving those undefined and letting the tools themselves pick the default values. I was not so much concerned with the text becoming outdated as I thought that on a tool version update one would need to review the text anyways to see whether anything else changed (to keep the q2 help in-line with the underlying tool) - under this assumption I'd need to review something somewhere anyway... However, your provenance argument absolutely speaks to me much more - I'll update that in another PR. Thanks! 🙏

gregcaporaso commented 1 year ago

on a tool version update one would need to review the text anyways

That's definitely a reasonable assumption and good practice, but in my experience these things have a way of getting out of date. For example, if someone else takes over maintenance of the plugin in the future, they may not be fully aware of where all the changes need to be made.

nbokulich commented 1 year ago

how should we handle the presets param in megahit? This overrides a group of params to provide some generic settings. This sounds useful, but at the same time is bad news for provenance if you are passing params (and recording them in provenance) that are actually ignored. From the megahit docs:

 Presets parameters:

  --presets <str>  override a group of parameters;

    possible values:

    meta           '--min-count 2 --k-list 21,41,61,81,99'

          (generic metagenomes, default)

    meta-sensitive '--min-count 2 --k-list 21,31,41,51,61,71,81,91,99'

          (more sensitive but slower)

    meta-large     '--min-count 2 --k-list 27,37,47,57,67,77,87'      

          (large & complex metagenomes, like soil)

    bulk           '--min-count 3 --k-list 31,51,71,91,99 --no-mercy' 

          (experimental, standard bulk sequencing with >= 30x depth)

    single-cell '--min-count 3 --k-list 21,33,55,77,99,121 --merge_level 20,0.96'

          (experimental, single cell data)

The options I see:

  1. status quo: we leave it and set the default as None
  2. we remove this option altogether
  3. we keep it but raise an error when it overrides a parameter that is also explicitly set (which would be almost always)

Is there any way to edit provenance? So, e.g., if a preset is passed then the preset values get written into provenance instead of the param setting that was passed?

Maybe I am being overly alarmist because, after all, the preset would also be written to provenance so the effect is reproducible... but it just seems misleading.

EDIT: for now I am going with "option 3" in #43 , i.e., to warn whenever the presets parameter is used. But it would be interesting to know if we can overwrite provenance to write in the presets.

gregcaporaso commented 1 year ago

@nbokulich, I agree with the approach you're taking now, and that it would be good to better support this type of situation in the future.