Open landmanbester opened 2 weeks ago
A related question is how to UNSET a value from the CLI when it has a default but you want it evaluated to None. Is there a way to do that? One use case I have run into is when you want to use the FLAG column by default but ignore it some of the time (eg. spectral line imaging)
Would flag: Optional[str] = "FLAG"
, and --flag None
not work? Hmm, maybe it'll get parsed to a "None" string. It's a bit of a tricky problem in CLIs in general.
Yeah --flag None
gets parsed as a string
Well this issue is really in the domain of CLIs, as far as I'm concerned. So let's think of what makes sense from a CLI point of view. A --no-flag
option? With a corresponding policies field of none_negates_option
to tell Stimela to use this when the option is a None? That said, I'm not even sure this is something that click can express easily in the first place...
If this is not a common use case, it might be simpler to explicitly check for a ""
value for the parameter inside the Python code, and treat ""
as no flag column. Then you have a clean way to disable the flag column from both YaML and the CLI, no changes to the Stimela codebase necessary.
It is not that common and I have indeed gone with the latter option but that only works for string inputs. I was just wondering is there is a cleaner way to do this eg. something like --parameter None
or --parameter UNSET
where the string conversion gets handled in clickify_parameters. But this is a bit orthogonal to the current issue.
I think it makes sense for clickify_parameters to evaluate unset parameters to None
. Surely that is the intended behaviour for parameters defined without defaults? I also noted that, and this may just be something I am doing wrong, even with the Optional[dtype]
definition in my yaml files I still end up with the _UNSET_DEFAULT
values getting parsed. Am I doing something wrong here? Here is sample of the defaults that get parsed for a worker if I remove the last few lines handling the conversion
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
I
<UNSET DEFAULT VALUE>
True
True
<UNSET DEFAULT VALUE>
1
<UNSET DEFAULT VALUE>
True
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
False
<UNSET DEFAULT VALUE>
DATA
<UNSET DEFAULT VALUE>
<UNSET DEFAULT VALUE>
FLAG
<UNSET DEFAULT VALUE>
-1
32
double
1.0
3.0
<UNSET DEFAULT VALUE>
1
<UNSET DEFAULT VALUE>
True
False
I can get this working like so but I am not sure if it will have unintended consequences elsewhere. Also not sure if this makes sense for positional arguments. What do you think @o-smirnov ?
Random question, but does click let you do something like --parameter1 --parameter2 value
where parameter1 should take a value? I think that that currently results in a None in QC.
Nope. I get eg. Error: Option '--flag-column' requires an argument
. Not sure if it works for positional arguments but not for Options
In principle, parameters without a specified default are supposed to be caught here: https://github.com/caracal-pipeline/stimela/blob/master/scabha/schema_utils.py#L276. The idea is that the click option is then created without a default
keyword, so when a CLI option is omitted and click invokes the Python function, that option is not passed to the function call at all. So I don't understand how "in
should be a string comparison rather?
I think we should just hash out the desired logic (maybe on Tuesday) and write it down clearly so we can test against it. Here I'll just list the subtle aspects of the problem:
Although Python code often employs None
in the sense of "missing", this is just a convention and nowhere near universal. So I did not want to use None
to mark missing parameters within Stimela, as that would create difficulties with tools that do not employ this convention. Hence the special UNSET
value (which ultimately results in the unset parameter being removed from the final dict that gets passed to the cab).
When calling a Python function, there is a difference between "pass an explicit None value" and "pass nothing and rely on the function definition to provide its own default". When @landmanbester catches everything in the pfb workers via **kw
(which I find a bit naughty, as it means the true function signature is obscured), this becomes the difference between putting a key=None in kw
, or not putting the key into kw
.
I attempted to address this via the pass_missing_as_none
policy -- I'm open to arguments about whether this policy should be false or true by default, I never had a clear preference.
At CLI level, this difference manifests itself in whether @click.option
is created with or without a default
argument.
Likewise for defaults in cab definitions -- I wanted to maintain a clear distinction between default: None
and no default at all. Hence the internal UNSET_DEFAULT gymnastics.
As a final wrinkle, when we write a cab definition for a Python function, we can potentially define a default value that could be different from the default value defined in the Pythoin signature. This could be used for both good and evil...
You'll notice there is also a suppress_cli_default
policy. This is meant to address the wrinkle above. What it means is that the cab-definition defaults are in effect when calling the tool from stimela, but when a CLI is constructed, the defaults are not passed in, causing click to rely on the function signature defaults.
In principle, parameters without a specified default are supposed to be caught here: https://github.com/caracal-pipeline/stimela/blob/master/scabha/schema_utils.py#L276. The idea is that the click option is then created without a default keyword, so when a CLI option is omitted and click invokes the Python function, that option is not passed to the function call at all. So I don't understand how "" comes through. Maybe the in should be a string comparison rather?
""
doesn't come through. I have to catch it like this
Did you see what I did here?
Is that what you meant? I now just have this policy in the the config.yaml eg.
When calling a Python function, there is a difference between "pass an explicit None value" and "pass nothing and rely on the function definition to provide its own default". When @landmanbester catches everything in the pfb workers via **kw (which I find a bit naughty, as it means the true function signature is obscured), this becomes the difference between putting a key=None in kw, or not putting the key into kw.
This was an attempt at defining parameters in a single place. I agree that it's not best practice but the alternative is a lot more work and has proven to be more error prone given the number of rapidly evolving workers I am attempting to maintain. There is an obvious downside to this viz. the function doc string is not informative and has to be viewed by running the CLI command with --help
. This is an issue I need to address, open to suggestions. But I do want to avoid defining the function signature in multiple places.
Did you see what I did here?
Ah yes. I think this is the right thing to do!
This was an attempt at defining parameters in a single place. I agree that it's not best practice but the alternative is a lot more work and has proven to be more error prone given the number of rapidly evolving workers I am attempting to maintain. There is an obvious downside to this viz. the function doc string is not informative and has to be viewed by running the CLI command with
--help
. This is an issue I need to address, open to suggestions. But I do want to avoid defining the function signature in multiple places.
Yeah I hear you... it's a pain keeping the YaML and the Python signature in sync with this many parameters. Well, I suppose we can just proclaim the YaML cab definition to be part of the function documentation and leave it at that?
Yeah I hear you... it's a pain keeping the YaML and the Python signature in sync with this many parameters. Well, I suppose we can just proclaim the YaML cab definition to be part of the function documentation and leave it at that?
Yeah, good idea. Let me see if I can render that to the docstring without breaking something elsewhere. Thanks for the suggestion.
As yes. I think this is the right thing to do!
Actually, I think my logic was flawed there (I missed the not in
). I am running into something that confuses me though. In this line
the default policies will get overwritten by whatever is in schema.policies
correct? But at this stage schema.policies
refers to the per parameter policies so even if default_policies
has pass_missing_as_none=True
it will get overwritten by the policy for that parameter, which seems to be coming in as None
. Is this the intended behaviour? That would mean that the per parameter policy takes precedence even when it is None. Or am I missing something?
Good catch, I think this genuinely a bug! The corresponding cab invocation logic has something more sophisticated than a simple merge, see here: https://github.com/caracal-pipeline/stimela/blob/master/stimela/kitchen/cab.py#L168
I guess the solution is to use that same logic here.
I had a go in the clickify_missing_as_none
branch but no cigar. For some reason parameters with dtype List[URI]
do not get parsed properly, will keep digging
So the issue was that the pass_missing_as_none
policy wasn't propagated to the list and tuple validators. I added a check for this here
and similarly for _validate_tuple
. I think this now works as expected but need to test it more thoroughly. I've also merged master and profiling-hacks into this branch as I need them for my experiments
When defining inputs to be parsed by
clickify_parameters
such asthis will get parsed as an UNSET value. In pfb I get around this like so and I now realise you can also get around this by making parameters optional with no default (eg.
Optional[int]
above) but should it perhaps be the default behaviour?