caracal-pipeline / stimela

Stimela 2.0
GNU General Public License v2.0
5 stars 4 forks source link

allow wildcard specifiers in _include sections? #289

Open o-smirnov opened 7 months ago

o-smirnov commented 7 months ago

Working with an experimental code version, I currently have the following config I'm including, so as to use native versions of the package for some specific cabs:

# use native breifast for now
cabs:
  breifast.stack-time-cube:
    backend:
      select: native

  breifast.zarr-to-fits:
    backend:
      select: native

  breifast.make-baseline-image:
    backend:
      select: native

  breifast.process-residual-cube:
    backend:
      select: native

  breifast.render:
    backend:
      select: native

It occurs to me that this would read much more pleasantly if it was just:

# use native breifast for now
cabs:
  breifast.*:
    backend:
      select: native

...which would require wildcard processing in the include-and-merge logic. Seems straightforward to implement though, and should not be confusing to the user. Thoughts @SpheMakh @JSKenyon @landmanbester?

JSKenyon commented 7 months ago

I don't mind it and I do think there is an argument for this kind of feature. I do wonder if perhaps the syntax is getting somewhat opaque but I don't have an immediate counter-offer.

o-smirnov commented 7 months ago

Well, wildcards are already used elsewhere (see e.g. steps.image-*), so I don't think it increases the opacity all that much...

JSKenyon commented 7 months ago

I assume this would support further modification e.g.

cabs:
  breifast.*:
    backend:
      select: native
  breifast.render:
    inputs:
      foo:
        dtype: bool

Does the current steps.image-* match everything or only the last occurrence? I have only used it for the latter.

o-smirnov commented 7 months ago

I assume this would support further modification e.g.

Yes, that's logical. The merge is already a loop over included content, so expansion of wildcards at this stage would not be difficult.

Does the current steps.image-* match everything or only the last occurrence? I have only used it for the latter.

The alphanumerically highest occurrence from what has already been defined (i.e. among all preceding steps). Which works out to "last occurrence" if you're systematic with your step naming. In the substitution context, only one value can be returned, so it can't match everything, by construction.

Perhaps this is a potentially confusing mixing of metaphors? We could introduce a steps.image-^ syntax as an alternative, and deprecate steps.image-* in a later release.

o-smirnov commented 7 months ago

P.S. Since everything is an ordered dict these days, I suppose the meaning of image-* could be changed to the last matching entry.

JSKenyon commented 6 months ago

Perhaps this is a potentially confusing mixing of metaphors? We could introduce a steps.image-^ syntax as an alternative, and deprecate steps.image-* in a later release.

I think that this may be the way to go, although it is possibly an irritating change from the perspective of a user.

o-smirnov commented 5 months ago

I assume this would support further modification e.g.

cabs:
  breifast.*:
    backend:
      select: native
  breifast.render:
    inputs:
      foo:
        dtype: bool

I pushed an initial implementation. An important subtlety is that while the above should work as expected, the following:

cabs:
  breifast.*:
    backend:
      select: native
  breifast.render:
    backend:
      select: singularity

will not. This is because wildcards are handled in post-processing, so their content will be merged in after the specific non-wildcarded sections. I'll have a think if this can be implemented more elegantly, won't open a PR for now.

o-smirnov commented 5 months ago

This is hard to implement without leaving in some nasty and confusing semantics (or without surgery on OmegaConf), because at time of inclusion the wildcard doesn't know what sections will be matched yet. It can be done post-merge, like on this branch, but the case illustrated in https://github.com/caracal-pipeline/stimela/issues/289#issuecomment-2171806736 will be a tripwire for the unwary user (even more nasty and unintuitive if the non-wildcarded tweaks happen in another include). So I'm minded to leave this as a do-not-implement for now.

I also realized that the original use case (https://github.com/caracal-pipeline/stimela/issues/289#issue-2277275655) can already be implemented by the developer in terms of the existing semantics like so:

lib:
  misc:
    breifast:
      # common options for breifast cab can set en masse here
      cabs:
        backend: native

cabs:
  breifast.stack-time-cube:
    _use: lib.misc.breifast.cabs  # invoke common cab settings
    info: stacks a series of 2D images into a zarr "cube dataset" with a time axis
    flavour: python
    command: breifast.cubes.stack_time_cube