DCMLab / dimcat

DIgital Musicology Corpus Analysis Toolkit
https://dimcat.readthedocs.io
GNU General Public License v3.0
11 stars 5 forks source link

Dimcat needs a `Config` object #10

Closed huguesdevimeux closed 10 months ago

huguesdevimeux commented 1 year ago

Following an exchange by e-mail with @johentsch :

Every DiMCAT object (analyzer, dataset, facet, etc.) can be created from (some type of, you tell me) Config object that meets the following requirements:

  1. easy to maintain in the sense that, in order to add a new option to the configured class and its children, you need to change only one line to the corresponding Config class
  2. Config options have type annotations that can be used to create control elements for a (web)app. For example, an analyzer's option whose values are defined as a particular enum would naturally be configured using a drop-down menu. Hence a bonus question when thinking about the problem: If we found a straightforward mechanism for defining Configs as jsonschema, it would give us a lot of expressive power for defining value ranges that could be used both for more specific app control elements (say "value between 0-12") and for validating configurations (but with the potential caveat of a reduced type vocabulary).
  3. Some straightforward code pattern to turn any Config into the respective object it describes, and to turn any object into the corresponding Config, guaranteed to re-create it (in the case of data objects this will boil down to piece IDs, applied filters, or both)
  4. Config objects are easy to store as and create from JSON
  5. Configs differentiate between required values and those that come with a default; potentially raising an error when created but under-specified (open question if this should be the case)
  6. Partially defined configs can be used for lookup. For example, a particular FeatureConfig (describing a certain type of musical information that might be available for some pieces in the dataset but not for others) has a whole bunch of options, but the user cares about only two of them and wants to analyze all pieces for which a specific value pair of these options is available. NB this might involve creating a Config without specifying some of the options (in the sense of "whatever option is available"). There needs to be an efficient mechanism in place for storing for each piece in a dataset which value ranges are available for each feature option, and for matching such a partial config (potentially with a non-binary result such as 0="available for free", "available via transformation", 2="available but unreliable", 3="available only if X is provided elsewhere").
  7. Configs need to be upward-compatible. E.g. if a feature has been pickled with its configuration but at a later DiMCAT version users can configure that same feature with more options, this older feature still needs to be discoverable .

Edit: Configs should be hashable.

huguesdevimeux commented 1 year ago

A few comments on that, but I'm not familiar (yet :) ) with the library, so it may not be as accurate. For the few things I know, here is my take :

For what you have described, I understand that Config would do exactly what class parameters are for. Can you clarify in what extent it would differ ?

Also, I think we are mixing up two things there, that can lead to a flawed design.

Config options have type annotations that can be used to create control elements for a (web)app.

I don't think this should be mixed up with the user public API, but rather encapsulated in a Mixin/Abstract class WebComponent (or other name). It would provide the necessary interface for the web app to get the menus/others inputs requests.

It's just a quite hot take. Let me know what you think.

johentsch commented 1 year ago

For what you have described, I understand that Config would do exactly what class parameters are for. Can you clarify in what extent it would differ ?

That's right, the goal of this mechanism is to enable creating objects from descriptors ("objects representing a JSON configuration"). Naturally, the described properties will be used as arguments upon creation of the described object and therefore there is a correspondence between the descriptor's properties with the described's properties. Hence the first requirement of straightforward maintainability.

but rather encapsulated in a Mixin/Abstract class WebComponent

This is not a planned feature within DiMCAT. What is meant is that configuration options' types (and value constraints) need to be discoverable for other tools, and one example would be a webapp that wants to create appropriate control elements from them.

johentsch commented 1 year ago

Thanks for the fruitful discussion, @huguesdevimeux. Here's a sketch (adapted from the minimal working example mwe.py) of the current tentative design, that we are going to try and improve using marshmallow. The current design stays with the initial idea that every object has a .config field, only that here it's not a simple dictionary but the respective Config object.

huguesdevimeux commented 1 year ago

I think that's a good design, but we will have to be careful - and that can actually be quite a big issue - to unexpected behaviour with mutable variables within Config. For example, if a same reference to a Config object is used multiple times in the code, if a mutable value is modified in the object, it would modify it for every object instantiated with the same reference Config object. It can lead to some annoying side effects. That's why I think config should be a private attribute, and accessed only with a getter that would return a deep copy each time. Performance-wise, it's not good, but the end-user is not supposed to use repeatedly. Or, we could make Config immutable, but that's ... not really possible with python.

Let me know what you think.

johentsch commented 1 year ago

That's why at one point I was using frozen dataclasses for configs but these come with other disadvantages. Apart from the fact that with dataclasses it is not straightforward to define partial configs, e.g. to find all configs that match two particular settings (and pass that info around).

Anyway, the idea would be for you to transform the mwe in a way that it works with marshmallow. I don't know how easy this transformation will be in this setup; it could be that development will be more straightforward using mock objects.

huguesdevimeux commented 1 year ago

I see, that's pretty neat !

e.g. to find all configs that match two particular settings (and pass that info around)

Wait, is that a planned feature as well ? That's very nice, but maybe it changes some things implementation-wise. Do you mind to elaborate ?

Anyway, the idea would be for you to transform the mwe in a way that it works with marshmallow. I don't know how easy this transformation will be in this setup; it could be that development will be more straightforward using mock objects.

I will do that asap - not this week, though.

johentsch commented 1 year ago

Wait, is that a planned feature as well ? That's very nice, but maybe it changes some things implementation-wise. Do you mind to elaborate ?

It's point 6. in the OP.