mih opened this issue 5 years ago
Do you mean making them datalad commands available through the cmdline/Python API, or what type of commands?
Yes. But I don't care if they become available through the main API. Simply using the same classes is what I care about.
That is what I had in mind. I actually do not want them to become available as part of the main API: that would make it even slower and more bloated. But reusing more of the existing standardized input API specification would IMHO be great (I always felt that way while expressing my confusion about new constructs such as plugins and procedures, which do similar things without reusing the existing machinery). Extending it with an expected output spec (JSON schemas/validator?), plus a mix-in for such an extractor API class that every extractor would have to implement, sounds like a good idea.
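A rough sketch of what such a mix-in could look like, assuming a plain dict of `(type, default)` pairs as the input spec and a dict of required keys and types as a simplified stand-in for a JSON schema. `ExtractorAPIMixin`, `params`, `output_spec`, `extract` and `ToyExtractor` are all hypothetical names for illustration, not existing datalad API:

```python
from typing import Any, Dict, Iterator, List, Tuple


class ExtractorAPIMixin:
    """Hypothetical mix-in: declares inputs and the expected shape of each output record."""

    # Input spec: parameter name -> (expected type, default value).
    params: Dict[str, Tuple[type, Any]] = {}
    # Output spec: required record key -> expected type (simplified stand-in for a JSON schema).
    output_spec: Dict[str, type] = {"path": str, "status": str}

    def extract(self, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        raise NotImplementedError  # each concrete extractor implements this

    def validate_record(self, record: Dict[str, Any]) -> List[str]:
        """Return a list of problems; an empty list means the record matches the spec."""
        problems = []
        for key, expected in self.output_spec.items():
            if key not in record:
                problems.append(f"missing key {key!r}")
            elif not isinstance(record[key], expected):
                problems.append(f"{key!r} should be {expected.__name__}")
        return problems

    def __call__(self, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        # Parameters are checked eagerly, at call time...
        unknown = set(kwargs) - set(self.params)
        if unknown:
            raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
        resolved = {}
        for name, (expected, default) in self.params.items():
            value = kwargs.get(name, default)
            if not isinstance(value, expected):
                raise TypeError(f"{name!r} should be {expected.__name__}")
            resolved[name] = value
        return self._checked_records(resolved)

    def _checked_records(self, resolved: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        # ...while output records are validated lazily, as they are generated.
        for record in self.extract(**resolved):
            problems = self.validate_record(record)
            if problems:
                raise ValueError(f"invalid record {record!r}: {problems}")
            yield record


class ToyExtractor(ExtractorAPIMixin):
    """Toy extractor with a single boolean parameter."""

    params = {"include_derivatives": (bool, False)}
    output_spec = {"path": str, "status": str, "metadata": dict}

    def extract(self, include_derivatives: bool) -> Iterator[Dict[str, Any]]:
        yield {"path": "sub-01", "status": "ok", "metadata": {"derivatives": include_derivatives}}
```

E.g. `list(ToyExtractor()(include_derivatives=True))` yields one validated record. A real implementation would more likely reuse the existing interface parameter machinery and a proper JSON schema validator instead of these hand-rolled checks.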
I also wondered whether we could marry our Interface specification with the config right away. I guess this desire to turn them into commands may have been triggered by the desire to make them parametric, e.g. "treat derivatives of a BIDS dataset or not", "custom fields to exclude/include while processing DICOMs", etc., i.e. everything we now hard-code. In such cases it makes sense to be able to specify all of those not only via the cmdline but also to prescribe them in the config, since such settings typically persist per dataset.
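One way the parameter/config marriage could resolve values is a simple precedence chain: explicit command-line value, then per-dataset config, then the built-in default. This is a sketch only: a plain dict stands in for the dataset's configuration, and the key scheme `metadata.<extractor>.<param>` as well as `resolve_params` are hypothetical:

```python
from typing import Any, Dict, Mapping, Optional


def _coerce(raw: str, default: Any) -> Any:
    """Convert a string config value to the type of the parameter's default."""
    if isinstance(default, bool):
        return raw.strip().lower() in ("1", "true", "yes", "on")
    if isinstance(default, list):
        # Comma-separated lists, e.g. DICOM fields to exclude.
        return [item.strip() for item in raw.split(",") if item.strip()]
    if default is None:
        return raw
    return type(default)(raw)


def resolve_params(
    extractor: str,
    defaults: Mapping[str, Any],
    config: Mapping[str, str],
    cmdline: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Precedence: command-line value > per-dataset config > built-in default."""
    cmdline = cmdline or {}
    resolved = {}
    for name, default in defaults.items():
        # Hypothetical key scheme, e.g. "metadata.bids.include-derivatives".
        key = f"metadata.{extractor}.{name.replace('_', '-')}"
        if name in cmdline:
            resolved[name] = cmdline[name]
        elif key in config:
            resolved[name] = _coerce(config[key], default)
        else:
            resolved[name] = default
    return resolved
```

With `{"metadata.bids.include-derivatives": "true"}` in the config, `resolve_params("bids", {"include_derivatives": False}, config)` picks up `True`, while an explicit command-line value would still win.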
With that in mind, although possibly only tangentially related, I think we should also enhance our "API builder" to automate interfacing sub-commands, e.g. somewhat similar to click's groups: https://click.palletsprojects.com/en/7.x/commands/ and somewhat mimicking the current "datalad siblings [-s|--name NAME] [ACTION]" behavior.
Here it could be datalad metadata-extractors, which would list available and/or enabled extractors if no specific one is given. Actions could be similar to those of siblings: "query" (default), "run", and even conveniences like "enable" and "disable".
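To make the proposed grouping concrete, here is a sketch using the standard library's argparse rather than datalad's own parser construction (which works differently); the optional positional `action` defaults to "query", mirroring the siblings pattern. `metadata_extractors` and its in-memory `available`/`enabled` sets are hypothetical, and "run" is only a placeholder:

```python
import argparse
from typing import List, Set

ACTIONS = ("query", "run", "enable", "disable")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="datalad metadata-extractors")
    # Mirrors "datalad siblings [-s|--name NAME] [ACTION]".
    parser.add_argument("-s", "--name", help="extractor to act on (default: all)")
    parser.add_argument("action", nargs="?", choices=ACTIONS, default="query")
    return parser


def metadata_extractors(argv: List[str], available: Set[str], enabled: Set[str]) -> List[str]:
    """Hypothetical command group; returns report lines instead of printing them."""
    args = build_parser().parse_args(argv)
    names = [args.name] if args.name else sorted(available)
    for name in names:
        if name not in available:
            raise ValueError(f"unknown extractor: {name}")
    report = []
    for name in names:
        if args.action == "enable":
            enabled.add(name)
        elif args.action == "disable":
            enabled.discard(name)
        elif args.action == "run":
            # Placeholder: a real implementation would invoke the extractor here.
            report.append(f"{name}: run requested")
            continue
        # "query" (and the state after enable/disable) reports the current status.
        report.append(f"{name}: {'enabled' if name in enabled else 'disabled'}")
    return report
```

Calling it with no arguments lists every available extractor with its state, which matches the "query by default" behavior described above.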
If we had such grouping available, it could even be extended to the extensions, so that there is "datalad containers [run/..]" (with list being pretty much the default action, "query").
GitMate.io thinks a possibly related issue is datalad/datalad#2729 (API: composite command).
Cool, so we all agree!
Plan:
With datalad/datalad#3134 there is now a single API command, extract_metadata, that takes care of all extraction-related functionality (in contrast to aggregation and access to aggregated metadata). I am now exploring whether it would be sensible and/or useful to make individual extractors commands too.
Hi @mih. Is this something already supported by datalad-metalad, so that we should just retag it (or refile it into datalad-metalad and close it here)?
Use case: https://github.com/datalad/datalad-metalad/issues/55 which would need an R stack with many dependencies to extract metadata, so containers come to mind!
I started RF'ing the metadata code base. I increasingly dislike the special status of the extractors. They are essentially generators that yield JSON-serializable records -- just like any other command. Why not make them regular commands?
We would just need to define a minimal API that any extractor has to be compliant with.
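A minimal sketch of such an API, assuming the contract is just "a name plus a callable that yields JSON-serializable dicts". `Extractor`, `as_command` and `SuffixExtractor` are hypothetical, and the stamped `action`/`status`/`extractor` fields are an assumption about what a command-style result might carry:

```python
import json
import posixpath
from typing import Any, Callable, Dict, Iterable, Iterator, Protocol, runtime_checkable


@runtime_checkable
class Extractor(Protocol):
    """Hypothetical minimal contract: a name plus a generator of JSON-serializable records."""

    name: str

    def __call__(self, dataset: str, **params: Any) -> Iterator[Dict[str, Any]]: ...


def as_command(extractor: Extractor) -> Callable[..., Iterator[Dict[str, Any]]]:
    """Wrap an extractor so it yields results like any other command."""

    def command(dataset: str, **params: Any) -> Iterator[Dict[str, Any]]:
        for record in extractor(dataset, **params):
            json.dumps(record)  # raises TypeError if the record is not JSON-serializable
            # Common result fields first, so values set by the extractor take precedence.
            yield {"action": "extract_metadata", "status": "ok", "extractor": extractor.name, **record}

    command.__name__ = f"extract_{extractor.name}"
    return command


class SuffixExtractor:
    """Toy extractor: reports the file-name suffix of each given file."""

    name = "suffix"

    def __call__(self, dataset: str, files: Iterable[str] = (), **params: Any) -> Iterator[Dict[str, Any]]:
        for f in files:
            yield {"path": posixpath.join(dataset, f), "metadata": {"suffix": posixpath.splitext(f)[1]}}
```

Because the wrapper only relies on the minimal contract, the same extractor object could back both a dedicated per-extractor command and the single extract_metadata entry point.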