datalad / datalad-catalog

Create a user-friendly data catalog from structured metadata
https://datalad-catalog.netlify.app
MIT License

Move from subcommands to an extended command suite #245

Closed: jsheunis closed this issue 1 year ago

jsheunis commented 1 year ago

Currently, catalog is the only entry point in the command suite, and the argument that immediately follows it is interpreted as the subcommand (create, add, serve, etc.).

Would it make sense to rather move towards extending the number of entry points in the command suite, and then do away with the subcommands?
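To make the two layouts concrete, here is a minimal sketch using only Python's argparse. It is not how DataLad builds its parsers; the `--catalog` option and the helper names are assumptions made for illustration.

```python
import argparse
from typing import Dict

ACTIONS = ("create", "add", "serve")


def build_subcommand_parser() -> argparse.ArgumentParser:
    """Current layout: one `catalog` entry point, action chosen by a subcommand."""
    parser = argparse.ArgumentParser(prog="datalad catalog")
    sub = parser.add_subparsers(dest="action", required=True)
    for action in ACTIONS:
        # `--catalog` is a hypothetical option, used only for this sketch.
        sub.add_parser(action).add_argument("--catalog", required=True)
    return parser


def build_command_suite() -> Dict[str, argparse.ArgumentParser]:
    """Proposed layout: one independent entry point per action."""
    suite = {}
    for action in ACTIONS:
        parser = argparse.ArgumentParser(prog=f"datalad catalog-{action}")
        parser.add_argument("--catalog", required=True)
        suite[f"catalog-{action}"] = parser
    return suite


# Subcommand style: the first positional argument selects the action.
old = build_subcommand_parser().parse_args(["serve", "--catalog", "/tmp/cat"])

# Command-suite style: the command name itself selects the parser.
new = build_command_suite()["catalog-serve"].parse_args(["--catalog", "/tmp/cat"])
```

In the second layout each command owns its full parameter list, so there is no shared parser that has to accommodate every subcommand's arguments.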

Reasons for:

Reasons against:

mih commented 1 year ago

As you stated: this is what other extensions (and datalad proper) are doing, for the reasons you gave. I see no reason why catalog should be different.

That being said, there is a long-standing desire to be able to do non-cumbersome subcommands (https://github.com/datalad/datalad/issues/4539), but so far nobody has considered it important enough to actually do it. For me personally, the resulting nightmare of backward-compatibility crutches makes this a rather unattractive direction to think about.

jsheunis commented 1 year ago

This effort would be a good opportunity to introduce Constraints from datalad-next
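For readers unfamiliar with the idea, a constraint is essentially a callable that validates and coerces a raw parameter value, and raises on failure. The sketch below illustrates that pattern with plain Python only; it does not use the datalad-next API, and all names in it (`EnsureChoiceSketch`, `ensure_path_sketch`, `validate_params`, the `action` parameter) are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Dict


class EnsureChoiceSketch:
    """Accept only one of a fixed set of values (hypothetical stand-in)."""

    def __init__(self, *choices: str) -> None:
        self._choices = choices

    def __call__(self, value: Any) -> Any:
        if value not in self._choices:
            raise ValueError(f"{value!r} is not one of {self._choices}")
        return value


def ensure_path_sketch(value: Any) -> Path:
    """Coerce a raw string into a Path (hypothetical stand-in)."""
    return Path(value)


def validate_params(
    raw: Dict[str, Any], constraints: Dict[str, Callable[[Any], Any]]
) -> Dict[str, Any]:
    """Apply each constraint to its parameter; unconstrained ones pass through."""
    return {
        name: constraints[name](value) if name in constraints else value
        for name, value in raw.items()
    }


constraints = {
    "catalog": ensure_path_sketch,
    "action": EnsureChoiceSketch("start", "stop"),
}
params = validate_params({"catalog": "/tmp/cat", "action": "start"}, constraints)
```

The appeal is that validation lives in one declarative mapping per command rather than being scattered through the command body.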

jsheunis commented 1 year ago

Current thinking of new commands + required/optional params + constraints:

datalad catalog-create:

Create a new catalog with or without metadata, and with or without a specified configuration file (defaults if not provided)

datalad catalog-add:

Add metadata to an existing catalog, with or without a specified dataset-level configuration file (defaults to the catalog-level config if not provided)

datalad catalog-validate:

Validate metadata against the catalog schema. The schema version is determined from the catalog, if provided, otherwise from the latest supported version of the package installation.
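The version fallback described here could be sketched roughly as follows. The file name `schema_version.json`, its `version` key, and the constant's value are assumptions for illustration, not the catalog's actual on-disk layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: the newest schema version the installed package supports.
LATEST_SUPPORTED_SCHEMA_VERSION = "1.0.0"


def resolve_schema_version(catalog_dir: Optional[Path] = None) -> str:
    """Prefer the catalog's own schema version, else the package's latest."""
    if catalog_dir is not None:
        # Hypothetical location of the version record inside a catalog.
        # A missing or malformed file raises instead of silently falling back.
        version_file = Path(catalog_dir) / "schema_version.json"
        return json.loads(version_file.read_text())["version"]
    return LATEST_SUPPORTED_SCHEMA_VERSION


# Usage: a throwaway "catalog" that pins an older schema version.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "schema_version.json").write_text(json.dumps({"version": "0.9.0"}))
    from_catalog = resolve_schema_version(Path(tmp))

from_package = resolve_schema_version()
```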

datalad catalog-translate:

Translate a file of metalad-extracted metadata items from their extractor-specific structures into the catalog schema. The target schema version is determined from the catalog, if provided, otherwise from the latest supported version of the package installation. Available translators (registered as entrypoints) will be filtered based on their own matching criteria (extractor) to find the appropriate one.
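A rough sketch of the translator selection step: each translator declares which extractor it handles, and the first one whose criteria match a record is used. To stay self-contained it uses an in-memory list instead of real entrypoint discovery; the class names, the `match`/`translate` methods, and the record keys are assumptions for illustration.

```python
from typing import Any, Dict, List, Type


class TranslatorSketch:
    """Hypothetical base class: subclasses name the extractor they handle."""

    extractor_name = ""

    @classmethod
    def match(cls, record: Dict[str, Any]) -> bool:
        # Assumption: each record names the extractor that produced it.
        return record.get("extractor_name") == cls.extractor_name

    def translate(self, record: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class CoreTranslatorSketch(TranslatorSketch):
    extractor_name = "metalad_core"

    def translate(self, record: Dict[str, Any]) -> Dict[str, Any]:
        # Illustrative output fields only, not the full catalog schema.
        return {
            "type": record["type"],
            "dataset_id": record["dataset_id"],
            "dataset_version": record["dataset_version"],
        }


# Stand-in for the translators that would be discovered via entrypoints.
REGISTRY: List[Type[TranslatorSketch]] = [CoreTranslatorSketch]


def find_translator(record: Dict[str, Any]) -> TranslatorSketch:
    """Return an instance of the first translator whose criteria match."""
    for translator_cls in REGISTRY:
        if translator_cls.match(record):
            return translator_cls()
    raise LookupError(f"no translator for {record.get('extractor_name')!r}")


record = {
    "type": "dataset",
    "dataset_id": "abc",
    "dataset_version": "v1",
    "extractor_name": "metalad_core",
}
translated = find_translator(record).translate(record)
```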

datalad catalog-serve:

Start a local http server for viewing/testing a local catalog.
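For illustration, serving a catalog directory locally needs little more than the standard library. This is a sketch under assumptions, not the command's actual implementation; `make_server` and the temporary directory standing in for a catalog are hypothetical.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def make_server(catalog_dir, host="127.0.0.1", port=0):
    """Build an HTTP server rooted at catalog_dir (port 0 = any free port)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(catalog_dir)
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


# Usage: serve a throwaway directory and fetch its index page once.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "index.html").write_text("<h1>catalog</h1>")
    server = make_server(tmp)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # Bypass any configured proxy so the request stays on this machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/index.html") as resp:
            body = resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()
```

A real command would call `serve_forever()` in the foreground and stop on Ctrl-C instead of using a background thread.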

datalad catalog-remove:

Remove metadata of a specific dataset ID and VERSION from an existing catalog:

datalad catalog-workflow:

Run a workflow of metadata extraction, translation, and catalog (entry) generation, given a DataLad dataset hierarchy and a specified subcommand:

datalad catalog-set:

Utility for setting various properties of a catalog, based on the specified subcommand. Used to set the catalog home page, or to reset config at catalog- or dataset-level.

datalad catalog-get:

Utility for getting various properties of a catalog, based on the specified subcommand.

christian-monch commented 1 year ago

I agree with all reasons for the introduction of an extended command suite. I also think that verbosity is not really a problem. Most invocations will probably be scripted anyway.

The command suite makes sense to me. I have only one comment, on -y, --config-file. The text states that it must be a JSON object. In the Python API you would probably expect a dict (although other types would be valid JSON objects), while on the command line you would expect a YAML or JSON file.
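One way to reconcile the two call styles is to normalize the argument early: accept a dict from API callers and a file path from the command line. A minimal sketch, where `load_config` and the `catalog_name` key are hypothetical, and YAML support relies on the third-party PyYAML package:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def load_config(config: Union[Dict[str, Any], str, Path]) -> Dict[str, Any]:
    """Return the config as a dict, whether given as a dict or a file path."""
    if isinstance(config, dict):
        return config
    path = Path(config)
    text = path.read_text()
    if path.suffix.lower() in (".yaml", ".yml"):
        import yaml  # third-party PyYAML; only needed for YAML files

        loaded = yaml.safe_load(text)
    else:
        loaded = json.loads(text)
    # A JSON/YAML document may also be a list or scalar; require a mapping.
    if not isinstance(loaded, dict):
        raise ValueError(f"{path} must contain a mapping at the top level")
    return loaded


# API style: pass a dict directly.
from_dict = load_config({"catalog_name": "demo"})

# Command-line style: pass a path to a JSON file.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp, "config.json")
    cfg_path.write_text(json.dumps({"catalog_name": "demo"}))
    from_file = load_config(cfg_path)
```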

christian-monch commented 1 year ago

"This effort would be a good opportunity to introduce Constraints from datalad-next"

I agree.

jsheunis commented 1 year ago

thanks @christian-monch!

bpoldrack commented 1 year ago

Agree with everything so far. Only minor comment:

-y, --config-file is a bit strange. Where's the y coming from? How about a "standard" file argument -F, --config-file (capitalized to not read as conventional "force")?

jsheunis commented 1 year ago

Thanks for catching that, I had gotten used to it. I think it's a legacy thing from when the file was originally a YAML file, hence the y. But F feels like a good option.