datalad / datalad-catalog

Create a user-friendly data catalog from structured metadata
https://datalad-catalog.netlify.app
MIT License

Discrepancies in schema version between existing catalog and installed package #260

Closed (jsheunis closed 3 months ago)

jsheunis commented 1 year ago

If an existing catalog was created with a specific previous catalog schema version, then metadata to be added to the existing catalog will need to be validated against this specific catalog schema version, and not against the latest schema version supported by the package installation.

Currently, the to-be-added metadata is validated against the package-supported catalog schema. There is no check of whether the existing catalog uses that same schema version, or an earlier or later one.

Proposed handling:

This would require users to install a previous version of datalad-catalog if they want to generate records for a previously created catalog, which could be bad for UX (but maybe it is a standard way of dealing with such cases?). The alternative is to support catalog record generation for multiple versions of the catalog schema in a single version of the datalad-catalog package, which I don't like since it would require continuous maintenance/updates.
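To make the discussed check concrete, here is a minimal sketch of comparing the schema version of an existing catalog with the one supported by the installed package before any metadata is validated. Everything in it is an assumption for illustration: the names `SchemaVersionMismatch`, `parse_version` and `check_schema_compatibility` are hypothetical and not part of datalad-catalog, the versions are assumed to be plain dotted numbers, and how a catalog records its schema version is left out (both versions are simply passed in as strings).

```python
from typing import Tuple


class SchemaVersionMismatch(Exception):
    """Hypothetical error: catalog and installed package use different schema versions."""


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.0.1' into a comparable tuple.

    Assumes purely numeric components; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in version.strip().split("."))


def check_schema_compatibility(catalog_version: str, package_version: str) -> None:
    """Return silently if both versions match, otherwise raise SchemaVersionMismatch."""
    catalog = parse_version(catalog_version)
    package = parse_version(package_version)
    if catalog == package:
        return
    # Tell the user in which direction the catalog deviates from the package.
    relation = "older" if catalog < package else "newer"
    raise SchemaVersionMismatch(
        f"Catalog schema {catalog_version} is {relation} than schema "
        f"{package_version} supported by the installed package; install a "
        "matching package version before adding metadata."
    )
```

Such a check would run before validation, so that metadata is never validated against a schema the existing catalog does not use.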

Comments anyone?

loj commented 1 year ago

I think your proposed solution makes sense. If a user wants to maintain an old catalog schema, I don't think it's out of line to make them use an older version of datalad-catalog.

But what would they do if they want to update their catalog to use the latest schema version? Would they have to generate a new catalog or would there be a way for them to update the existing one?

jsheunis commented 1 year ago

Thanks @loj!

> But what would they do if they want to update their catalog to use the latest schema version? Would they have to generate a new catalog or would there be a way for them to update the existing one?

Good question. There would have to be dedicated functionality for transforming a catalog between schema versions, which doesn't currently exist. This functionality would need to transform (probably) all catalog metadata files and would replace (probably) all assets, since e.g. the catalog HTML and JS would likely have changed alongside the schema version changes. This means basically replacing the whole catalog. So the question is whether there's a specific reason a transformation would be preferable to a new catalog generation. I was uncertain before, but now I'm leaning towards just going for regeneration. But I'm curious to hear if there might be circumstances where this is not a good option.

loj commented 1 year ago

> So the question is whether there's a specific reason a transformation would be preferable to a new catalog generation. I was uncertain before, but now I'm leaning towards just going for regeneration. But I'm curious to hear if there might be circumstances where this is not a good option.

Nothing really comes to mind, but I haven't played with the catalog workflow much. Maybe others will have thoughts on this. But from what you describe, a new catalog generation does sound preferable.

jsheunis commented 1 year ago

Thinking of the distributed use case, the people in charge of maintaining the catalog might not have access to the original data anymore, or to the metalad-extracted metadata. In such cases all they can operate on would be the content of the catalog, which points to the transformation functionality being useful. But this is hypothetical for now; we don't have concrete use cases atm.
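
For illustration only, a transformation that operates purely on catalog content could be sketched as a chain of per-version migration steps applied to each metadata record. This is a sketch under stated assumptions, not existing datalad-catalog functionality: the version labels (`"1"`, `"2"`, `"3"`), the field names (`title`, `name`, `keywords`) and all function names are hypothetical, and the catalog assets (HTML, JS) would still have to be replaced separately.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, object]
Migration = Callable[[Record], Record]


def _rename_title_to_name(record: Record) -> Record:
    """Hypothetical step 1 -> 2: the 'title' field is renamed to 'name'."""
    migrated = dict(record)  # copy so the caller's record is not mutated
    if "title" in migrated:
        migrated["name"] = migrated.pop("title")
    return migrated


def _add_default_keywords(record: Record) -> Record:
    """Hypothetical step 2 -> 3: a 'keywords' list becomes mandatory."""
    migrated = dict(record)
    migrated.setdefault("keywords", [])
    return migrated


# Maps a source schema version to (next version, step that gets a record there).
MIGRATIONS: Dict[str, Tuple[str, Migration]] = {
    "1": ("2", _rename_title_to_name),
    "2": ("3", _add_default_keywords),
}


def migrate_record(record: Record, source_version: str, target_version: str) -> Record:
    """Apply migration steps one after another until the target version is reached."""
    version = source_version
    while version != target_version:
        if version not in MIGRATIONS:
            raise ValueError(
                f"No migration path from schema {version} to {target_version}"
            )
        version, step = MIGRATIONS[version]
        record = step(record)
    return record
```

Usage would amount to reading every metadata file in the catalog, passing its content through `migrate_record`, and writing the result back. Each step only needs to know about two adjacent schema versions, which keeps the maintenance burden per release small.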