datalad / datalad-gooey

A graphical user interface for DataLad (datalad.org)
https://docs.datalad.org/projects/gooey

Autogenerate metadata entry form from json schema #329

Open jsheunis opened 1 year ago

jsheunis commented 1 year ago

To be used together with https://github.com/datalad/datalad-gooey/pull/319

Some options include:

Some comments after initial testing:

The existing packages that I've tested aren't that enticing yet. I'm trying to focus on options that are as lean as possible, not tied to a specific JavaScript framework, and fully client-side. Many JSON Schema implementations are listed here: https://json-schema.org/implementations.html.

I've tested https://github.com/hblanko/json-schema-forms and http://www.alpacajs.org/; neither seems to be actively maintained.

Alpaca loads Bootstrap, jQuery, Handlebars and Alpaca itself from a CDN. A simple JS script is then needed to instantiate the form from a schema object. Example code is shown here: http://www.alpacajs.org/tutorial.html

Something to take into account is referenced schemas ($ref fields) and how they are resolved. The tested packages don't handle these seamlessly: in both cases I had to update the data, the reference URLs, or the data locations to get it to work correctly. And if there are referenced schemas, there could also be additional files to supply together with the top-level schema.
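One way to sidestep the $ref problem is to inline references before handing the schema to a form generator. The sketch below is only an illustration of that idea, not what either tested package does. It handles local references of the form `#/definitions/...` only; references to other files or URLs would need extra loading logic. The function names `resolve_pointer` and `inline_refs` are made up for this example.

```python
def resolve_pointer(document, pointer):
    """Follow a local JSON Pointer such as '#/definitions/person'."""
    if not pointer.startswith("#"):
        # Remote or file-based references are out of scope for this sketch.
        raise ValueError(f"only local references are handled here: {pointer!r}")
    rest = pointer[1:]
    if rest == "":
        return document
    if not rest.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in rest.split("/")[1:]:
        # Undo JSON Pointer escaping (RFC 6901): '~1' is '/', '~0' is '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_refs(node, root, _seen=()):
    """Return a copy of `node` with every local $ref replaced by its target.

    Keywords sitting next to a $ref are dropped, which matches how
    draft-07 and earlier treat $ref siblings.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            if ref in _seen:
                raise ValueError(f"circular reference: {ref}")
            target = resolve_pointer(root, ref)
            return inline_refs(target, root, _seen + (ref,))
        return {key: inline_refs(value, root, _seen) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, root, _seen) for item in node]
    return node


# Hypothetical example schema, not the datalad-catalog one.
schema = {
    "type": "object",
    "properties": {
        "authors": {"type": "array", "items": {"$ref": "#/definitions/person"}},
    },
    "definitions": {
        "person": {"type": "object", "properties": {"name": {"type": "string"}}},
    },
}
flat = inline_refs(schema, schema)
```

After this step `flat` contains no `$ref` keys, so a form generator that cannot resolve references could still consume it. Recursive schemas cannot be inlined this way, which is why the sketch raises an error on a circular reference.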

Other useful tools:

jsheunis commented 1 year ago

Alpaca form example, using the json schema definition of the dataset in datalad-catalog https://user-images.githubusercontent.com/10141237/195564080-9d85b54e-a113-47cb-8c5e-e3a2df3cba15.mov

jsheunis commented 1 year ago

Comments by @mih:


It would be great if any datalad-extension could declare an entrypoint for a metadata specification that can be understood by something like the custom metadata extractor (https://github.com/datalad/datalad-metalad/blob/master/datalad_metalad/extractors/custom.py).

The basic concept would be this:

  1. an entrypoint is a specification that declares (A) a subject for a metadata record, (B) a schema for a metadata record, and (C) a location to deposit a metadata record in a dataset.
  2. (A) can be a file or a dataset
  3. (C) uses the concept of the custom extractor with sidecar files, see datalad.metadata.custom-content-source and datalad.metadata.custom-dataset-source config variables

Any extension can declare any number of such specs.
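To make the (A)/(B)/(C) concept concrete, here is a minimal sketch of what such a spec could look like as a Python object. Everything in it is hypothetical: the class and function names, the in-memory registry (a real implementation would presumably discover specs through package entry points instead), and the example location path.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class Subject(Enum):
    """(A) What a metadata record describes."""
    FILE = "file"
    DATASET = "dataset"


@dataclass(frozen=True)
class MetadataSpec:
    name: str
    subject: Subject         # (A) file or dataset
    schema: Dict[str, Any]   # (B) JSON schema of the record
    location: str            # (C) where the record is deposited in the dataset


# Stand-in for entry point discovery; hypothetical.
_REGISTRY: Dict[str, MetadataSpec] = {}


def register_spec(spec: MetadataSpec) -> None:
    if spec.name in _REGISTRY:
        raise ValueError(f"spec already registered: {spec.name}")
    _REGISTRY[spec.name] = spec


def specs_for(subject: Subject) -> List[MetadataSpec]:
    """All known specs that apply to the given kind of subject."""
    return [spec for spec in _REGISTRY.values() if spec.subject is subject]


# An extension can declare any number of specs.
register_spec(MetadataSpec(
    name="dataset-description",
    subject=Subject.DATASET,
    schema={"type": "object", "properties": {"name": {"type": "string"}}},
    location=".metadata/dataset.json",  # hypothetical example value
))
register_spec(MetadataSpec(
    name="file-annotation",
    subject=Subject.FILE,
    schema={"type": "object", "properties": {"comment": {"type": "string"}}},
    location=".metadata/content/example.json",  # hypothetical example value
))

dataset_specs = specs_for(Subject.DATASET)
```

A lookup like `specs_for(...)` is the kind of query a context menu would need in order to list the specs that apply to the selected item.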

In Gooey, right-clicking on any dataset/file would bring up a menu with all known specs as possibilities to enter or manipulate. The form generated from the spec can be populated with existing records from the known target locations, and saved (back) into them.
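The populate-and-save-back cycle could be sketched roughly as follows, assuming records are stored as JSON sidecar files. The function names and the location path are assumptions made for this illustration, and the temporary directory only stands in for a dataset root.

```python
import json
import tempfile
from pathlib import Path


def load_record(dataset_root, location):
    """Read an existing record, or return an empty one if none exists yet."""
    path = Path(dataset_root) / location
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def save_record(dataset_root, location, record):
    """Write the record (back) to its target location."""
    path = Path(dataset_root) / location
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, sort_keys=True)
        f.write("\n")
    return path


def initial_form_values(schema, record):
    """Pre-fill each schema property from the record, else the schema default."""
    values = {}
    for name, prop in schema.get("properties", {}).items():
        values[name] = record.get(name, prop.get("default"))
    return values


schema = {
    "properties": {
        "name": {"type": "string"},
        "license": {"type": "string", "default": "CC0"},
    }
}

with tempfile.TemporaryDirectory() as root:
    location = ".metadata/dataset.json"  # hypothetical target location
    # First visit: no record exists, so the form shows schema defaults.
    first = initial_form_values(schema, load_record(root, location))
    # The user fills in the form and saves.
    edited = dict(first, name="my dataset")
    save_record(root, location, edited)
    # Second visit: the form is populated from the saved record.
    second = initial_form_values(schema, load_record(root, location))
```

In an actual dataset the save step would presumably be followed by a `datalad save`, which this sketch leaves out.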

If done this way, we have established the connection to metalad immediately, without having to implement a ton of extra extractors.

Re https://github.com/datalad/datalad-gooey/issues/323: the outcome of the operation sketched above could be serialized into YAML as a configurable alternative, so that the same machinery covers this use case too.


jsheunis commented 1 year ago

My current inclination is to do an implementation with HTML, VueJS and standard JS, mainly because the existing options are buggy and I'm familiar with Vue from the catalog work.

Another option is to allow multiple schema-to-form translators to be slotted in between the provided schemas and the rendering of a form.
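A pluggable translator layer could be as small as a registry of functions that each take a schema and return a rendered form. The sketch below is written in Python purely to illustrate the idea; the registry, the decorator, and the `plain-html` translator are all hypothetical, and the translator only covers flat schemas with simple property types.

```python
from html import escape
from typing import Any, Callable, Dict

Translator = Callable[[Dict[str, Any]], str]
TRANSLATORS: Dict[str, Translator] = {}


def translator(name: str):
    """Register a schema-to-form translator under a name."""
    def decorator(func: Translator) -> Translator:
        TRANSLATORS[name] = func
        return func
    return decorator


# Mapping from JSON schema types to HTML input types; anything else is text.
_INPUT_TYPES = {
    "string": "text",
    "integer": "number",
    "number": "number",
    "boolean": "checkbox",
}


@translator("plain-html")
def plain_html(schema: Dict[str, Any]) -> str:
    required = set(schema.get("required", []))
    rows = []
    for name, prop in schema.get("properties", {}).items():
        prop_type = prop.get("type")
        input_type = _INPUT_TYPES.get(prop_type, "text") if isinstance(prop_type, str) else "text"
        label = escape(prop.get("title", name))
        attrs = f'type="{input_type}" name="{escape(name)}"'
        if name in required:
            attrs += " required"
        rows.append(f"<label>{label} <input {attrs}></label>")
    return "<form>\n" + "\n".join(rows) + "\n</form>"


def render_form(schema: Dict[str, Any], using: str = "plain-html") -> str:
    """Render a form with whichever translator is slotted in."""
    try:
        translate = TRANSLATORS[using]
    except KeyError:
        raise ValueError(f"unknown translator: {using!r}") from None
    return translate(schema)


form = render_form({
    "properties": {
        "name": {"type": "string", "title": "Name"},
        "age": {"type": "integer"},
    },
    "required": ["name"],
})
```

With this shape, a Vue-based or Alpaca-based translator would just be another registered entry, and the schema providers would not need to know which one renders the form.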