GetDKAN / dkan

DKAN Open Data Portal
https://dkan.readthedocs.io/en/latest/index.html
GNU General Public License v2.0

Move metadata to custom entity #3168

Open dafeder opened 4 years ago

dafeder commented 4 years ago

Metadata is currently stored in a single content type, "data", and differentiated by a text field, "type". This was a convenient way to get the system up and running, but it presents some problems:

  1. Metadata shows up in the Drupal content administration screen, including child data objects like "distributions" and "publishers".
  2. All nodes receive a route in Drupal, so keeping all the component pieces of a single dataset published or unpublished according to their parent's status will require some kind of syncing function, which does not yet exist.
  3. We don't currently have a straightforward way to manage the core schemas if you want to break from what we ship with.
  4. On a purely conceptual level, we have essentially created a concept of "bundles" within data entities but implemented them in a nonstandard way, making the system less intuitive for Drupal developers.

The proposal is to refactor the Metastore module to implement a new custom entity.

Metadata storage architecture

While this will result in a truly flexible metadata architecture, some core concepts in the DKAN codebase still need to be represented somewhere. Schemas will need the capability to be classified as one of the following types:

Schemas not identified as one of these core types are simply available for storage with no special behaviors associated with them.

* Schemas that are designated for "manual creation" can be created by a user action -- clicking "create new [bundle name]" or posting to an endpoint. Schemas not available for manual creation are created by the DKAN referencing system and may only be created or modified through their parent entities.
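The manual-creation rule can be sketched as a small registry check. This is a minimal illustration in Python, not DKAN's implementation (DKAN is a Drupal/PHP codebase); `SchemaBundle`, `SchemaRegistry`, `can_create_directly` and `create` are hypothetical names, and treating `dataset` as manually creatable while `distribution` is child-only is an assumption for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaBundle:
    """Hypothetical record describing one metadata schema bundle."""
    name: str
    manual_creation: bool = False  # True: users may create items directly


@dataclass
class SchemaRegistry:
    """Hypothetical registry of schema bundles; not part of DKAN's API."""
    bundles: dict = field(default_factory=dict)

    def register(self, bundle: SchemaBundle) -> None:
        self.bundles[bundle.name] = bundle

    def can_create_directly(self, name: str) -> bool:
        # Only bundles flagged for manual creation get a "Create new" action
        # or a direct POST endpoint.
        bundle = self.bundles.get(name)
        return bundle is not None and bundle.manual_creation

    def create(self, name: str, data: dict, via_parent: bool = False) -> dict:
        if name not in self.bundles:
            raise KeyError(f"Unknown schema bundle: {name}")
        # Non-manual bundles may only be created by the referencing system
        # while it processes a parent entity.
        if not via_parent and not self.can_create_directly(name):
            raise PermissionError(
                f"'{name}' items can only be created through a parent entity"
            )
        return {"bundle": name, "data": data}


registry = SchemaRegistry()
registry.register(SchemaBundle("dataset", manual_creation=True))
registry.register(SchemaBundle("distribution"))
```

With this setup, `registry.create("dataset", {...})` succeeds directly, while a `distribution` can only be created with `via_parent=True`, mirroring the rule that child-only schemas are created and modified through their parent entities.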

Metadata admin UI

This proposal assumes that the changes to the Drupal Form API described in #3166 have been implemented.

The metadata edit form will need to distinguish between child entities that have a cardinality of 1 (such as publisher) and child entities that allow for an unlimited array of objects (primarily in the case of distribution). The parent schema and UI schema should be able to provide enough context to determine this without adding additional properties to the schema bundle definition.
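To illustrate how the parent schema alone can carry the cardinality, here is a minimal sketch that inspects a JSON Schema property: a nested `object` means one child, an `array` of objects means many (capped only if `maxItems` is declared). The function name `child_cardinality` and the trimmed-down `DATASET_SCHEMA` are hypothetical, loosely modeled on a dataset schema with `publisher` and `distribution` properties; it also assumes properties are defined inline rather than through `$ref`, which a real implementation would need to resolve.

```python
# Hypothetical, trimmed-down parent schema; not DKAN's shipped schema.
DATASET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "publisher": {"type": "object", "properties": {"name": {"type": "string"}}},
        "distribution": {"type": "array", "items": {"type": "object"}},
    },
}


def child_cardinality(parent_schema: dict, prop: str):
    """Return the maximum number of child objects a property allows.

    Returns 1 for a single nested object (e.g. publisher), the array's
    maxItems if one is declared, or None for an unlimited array of
    objects (e.g. distribution).
    """
    definition = parent_schema["properties"][prop]
    kind = definition.get("type")
    if kind == "object":
        return 1
    if kind == "array" and definition.get("items", {}).get("type") == "object":
        # No maxItems means the form must support adding any number of children.
        return definition.get("maxItems")
    raise ValueError(f"'{prop}' is not a child object or an array of objects")
```

An edit form could then embed a single sub-form when the result is 1 and offer an AJAX "add another" control or a wizard step when it is `None`.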

For any schema field that allows unlimited complex child objects, we can choose between inserting sub-forms with AJAX and adding an "Add [schema bundle name]" button to datasets for a multistep/"wizard" experience.

Upgrade path

This would of course represent at least a new minor version and would require an easy upgrade path to migrate "Data" nodes into Metadata items with the proper schema bundles.
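The core of such an upgrade path is a mapping from each legacy node's "type" value to a schema bundle. A minimal sketch in plain Python, assuming hypothetical `DataNode` and `MetadataItem` stand-ins rather than Drupal's entity API (a real upgrade would likely be written in PHP, for example as an update hook or with Drupal's Migrate API); `migrate_data_nodes` and `bundle_map` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for a legacy "data" node."""
    uuid: str
    type: str  # the text field that currently distinguishes metadata kinds
    metadata: dict
    published: bool


@dataclass
class MetadataItem:
    """Hypothetical stand-in for an item of the proposed metadata entity."""
    uuid: str
    bundle: str
    metadata: dict
    published: bool


def migrate_data_nodes(nodes, bundle_map):
    """Convert legacy nodes to metadata items, keyed by their "type" value.

    Nodes whose type has no matching schema bundle are returned separately
    so they can be reported instead of silently dropped.
    """
    migrated, unmapped = [], []
    for node in nodes:
        bundle = bundle_map.get(node.type)
        if bundle is None:
            unmapped.append(node)
            continue
        # Keep the UUID and status so references and visibility survive.
        migrated.append(
            MetadataItem(node.uuid, bundle, dict(node.metadata), node.published)
        )
    return migrated, unmapped
```

Preserving each node's UUID in this sketch is a deliberate choice: parent metadata refers to its children by identifier, so keeping identifiers stable avoids rewriting references during the upgrade.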

fmizzell commented 4 years ago

@dafeder Regarding issue 1, this is not a bug; it is the reason for the referencing system. If a piece of metadata is not expected to be managed as an individual thing (for example, updating a publisher's info), then it should not be chosen as a referenceable object, and it should never show up as its own entity. So effectively, the use case of having referenced entities that need to be hidden should not exist.

fmizzell commented 4 years ago

@dafeder How does the new approach of using a custom entity solve issue 2?

kimwdavidson commented 4 years ago

This proposal has the thumbs up to move forward, but it needs to be ticketed.

dafeder commented 3 years ago

Work being organized in https://github.com/GetDKAN/dkan/projects/4