GetDKAN / dkan

DKAN Open Data Portal
https://dkan.readthedocs.io/en/latest/index.html
GNU General Public License v2.0
366 stars 170 forks source link

Remove RootedJsonData and revert to simpler validation #3822

Open dafeder opened 2 years ago

dafeder commented 2 years ago

We thought that having a dedicated constantly-self-validating data type for moving JSON metadata around in DKAN's PHP code would be valuable, but this doesn't seem to have been the case. It has added to the complexity of the application and still requires a lot of converting back and forth to other data types. It would be preferable to switch to moving either full MetastoreItem objects or simple stdClass objects around, and performing validations as needed. So:

  1. Change all RootedJsonData objects to stdClass objects in the code
  2. Perform JSON schema validations in all places in the code in which RootedJsonData objects were instantiated or modified directly, to ensure that we are not losing validation protection at any point in our workflows.
dafeder commented 2 years ago

There are only a few points in the workflow where the metadata really needs to be validated, while there are many points in the workflow where metadata is expected to be temporarily invalid (for example, before it's been dereferenced) and we have to do extra work to bypass RootedJsonData's strict validation. We have places in the codebase where we expect JSON strings, others where we expect decoded stdClass objects, and others where we expect RootedJsonData objects. I think we should just use stdClass everywhere and convert to or from string only at the beginning or end of a workflow as needed.

There might be a case to made for using JsonObject from Galbar's JsonPath library, because it allows you use nested properties when you're doing dynamic property names. For instance, if we wanted to have a dynamic property name for "firstName" and in our schema it lives in a ContactInfo object under the main schema, we could have something like $firstName_property = "$.ContactInfo.firstName" with JsonPath. JsonPath is what we're using internally for data in RootedJsonData, but we're not actually using that kind of nested dynamic property name anywhere in DKAN as far as I can tell. And the trade-off would be that we still need to do a lot of conversion to work with any Symfony APIs or other external libraries that are expecting stdObject arguments for their methods.

dafeder commented 2 months ago

Also noting here - we are considering a use-case where we would allow metadata to exist in an invalid state if it is not published, so that data publishers have the option to work on their metadata and save drafts even if, for instance, they have not added all required fields yet. If we do persue this it would make the rigid enforcement we get from RootedJsonData even more problematic.