adiwg / mdTranslator

Metadata translation tool built using Ruby
https://www.adiwg.org/mdTranslator/
The Unlicense

Dcat us v1/cleanup #326

Closed jbrown-xentity closed 6 months ago

jbrown-xentity commented 6 months ago

This PR does a few things:

  1. Adds some documentation on how to install and run the tests for development, and makes a tweak to the Gemfile to get the install working on my machine. Also, kudos on all the tests: over 1700 tests and 14K assertions!
  2. Adds a GitHub Action that runs the tests automatically on every push. See our fork for how that will work. It will be useful for future collaboration and PRs, making sure that changes aren't causing things to break.
  3. Updates DCAT-US. The current implementation wouldn't be harvestable by data.gov, because data.gov doesn't expect the namespaces to be used. See examples that we test against here, and the DCAT-US documentation here.
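
To make the namespace point in item 3 concrete, here is a minimal Ruby sketch of the difference between namespaced and plain keys. It is not the code in this PR: `strip_namespaces` is a hypothetical helper, and the sample keys and values are invented for illustration rather than taken from the writer's real output.

```ruby
require 'json'

# Hypothetical helper (not part of mdTranslator): recursively drop a
# leading "prefix:" namespace from every hash key. Values are left
# untouched, so something like "@type" => "dcat:Dataset" is preserved.
def strip_namespaces(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      out[key.to_s.sub(/\A[A-Za-z][\w-]*:/, '')] = strip_namespaces(inner)
    end
  when Array
    value.map { |item| strip_namespaces(item) }
  else
    value
  end
end

# Invented sample record using namespaced keys.
namespaced = {
  '@type' => 'dcat:Dataset',
  'dcat:title' => 'Example dataset',
  'dcat:keyword' => ['water', 'alaska'],
  'dcat:publisher' => { 'org:name' => 'Example Agency' }
}

plain = strip_namespaces(namespaced)
puts JSON.pretty_generate(plain)
```

In the sketch, only the keys change: the plain form uses `title`, `keyword`, and `publisher`, which is the shape this PR describes data.gov as expecting.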

Please note that in the git history I added the documentation and the GitHub Action before making the DCAT-US changes, so you can confirm that the existing tests have 3 failures and 1 error both before and after those changes.

Tagging @jwaspin for review.

hmaier-fws commented 6 months ago

@jwaspin not sure if it's worth saving a copy of the current translations that use the "dcat" namespace. While I agree that "DCAT-US" doesn't use the namespaces, there has been discussion of supporting a generic JSON-LD and perhaps a generic DCAT (not DCAT-US) output. See #283

jwaspin commented 6 months ago

@hmaier-fws I was wondering about that. Would it be worth having separate writers for DCAT-US, DCAT, and JSON-LD? I could also look into having them listed as separate writers in the dropdown but reuse the code with some conditionals on which namespace to add if any.
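
As a rough illustration of the "reuse the code with some conditionals" idea, the Ruby sketch below shares one writer function across flavors and only varies the key prefix. Everything here is an assumption made for the example: `write_dataset`, `PREFIX_BY_FLAVOR`, the flavor names, and the choice of `dcat` as the prefix are hypothetical and do not reflect mdTranslator's actual writer API.

```ruby
# Hypothetical table: which key prefix (if any) each writer flavor applies.
PREFIX_BY_FLAVOR = {
  dcat_us: nil,   # DCAT-US: plain keys, no namespace
  dcat: 'dcat'    # generic DCAT: namespaced keys (prefix is an assumption)
}.freeze

# Hypothetical shared writer: build the output hash once and decide
# per flavor whether each key gets a namespace prefix.
def write_dataset(fields, flavor)
  prefix = PREFIX_BY_FLAVOR.fetch(flavor) do
    raise ArgumentError, "unknown flavor: #{flavor}"
  end

  fields.each_with_object({}) do |(name, value), out|
    key = prefix ? "#{prefix}:#{name}" : name.to_s
    out[key] = value
  end
end

# Invented sample fields.
fields = { title: 'Example dataset', keyword: ['water'] }

dcat_us_output = write_dataset(fields, :dcat_us)
dcat_output    = write_dataset(fields, :dcat)
```

With this shape, each entry in the dropdown would map to a flavor symbol, and adding another writer would mean adding a table entry rather than a new code path.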

hmaier-fws commented 6 months ago

@jwaspin Yes. They would be distinct writers. DCAT-US will have the most restrictive set of attributes that could be supported. A generic DCAT and JSON-LD translation should support most (all?) of what is supported by DCAT-US, but allow for export of additional mdJson attributes.

I'm not sure that we need to actually implement this in the translator right now. I was just thinking about not losing what we already have. But perhaps it's not that big of a problem to start with the DCAT-US writer if we decide to go down that road at some point in the future (since most of the DCAT-US element names align with the DCAT elements).

jbrown-xentity commented 6 months ago

While it is easier to rip things out, my 2 cents:

Supporting a generic JSON-LD sounds like a good goal, and I think data.gov is interested in the future in using formats that make more sense than the current custom one (also, note that there is an effort to update the DCAT-US to a new version, see here). However, if we want this to be ingestible by data.gov today, we'll need these changes.

jwaspin commented 6 months ago

@jbrown-xentity & @hmaier-fws I created two branches based on the current state of this one: one for DCAT and another for JSON-LD. I will go ahead and merge these changes into the DCAT-US-v1 branch, and we can keep moving along without having to worry about reverting for the other writers.