NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0
15 stars 11 forks source link

Switchover to using registry-common library to communicate with OpenSearch Serverless Registry #895

Open jordanpadams opened 1 month ago

jordanpadams commented 1 month ago

💡 Description

Complete switchover for referential integrity checker

⚔️ Parent Epic / Related Tickets

No response

al-niessner commented 1 month ago

@jordanpadams @tloubrieu-jpl

Design choice - I am split or would not be asking - on config file for validate-refs. I get that the most convenient thing for the user is to let them use harvest config. The new config is more complex and controlled with a schema. Great. For validate-ref, this means it now needs to:

  1. continue sax parsing and hope harvest config does not change too much more
  2. depend on harvest as well as registry-common
  3. not use harvest config as auth input but do like all other registry-mgr tools
  4. move validate-refs out of validate and into registry-mgr since it has nothing to do with validate other than the name.
  5. move validate-refs to harvest since it has more in common with it than validate

I think 2, 4 and 5 are up for bat. 2 seems the most direct and with maven maybe we do not care one wit about all the packages that we reference. 4 offers a cleaner solution since validate-refs uses nothing in validate at all but is probably there because the same user that runs harvest and validate also wants to run validate-refs but cares nothing for tools in registry-mgr. If true about users that use harvest want to do validate-refs, then 5 seems like the best solution since they share the config and both having nothing in common with validate itself.

tloubrieu-jpl commented 1 month ago

Hi @al-niessner ,

I don't have a final answer on that, but to feed your internal discussion, I think we would like all the tools provided to the discipline node users to be available in a consistent way. What they need is:

I don't think the same users need validate, validate-refs and harvest/registry-mgr. For example SDS development guys use validate, but not harvest or registry-mgr or validate-refs.

I don't know though where that would make more sense to move validate-refs.

We also have this ticket on which we could piggy back your suggested changes: https://github.com/NASA-PDS/registry-loader/issues/11. If we move forward with that ticket, we would not have to choose in between harvest or registry-mgr to move validate-ref (options 4 or 5).

Can that decision wait for Jordan to be back, next week ? Thanks

al-niessner commented 1 month ago

@jordanpadams @tloubrieu-jpl

No rush on decision. I can easily punt for now.

I would suggest not implementing NASA-PDS/registry-loader#11. With Java anyway, smaller repos are better -- Java bloat is really bad. Maybe a registry-loader that depends on all the other repos for the user that then gives them a better/simpler install 1 to get 5, but combining the java code will get unwieldy fast.

tloubrieu-jpl commented 1 month ago

I am realizing: should we switch to using the search API directly, since Alex fixed the pagination performance issue.

tloubrieu-jpl commented 1 month ago

@jordanpadams @al-niessner ☝️

al-niessner commented 1 month ago

@alexdunnjpl @jordanpadams @tloubrieu-jpl

Short answer: update and march forward.

Longer answer:

We all blabber about the learning curve and it is more expensive to change later than now . However, we never include the forgot curve. The real effort is the area between these two curves. It can also be shown that the forgot curve always plummets faster than the learning curve rises. By the time you eventually upgrade you will have forgotten all of the workarounds, patches, cruft, and other bad code to use the older interface that now also need updating.

If the pagination stuff is broken in SDK, then opensearch will fix it because the rest of the world will apply pressure. If they do not fix it because we are using the wrong technology (rest of world thinks pagination works well enough as is), then, well, change technologies.

Having rewritten harvest registry-common, and registry-mgr to use SDK v2, it is wonderful compared to the API direct interface. v2 does mean a complete rethink and recode of registry-api.