NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine
Allow metadig-engine to access data and metadata directly from the file system #432

Open jeanetteclark opened 1 month ago

jeanetteclark commented 1 month ago

It would be much more efficient to access the data and metadata directly from the file system where possible, especially for data quality checks.

The engine should use the hashstore library to access files directly and pass them to checks.

This change should stay compatible with the existing method of getting data/metadata, since the engine will not always run on the same machine where the data are stored (e.g., ESS-DIVE).
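
One way to picture the compatibility requirement is an ordered list of sources, with direct file-system access tried first and the existing retrieval method kept as the fallback. This is a minimal sketch only: `ObjectSource` and `openFirstAvailable` are hypothetical names, not part of metadig-engine or hashstore, and the actual hashstore calls are deliberately left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Optional;

public class ObjectAccess {

    /** Hypothetical abstraction: one way of fetching an object by pid. */
    interface ObjectSource {
        /** Returns an empty Optional when this source cannot serve the pid. */
        Optional<InputStream> open(String pid) throws IOException;
    }

    /**
     * Tries each source in order and returns the first stream found.
     * A local (file system) source would come first in the list, and the
     * existing remote retrieval would come last as the fallback.
     */
    static InputStream openFirstAvailable(String pid, List<ObjectSource> sources)
            throws IOException {
        for (ObjectSource source : sources) {
            Optional<InputStream> stream = source.open(pid);
            if (stream.isPresent()) {
                return stream.get();
            }
        }
        throw new IOException("No source could provide object: " + pid);
    }
}
```

With this shape, a deployment that has no local store (eg: ESS-DIVE) would simply configure the list without the file-system source.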

jeanetteclark commented 2 weeks ago

Breaking this down into steps, with implementation options:

  1. get a list of data identifiers for an incoming metadata pid
    • [ ] solr query (to be implemented now)
    • [ ] parsing annotations in hashstore (to be added as an alternate implementation later, when this feature exists in hashstore)
  2. pass those pids to the dispatcher where they will be handled
    • the best place to do this is probably runSuite where we can detect if it's a data suite and only make the call to get the pids if needed
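
The two steps above could be sketched roughly as follows. This is an illustration under assumptions, not the engine's implementation: `DataPidLookup`, `buildSolrQueryUrl`, and `dataPidsForSuite` are hypothetical names, the Solr endpoint is a placeholder, the `rows=1000` limit is arbitrary, and the query assumes the index exposes an `isDocumentedBy` field (as the DataONE Solr index does). How a suite is recognized as a data suite is left to the caller.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class DataPidLookup {

    /**
     * Step 1: build a Solr query URL asking for the ids of all objects
     * documented by the given metadata pid.
     */
    static String buildSolrQueryUrl(String solrQueryEndpoint, String metadataPid) {
        // Escape backslashes and quotes so the pid is safe inside a quoted Solr phrase.
        String escapedPid = metadataPid.replace("\\", "\\\\").replace("\"", "\\\"");
        String q = "isDocumentedBy:\"" + escapedPid + "\"";
        return solrQueryEndpoint
                + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fl=id&wt=json&rows=1000";
    }

    /**
     * Step 2: only perform the lookup when the suite needs data pids.
     * The lookup function stands in for whichever implementation is used
     * (Solr now, hashstore annotations later).
     */
    static List<String> dataPidsForSuite(boolean isDataSuite, String metadataPid,
                                         Function<String, List<String>> lookup) {
        if (!isDataSuite) {
            return Collections.emptyList();
        }
        return lookup.apply(metadataPid);
    }
}
```

Passing the lookup in as a function keeps the call site in runSuite unchanged if the Solr query is later swapped for the hashstore-based implementation.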