jeanetteclark opened 6 months ago
Breaking this down into some steps with implementation options:
Updating this issue: this task was handed off to the check code. metadig-engine will get the list of data PIDs and pass them to the check, which will use the Python hashstore library to access the data. The only change required in metadig-engine is a store configuration that I added to metadig.properties, which is also passed to the check code.
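Since the store configuration travels from metadig.properties to the check, the check has to pick those settings out. The mechanism for passing the configuration isn't described here, so this is only a minimal sketch that assumes the check receives the raw properties text and that the store settings share a "store." prefix. The prefix, the key names, and the helpers parse_properties and store_config are all hypothetical and are not taken from metadig-engine. The parser handles only simple key=value or key: value lines; line continuations and escape sequences are not supported.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple Java-style .properties text into a dict.

    Skips blank lines and '#'/'!' comment lines and splits each remaining
    line on the first '=' or ':'. Line continuations and escape sequences
    are NOT handled.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        # Use whichever separator ('=' or ':') appears first on the line.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            props[line] = ""  # key with no value
            continue
        sep = min(positions)
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props


def store_config(props: Dict[str, str], prefix: str = "store.") -> Dict[str, str]:
    """Return entries under a (hypothetical) prefix, with the prefix stripped."""
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}


# Usage with made-up keys (NOT the real metadig.properties contents).
example = """
# hypothetical store settings
store.store_path = /var/data/hashstore
store.store_depth = 3
other.setting = x
"""
cfg = store_config(parse_properties(example))
print(cfg)  # {'store_path': '/var/data/hashstore', 'store_depth': '3'}
```

Keeping the parsing separate from the prefix filter means the check could use the same helper for other settings if more of the file is ever passed along.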
See:
edu.ucsb.nceas.mdqengine.findDataPids (MDQEngine:231, 116)
helm/metadig-controller/config/metadig.properties
edu.ucsb.nceas.mdqengine.dispatch.Dispatcher (dispatch/Dispatcher:67)
It would be much more efficient to access the data and metadata directly from the file system where possible, especially for data quality checks.
The engine should use the hashstore library to access files directly and pass them to checks.
This change must remain compatible with the existing method of retrieving data and metadata, because the engine will not always run on the same machine where the data are stored (e.g., ESS-DIVE).
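The compatibility requirement amounts to "read locally when possible, otherwise fall back to the existing retrieval path." Below is a minimal Python sketch of that pattern. The reader and fetcher are hypothetical callables standing in for direct hashstore access and the current retrieval method. It assumes the local reader returns None when a PID is not in the local store; the real hashstore library may signal a missing object differently.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical signatures: the local reader returns bytes, or None when the
# PID is not in a local store; the remote fetcher stands in for the existing
# (non-local) retrieval method and always returns bytes.
LocalReader = Callable[[str], Optional[bytes]]
RemoteFetcher = Callable[[str], bytes]


def load_objects(
    pids: Iterable[str],
    read_local: Optional[LocalReader],
    fetch_remote: RemoteFetcher,
) -> Dict[str, bytes]:
    """Load each PID, preferring direct file-system access when available."""
    results: Dict[str, bytes] = {}
    for pid in pids:
        data = read_local(pid) if read_local is not None else None
        if data is None:
            # Not on this machine, or no local store configured:
            # fall back to the existing retrieval path.
            data = fetch_remote(pid)
        results[pid] = data
    return results


# Usage with in-memory fakes.
local = {"pid.1": b"local bytes"}
remote = {"pid.1": b"remote bytes", "pid.2": b"remote only"}
calls: List[str] = []


def fake_remote(pid: str) -> bytes:
    calls.append(pid)  # record which PIDs needed the fallback
    return remote[pid]


out = load_objects(["pid.1", "pid.2"], local.get, fake_remote)
print(out)    # {'pid.1': b'local bytes', 'pid.2': b'remote only'}
print(calls)  # ['pid.2']
```

Passing None for the local reader models a deployment like ESS-DIVE, where no local store is available and every PID goes through the existing method.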