AtlasOfLivingAustralia / data-management

Data management issue tracking

eventType support in events is inconsistent #1058

Open cha801p opened 1 month ago

cha801p commented 1 month ago

We’ve identified several issues with the handling of the eventType on events.test:

eventType has been added to the DwC standard (https://dwc.tdwg.org/list/#dwc_eventType), so the pipelines code needs to be updated to support it.

cha801p commented 4 weeks ago

DwC Terms were updated in the preingestion and this was tested using dr22687. Please refer to the ticket https://github.com/AtlasOfLivingAustralia/data-management/issues/1054 for more details.

adam-collins commented 3 weeks ago

The DAG elastic_dataset_indexing runs elastic-cleanup.sh. This should resolve the 2nd issue (duplicates).

adam-collins commented 3 weeks ago

In the most recent DwCA events archive export, event, verbatim_event and verbatim_occurrence have the http://rs.tdwg.org/dwc/terms/eventType field. However, event.txt has no value in that field. See dr22687, which was fully processed, exported and elastic indexed with the more recent pipelines version in test.
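A rough way to confirm the "field present but empty" observation is to count the non-empty values in the eventType column of the exported file. This is a minimal sketch only: it assumes the export is tab-delimited with one header line, and `count_non_empty` and the column index are hypothetical (the real index should be read from the archive's meta.xml).

```python
import csv
import io


def count_non_empty(text, column_index, header_lines=1):
    """Return (filled, total) for one column of a tab-delimited table.

    Hypothetical helper: assumes tab-delimited data with no quoting and
    `header_lines` header rows, which may not match every export.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    filled = 0
    total = 0
    for line_no, row in enumerate(reader):
        if line_no < header_lines or not row:
            continue  # skip header rows and blank lines
        total += 1
        # A short row or a whitespace-only cell counts as "no value".
        if column_index < len(row) and row[column_index].strip():
            filled += 1
    return filled, total


# Made-up sample data, not taken from dr22687.
sample = "eventID\teventType\nE1\t\nE2\tSurvey\nE3\t\n"
print(count_non_empty(sample, column_index=1))
```

A result of `(0, N)` for event.txt would match the behaviour described above: the column exists but no row carries a value.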

adam-collins commented 3 weeks ago

Is there a data resource where the ingestion of the eventType fails?

adam-collins commented 3 weeks ago

Is it intentional that only event.txt (and the verbatim files) have eventType? Should occurrence.txt also have an eventType field? See the exported meta.xml for dr22687.
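To see which files in an export declare eventType, the meta.xml can be inspected directly. The sketch below assumes the standard Darwin Core text namespace (`http://rs.tdwg.org/dwc/text/`) and `core`/`extension` elements with a `files/location` child; `files_declaring_term` is a hypothetical helper and the sample XML is invented, not the actual meta.xml for dr22687.

```python
import xml.etree.ElementTree as ET

DWCA_NS = "{http://rs.tdwg.org/dwc/text/}"
EVENT_TYPE = "http://rs.tdwg.org/dwc/terms/eventType"


def files_declaring_term(meta_xml, term=EVENT_TYPE):
    """Map each data file in meta.xml to the column index of `term`.

    Files that do not declare the term map to None.
    """
    root = ET.fromstring(meta_xml)
    result = {}
    for table in root:
        if table.tag not in (DWCA_NS + "core", DWCA_NS + "extension"):
            continue
        location = table.find(f"{DWCA_NS}files/{DWCA_NS}location")
        if location is None or not location.text:
            continue
        index = None
        for field in table.findall(DWCA_NS + "field"):
            if field.get("term") == term and field.get("index") is not None:
                index = int(field.get("index"))
        result[location.text.strip()] = index
    return result


# Invented example with the same shape as a DwCA meta.xml.
sample_meta = """
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files><location>event.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/eventID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventType"/>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
  </extension>
</archive>
"""
print(files_declaring_term(sample_meta))
```

Any file that maps to `None` does not list the term in meta.xml, which is the situation described for occurrence.txt.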

adam-collins commented 3 weeks ago

There is a problem running the DAG's sh script that deletes the index before updating. At first glance it appears to be an elasticsearch network.host permission issue.