AtlasOfLivingAustralia / data-management

Data management issue tracking

Improvements to ingest dataset build job #801

Open patkyn opened 2 years ago

patkyn commented 2 years ago

1) Check for lockdir. Questacon had sent an extra column containing spaces in the sightings. This resulted in an extra space in the column in meta.xml in the DwCA, causing the ingestion to fail. The lockdir that was created during ingestion was not cleaned up. The next time the automated data resource load job ran for dr1902, it still reported success even though the latest DwCA file was not ingested because of the lockdir. A simple check is needed to test whether a lockdir exists, e.g. /dwca-export/dr1902/dr1902.zip.lockdir. If this directory exists, the build should throw an error, and the lockdir then needs to be removed manually.
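The check described above could look roughly like the following. This is only a sketch, not the code used in the actual job: the function names `check_lockdir` and `main` and the error class are hypothetical, and it assumes the lockdir is always the archive path with `.lockdir` appended, as in the dr1902 example.

```python
import os
import sys


class LockdirPresentError(RuntimeError):
    """Raised when a leftover lockdir would block ingestion of an archive."""


def check_lockdir(dwca_path):
    """Raise if '<dwca_path>.lockdir' exists (assumed naming convention)."""
    lockdir = dwca_path + ".lockdir"
    if os.path.isdir(lockdir):
        raise LockdirPresentError(
            f"lockdir exists: {lockdir}; remove it manually and re-run the job"
        )


def main(argv):
    """Check every archive path in argv; return a process exit code."""
    for dwca_path in argv:
        try:
            check_lockdir(dwca_path)
        except LockdirPresentError as err:
            print(f"ERROR: {err}", file=sys.stderr)
            return 1  # non-zero exit code so the build step fails
    return 0
```

Run as the first build step, for example with `sys.exit(main(sys.argv[1:]))` and the archive path as the argument, a non-zero exit code would make the build fail before ingestion starts instead of silently skipping the new archive.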

2) Mark ingest as failed when uuid validation fails. Currently, the ingestion still results in success even if the validation fails. This causes the full indexing to fail the following day. The ingest dataset job needs to be marked as failed when validation fails in the uuid step, so that the problem can be fixed immediately.
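One way to get this behaviour is to stop at the first failing step and return a non-zero exit code, which Jenkins treats as a failed build step. The sketch below is illustrative only: `run_ingest` and the step names are hypothetical, and each step is assumed to be a callable that returns True on success.

```python
import sys


def run_ingest(steps):
    """Run (name, callable) steps in order; stop at the first failure.

    Returns 0 if every step succeeds, otherwise 1.
    """
    for name, step in steps:
        if not step():
            print(f"ERROR: ingest step '{name}' failed; aborting", file=sys.stderr)
            return 1  # later steps are skipped and the job is marked failed
    return 0


# Hypothetical usage: the uuid validation result decides the job status.
exit_code = run_ingest([
    ("uuid-validation", lambda: False),  # stand-in for a failing validation
    ("later-step", lambda: True),        # never reached in this example
])
```

Passing `exit_code` to `sys.exit` at the end of the build step would surface the validation failure on the same day rather than in the next day's full indexing.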

patkyn commented 2 years ago

This is now fixed in Jenkins.