[ ] Check the unique identifier matches that in the new dataset
Data Prep
[ ] Map the dataset to Darwin Core terms (http://rs.tdwg.org/dwc/terms/) in a CSV if necessary, then package the CSV into a Darwin Core Archive (DwCA)
[ ] Ensure the dataset has a correctly specified unique identifier with no duplicates, and that it matches the unique identifier in the collectory connection parameters
[ ] Ensure images are specified in an images file in the DwCA with the identifier field populated
[ ] Check that species, locations, and times are adequately specific. Check back with the data provider if necessary
[ ] Download the existing DwCA from HDFS and merge the two DwCAs
Changed column name from catalogueNumber to catalogNumber
Fixed catalogNumber format
No duplicates found
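For reference, the column rename and duplicate check noted above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for this dataset: the helper names `rename_column` and `find_duplicates` and the two sample rows are hypothetical, and only the `catalogueNumber` to `catalogNumber` rename comes from the notes.

```python
import csv
import io
from collections import Counter


def rename_column(rows, old, new):
    """Return copies of the row dicts with the key `old` renamed to `new`."""
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


def find_duplicates(rows, id_field):
    """Return the sorted identifier values that occur more than once."""
    counts = Counter(row[id_field].strip() for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample standing in for the provider's CSV.
sample = (
    "catalogueNumber,scientificName\n"
    "A-001,Dasyurus viverrinus\n"
    "A-002,Dasyurus maculatus\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
rows = rename_column(rows, "catalogueNumber", "catalogNumber")

print(list(rows[0].keys()))                    # column names after the rename
print(find_duplicates(rows, "catalogNumber"))  # empty list means no duplicates
```

An empty list from `find_duplicates` corresponds to the "No duplicates found" result; any values it returns would need to be resolved before loading.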
Logs from loading the data:
2023-03-22 12:58:07,951 INFO [main] metrics.MetricsHandler (MetricsHandler.java:getCountersInfo(43)) - Added pipeline metadata - preservedUuidsAttempted: 6465, newUuidsAttempted: 179, orphanedUniqueKeysAttempted: 96,
2023-03-22 12:58:08,029 INFO [main] metrics.MetricsHandler (MetricsHandler.java:saveCountersToFile(62)) - Metadata was written to a file - hdfs://aws-spark-quoll-master.ala:9000/pipelines-data/dr8128/1/uuid-metrics.yml
2023-03-22 12:58:08,029 INFO [main] beam.ALAUUIDMintingPipeline (ALAUUIDMintingPipeline.java:run(309)) - Writing metrics written.
2023-03-22 12:58:08,032 INFO [pool-1-thread-1] util.ShutdownHookManager (Logging.scala:logInfo(54)) - Shutdown hook called
2023-03-22 12:58:08,032 INFO [pool-1-thread-1] util.ShutdownHookManager (Logging.scala:logInfo(54)) - Deleting directory /data/spark-tmp/spark-a88d5d51-bb26-4bc6-8ae8-a4a4a2089282
2023-03-22 12:58:08,035 INFO [pool-1-thread-1] util.ShutdownHookManager (Logging.scala:logInfo(54)) - Deleting directory /tmp/spark-339fd9bf-efeb-4693-8a08-ee383e6b3344
22-Mar 12:58:08 [LA-PIPELINES] [dr8128] [INFO] Wed Mar 22 12:58:08 AEDT 2023
22-Mar 12:58:08 [LA-PIPELINES] [dr8128] [INFO] END UUID of dr8128 in [spark-cluster], took 0 minutes and 49 seconds.
New run name is '#5955 - dr8128'
Finished: SUCCESS
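To compare UUID counts across runs, the counters in the `MetricsHandler` line can be pulled out of the log text. The sketch below assumes the counters always appear as `name: number` pairs with names ending in `Attempted`, as in the line above; `parse_uuid_counters` is a hypothetical helper, not part of the pipeline.

```python
import re

# Matches counters such as "preservedUuidsAttempted: 6465".
# Assumption: every counter name ends in "Attempted".
COUNTER_RE = re.compile(r"(\w+Attempted):\s*(\d+)")


def parse_uuid_counters(log_line):
    """Extract the UUID counters from one log line as a name -> int dict."""
    return {name: int(value) for name, value in COUNTER_RE.findall(log_line)}


line = (
    "2023-03-22 12:58:07,951 INFO [main] metrics.MetricsHandler "
    "(MetricsHandler.java:getCountersInfo(43)) - Added pipeline metadata - "
    "preservedUuidsAttempted: 6465, newUuidsAttempted: 179, "
    "orphanedUniqueKeysAttempted: 96,"
)
counters = parse_uuid_counters(line)
print(counters)
# {'preservedUuidsAttempted': 6465, 'newUuidsAttempted': 179, 'orphanedUniqueKeysAttempted': 96}
```

The same values are also written to the `uuid-metrics.yml` file named in the log, so reading that file is an alternative when HDFS access is available.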
Metadata
Data Prep
Data Load
QA