Open timhicks-ala opened 3 years ago
A Jenkins load job has been set up that calls the existing Talend ETL to process the Flora and Fauna files and convert them to Darwin Core CSV.
dr365
Before load: 1,609,169 records
During load:
aws-bstore-4b 2021-04-12 17:47:30,150 INFO : [DataLoader] - There are 1623826 records in the file. The number of NEW records: 15628
aws-bstore-4b 2021-04-12 17:47:30,150 INFO : [DataLoader] - Load finished for dr365.csv
aws-bstore-4b 2021-04-12 17:47:30,369 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2021-04-12 17:47:30,369 INFO : [Loader] - Completed loading resource: dr365. Completed in 2351.847seconds (39.19745 minutes)
dr366
Before load: 1,891,387 records
During load:
aws-bstore-4b 2021-04-12 19:36:45,795 INFO : [DataLoader] - There are 1905982 records in the file. The number of NEW records: 14006
aws-bstore-4b 2021-04-12 19:36:45,795 INFO : [DataLoader] - Load finished for dr366.csv
aws-bstore-4b 2021-04-12 19:36:45,990 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2021-04-12 19:36:45,990 INFO : [Loader] - Completed loading resource: dr366. Completed in 2878.74seconds (47.979 minutes)
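The totals and new-record counts in the log output above could be pulled out automatically rather than copied by hand. The following is a minimal sketch, assuming the DataLoader and Loader messages keep exactly the wording shown in these logs; the function name `load_stats` and the returned dictionary keys are illustrative, not part of any ALA tool.

```python
import re

# Patterns assume the message wording seen in the dr365/dr366 logs above.
RECORDS_RE = re.compile(
    r"There are (\d+) records in the file\. The number of NEW records: (\d+)"
)
COMPLETED_RE = re.compile(
    r"Completed loading resource: (\w+)\. Completed in ([\d.]+)\s*seconds"
)


def load_stats(log_text):
    """Return the record counts and duration found in a load log (hypothetical helper)."""
    stats = {}
    records = RECORDS_RE.search(log_text)
    if records:
        stats["total_records"] = int(records.group(1))
        stats["new_records"] = int(records.group(2))
    completed = COMPLETED_RE.search(log_text)
    if completed:
        stats["resource"] = completed.group(1)
        stats["seconds"] = float(completed.group(2))
    return stats


# Sample lines copied from the dr365 load above.
SAMPLE_LOG = (
    "aws-bstore-4b 2021-04-12 17:47:30,150 INFO : [DataLoader] - There are 1623826 "
    "records in the file. The number of NEW records: 15628\n"
    "aws-bstore-4b 2021-04-12 17:47:30,369 INFO : [Loader] - Completed loading "
    "resource: dr365. Completed in 2351.847seconds (39.19745 minutes)\n"
)

if __name__ == "__main__":
    print(load_stats(SAMPLE_LOG))
```

Because the patterns use `search`, the sketch works whether the log lines arrive one per line or run together in a single string.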
https://support.ehelp.edu.au/a/tickets/101120
Mon Mar 1 2021 04:02:57 AM
The SA Department for Environment and Water has created an updated version of the SA Flora (BDBSA) and SA Fauna (BDBSA) records on the ALA FTP site. Please update the datasets' metadata with the month and year of this extraction - 03/2021
[ ] After creation, add this issue to the Data Management project board https://github.com/orgs/AtlasOfLivingAustralia/projects/9
[ ] Ask data provider for metadata, including:
    Data resource name
    Description
    License: prefer Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/
    Citation: Statement to be used by those downloading and using these records
    Contact: Email and/or phone number
    Website: (Optional)
[ ] Map dataset to Darwin Core Terms if necessary http://rs.tdwg.org/dwc/terms/
[ ] Create new data resource using https://collections.ala.org.au/admin
[ ] Upload the dataset to collections.ala.org.au
[ ] [Optional] If the dataset is destined to be included in a data hub, identify the data hub and add the new data resource id to its list https://collections.ala.org.au/dataHub/list
[ ] Load/sample/process dataset http://aws-scjenkins.ala:9193/job/Parameterised%20Load%20Sample%20Process/
[ ] Note the statistics for total and new records from the end of the log file for the load by pasting the relevant lines here:
[ ] Wait for next complete reindex
[ ] Check that the number of accessible records matches the loaded numbers using https://biocache.ala.org.au/occurrence/search?q=data_resource_uid:drNNN
[ ] Ask the data provider to review the new records using the same URL
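
The record-count check in the steps above can be sketched as a small helper that builds the search URL for a given data resource and compares the loaded count with the count read off the search page. This is only an illustration: `search_url` and `count_difference` are hypothetical names, the accessible count is assumed to be entered by hand after viewing the page, and no request is made to the site.

```python
from urllib.parse import urlencode

# Base URL taken from the checklist above.
BIOCACHE_SEARCH = "https://biocache.ala.org.au/occurrence/search"


def search_url(data_resource_uid):
    """Build the occurrence search URL for one data resource, e.g. 'dr365'."""
    query = urlencode({"q": f"data_resource_uid:{data_resource_uid}"}, safe=":")
    return f"{BIOCACHE_SEARCH}?{query}"


def count_difference(loaded, accessible):
    """Return accessible minus loaded; 0 means the counts agree."""
    return accessible - loaded


if __name__ == "__main__":
    print(search_url("dr365"))
    # The accessible count below is a placeholder typed in by the operator,
    # not a value taken from the issue.
    print(count_difference(loaded=1623826, accessible=1623826))
```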