Open patkyn opened 4 years ago
Before load there were 42,635 records
aws-bstore-4b 2020-05-18 21:23:34,603 INFO : [DataLoader] - Unique terms: institutionCode,catalogNumber
aws-bstore-4b 2020-05-18 21:23:34,603 INFO : [DataLoader] - Column headers: basisOfRecord,dcterms:type,dcterms:rightsHolder,institutionCode,collectionID,occurrenceStatus,occurrenceID,catalogNumber,otherCatalogNumbers,verbatimModified,dcterms:modified,family,genus,specificEpithet,infraspecificName,scientificName,verbatimLocality,stateProvince,country,waterBody,minimumDepthInMeters,maximumDepthInMeters,verbatimEventDate,eventDate,decimalLatitude,decimalLongitude,coordinatePrecision,footprintWKT,verbatimLatitude,verbatimLongitude,typename,typeStatus,typeauthor,typeyear,identifiedBy,samplingProtocol,identificationVerificationStatus,dateIdentified,verbatimDateIdentified,identificationRemarks
aws-bstore-4b 2020-05-18 21:23:38,320 INFO : [DataLoader] - 1000, >> last key : dr349|CSIRO|H8313-42, records per sec: 269.10657
aws-bstore-4b 2020-05-18 21:23:41,168 INFO : [DataLoader] - 2000, >> last key : dr349|CSIRO|H8215-16, records per sec: 351.1236
aws-bstore-4b 2020-05-18 21:23:43,970 INFO : [DataLoader] - 3000, >> last key : dr349|CSIRO|H8312-09, records per sec: 356.88794
aws-bstore-4b 2020-05-18 21:23:46,657 INFO : [DataLoader] - 4000, >> last key : dr349|CSIRO|B3384, records per sec: 372.16226
aws-bstore-4b 2020-05-18 21:23:49,216 INFO : [DataLoader] - 5000, >> last key : dr349|CSIRO|H7457-03, records per sec: 390.77765
aws-bstore-4b 2020-05-18 21:23:51,507 INFO : [DataLoader] - There are 5858 records in the file. The number of NEW records: 2041
aws-bstore-4b 2020-05-18 21:23:51,507 INFO : [DataLoader] - Load finished for anfc20200511.csv
aws-bstore-4b 2020-05-18 21:23:51,646 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2020-05-18 21:23:51,646 INFO : [Loader] - Completed loading resource: dr349. Completed in 21.516seconds (0.3586 minutes)
After indexing, there are now 44,680 records
Emailed data provider.
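The counts reported above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the loader: the regular expression and the helper names `parse_summary` and `reconcile` are assumptions written only against the single summary line quoted in this issue.

```python
import re

# Summary line copied verbatim from the load log in this issue.
LOG_LINE = (
    "aws-bstore-4b 2020-05-18 21:23:51,507 INFO : [DataLoader] - "
    "There are 5858 records in the file. The number of NEW records: 2041"
)

# Hypothetical pattern: matches only the summary wording seen above.
SUMMARY_RE = re.compile(
    r"There are (?P<total>\d+) records in the file\. "
    r"The number of NEW records: (?P<new>\d+)"
)


def parse_summary(line):
    """Return (records_in_file, new_records) from a DataLoader summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no load summary found in line")
    return int(match.group("total")), int(match.group("new"))


def reconcile(before, new, after):
    """Return (expected_total, difference) where difference = after - expected."""
    expected = before + new
    return expected, after - expected


total_in_file, new_records = parse_summary(LOG_LINE)
# 42,635 records before the load and 44,680 after indexing, as noted in this issue.
expected, difference = reconcile(42_635, new_records, 44_680)
print(f"file={total_in_file} new={new_records} expected={expected} difference={difference}")
```

With the figures in this issue, the expected total is 44,676 against 44,680 observed after indexing, a difference of 4 that the issue does not explain.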
[x] After creation, add this issue to the Data Management project board https://github.com/orgs/AtlasOfLivingAustralia/projects/9
[x] Map dataset to Darwin Core Terms if necessary http://rs.tdwg.org/dwc/terms/
[x] Upload the dataset to collections.ala.org.au
[x] Load/sample/process dataset http://aws-scjenkins.ala:9193/job/Parameterised%20Load%20Sample%20Process/
[x] Note the statistics for total and new records from the end of the log file for the load by pasting the relevant lines here:
[x] Wait for next complete reindex
[x] Check that the number of accessible records matches the loaded numbers using https://biocache.ala.org.au/occurrence/search?q=data_resource_uid:drNNN
[x] Ask the data provider to review the new records using the same URL
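The `drNNN` placeholder in the checklist URL stands for the data resource UID (`dr349` for this load). As a small sketch, the review link can be built as follows; the helper name `search_url` and the "dr followed by digits" validation are assumptions for illustration, not part of any ALA tooling.

```python
import re
from urllib.parse import urlencode

# Base URL taken from the checklist above.
BASE_URL = "https://biocache.ala.org.au/occurrence/search"


def search_url(data_resource_uid):
    """Build the occurrence search URL for one data resource.

    Assumes UIDs look like 'dr' followed by digits, as in 'dr349'.
    """
    if not re.fullmatch(r"dr\d+", data_resource_uid):
        raise ValueError(f"unexpected data resource UID: {data_resource_uid!r}")
    # Keep ':' unescaped so the link matches the form used in the checklist.
    query = urlencode({"q": f"data_resource_uid:{data_resource_uid}"}, safe=":")
    return f"{BASE_URL}?{query}"


print(search_url("dr349"))
```

The same link serves both the record-count check and the data provider's review.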