AtlasOfLivingAustralia / data-management

Data management issue tracking

Load data : Victorian Biodiversity Atlas #619

Open peggynewman opened 4 years ago

peggynewman commented 4 years ago

https://collections.ala.org.au/public/showDataResource/dr1097 Currently ~7m records

Data is sourced from the Vic Gov Spatial Data Mart. You have to log in and order the files to be generated for download: https://discover.data.vic.gov.au/dataset?sort=score+desc%2C+metadata_modified+desc&q=victorian+biodiversity+atlas&organization=&groups=&res_format=

4 files:

Recommend:

Other checks:

charvolant commented 4 years ago

Changes between data:

Record mapping

Available but not used

peggynewman commented 4 years ago

The record count disparity is likely because you've only loaded 2 of the 4 files. I only put 2 up onto the server.

peggynewman commented 4 years ago

Worthy of discussion with VBA are the dataGeneralisations. The loads are divvied up according to accuracy, which suggests that some generalisation has been applied. I also think the file name should go into datasetName.
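As a rough illustration of the datasetName suggestion, here is a minimal sketch that stamps each row with the name of the file it came from. The file name `vba_fauna_25.csv` and the helper names are hypothetical, and using the file stem (name without extension) is an assumption rather than an agreed mapping.

```python
from pathlib import PurePosixPath


def dataset_name_from_file(path):
    """Return the file name without its extension (assumed datasetName value)."""
    return PurePosixPath(path).stem


def tag_rows(rows, path):
    """Yield a copy of each row dict with datasetName set from the source file."""
    name = dataset_name_from_file(path)
    for row in rows:
        tagged = dict(row)  # copy so the caller's row is not mutated
        tagged["datasetName"] = name
        yield tagged


# Hypothetical file name and rows, for illustration only.
example_rows = [{"scientificName": "Vombatus ursinus"}]
tagged_rows = list(tag_rows(example_rows, "/data/dr1097/vba_fauna_25.csv"))
```

Each row then carries its originating file in datasetName, so records from the different accuracy-based files stay distinguishable after loading.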

charvolant commented 4 years ago

With all 4 files, approximately 825337 records are present in the ALA but not in the current dump, and 1673260 records are new. There are 7753164 total records in the dump and 6905243 in the ALA. 6905243 + 1673260 - 825337 = 7753166, which is 2 more than the dump total (probably correct when accounting for accidental header inclusion). Still, 825337 is a pretty large number, so check VBA to confirm that they're genuine deletes.
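The reconciliation can be sketched as below. The counts are the ones quoted in this thread. The `reconcile` helper and the idea of comparing sets of record identifiers are assumptions for illustration, not the actual load tooling.

```python
def reconcile(ala_ids, dump_ids):
    """Compare record identifiers held in the ALA with those in a new dump."""
    ala_ids = set(ala_ids)
    dump_ids = set(dump_ids)
    missing = ala_ids - dump_ids  # in the ALA, absent from the dump (candidate deletes)
    new = dump_ids - ala_ids      # in the dump, not yet in the ALA
    return {
        "in_ala": len(ala_ids),
        "in_dump": len(dump_ids),
        "missing_from_dump": len(missing),
        "new_in_dump": len(new),
    }


# Counts reported in the thread.
in_ala = 6905243
new_in_dump = 1673260
missing_from_dump = 825337
reported_dump_total = 7753164

# For unique identifiers: dump total = ALA total + new - missing.
expected_dump_total = in_ala + new_in_dump - missing_from_dump
discrepancy = expected_dump_total - reported_dump_total

print(expected_dump_total, discrepancy)

# Small made-up example of the set comparison itself.
summary = reconcile(["a", "b", "c"], ["b", "c", "d", "e"])
```

With unique identifiers the identity holds exactly, so a small non-zero discrepancy points at how the counts were gathered (for example counting lines rather than records) rather than at the data.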