AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

DwCACreator CSV files do not match meta.xml fields #204

Closed: ansell closed this issue 7 years ago

ansell commented 7 years ago

While reviewing the GBIF archive exports, I discovered a number of cases where DwCACreator generated meta.xml files whose core fields do not match the columns in the occurrence.csv files. Because occurrence.csv has no header line, these DwCA files cannot be interpreted reliably.
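With no header row, a mismatch like this can only be caught by comparing the two files directly. Below is a minimal sketch of such a check, in Scala to match biocache-store. It assumes the standard DwC-A meta.xml layout, where a `<core>` element holds `<id>` and `<field>` children carrying an `index` attribute. `MetaXmlCheck` and its methods are hypothetical and not part of biocache-store. It only detects a column-count mismatch: fields that are reordered or swapped while keeping the same count would pass unnoticed.

```scala
import java.io.StringReader
import java.util.regex.Pattern
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.xml.sax.InputSource

// Hypothetical consistency check; not part of biocache-store.
object MetaXmlCheck {

  // Columns expected by the <core> section: one more than the highest `index`
  // on its <id> and <field> children. A <field> with only a `default`
  // attribute is a constant and does not occupy a CSV column.
  def expectedColumns(metaXml: String): Int = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    val doc = factory.newDocumentBuilder()
      .parse(new InputSource(new StringReader(metaXml)))
    val core = doc.getElementsByTagNameNS("*", "core").item(0).asInstanceOf[Element]
    val indices = for {
      tag <- Seq("id", "field")
      nodes = core.getElementsByTagNameNS("*", tag)
      i <- 0 until nodes.getLength
      el = nodes.item(i).asInstanceOf[Element]
      if el.hasAttribute("index")
    } yield el.getAttribute("index").toInt
    if (indices.isEmpty) 0 else indices.max + 1
  }

  // 1-based numbers of lines whose column count differs from `expected`.
  // Naive split: assumes no value contains the delimiter or a line break,
  // which a real quoted CSV may violate.
  def mismatchedLines(lines: Seq[String], expected: Int, delimiter: Char = ','): Seq[Int] = {
    val splitter = Pattern.quote(delimiter.toString)
    lines.zipWithIndex.collect {
      case (line, n) if line.split(splitter, -1).length != expected => n + 1
    }
  }
}
```

Running `mismatchedLines` over the lines of occurrence.csv with the count from `expectedColumns` lists every row that disagrees with meta.xml.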

ansell commented 7 years ago

meta.xml is defined in a completely different place from where the CSV file is created, so the two can potentially become inconsistent, especially since ExportUtil contains its own logic for deciding which fields to include:

https://github.com/AtlasOfLivingAustralia/biocache-store/blob/master/src/main/scala/au/org/ala/biocache/export/DwCACreator.scala#L145

https://github.com/AtlasOfLivingAustralia/biocache-store/blob/master/src/main/scala/au/org/ala/biocache/export/ExportUtil.scala#L114
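One way to rule out this class of inconsistency is to drive both outputs from a single ordered column list, so that meta.xml and the CSV writer cannot disagree about which fields are present or in what order. The sketch below is hypothetical. `SingleSourceExport`, `Column`, and the record keys are illustrative names, not the actual DwCACreator or ExportUtil code. It also assumes every value is double-quoted with embedded quotes doubled.

```scala
// Hypothetical single-source export; not the actual DwCACreator/ExportUtil code.
object SingleSourceExport {

  // Hypothetical column definition: the DwC term URI written to meta.xml and
  // the key used to look the value up in an exported record.
  final case class Column(term: String, key: String)

  // meta.xml <core> section generated from the same column list as the CSV,
  // so field index i always describes CSV column i.
  def metaXml(columns: Seq[Column], location: String): String = {
    val fields = columns.zipWithIndex
      .map { case (c, i) => s"""    <field index="$i" term="${c.term}"/>""" }
      .mkString("\n")
    s"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
       |  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" fieldsTerminatedBy="," fieldsEnclosedBy="&quot;" ignoreHeaderLines="0">
       |    <files><location>$location</location></files>
       |$fields
       |  </core>
       |</archive>""".stripMargin
  }

  // One CSV line for a record, with columns in exactly the same order as metaXml.
  // Missing values become empty quoted strings.
  def csvRow(columns: Seq[Column], record: Map[String, String]): String =
    columns.map(c => quote(record.getOrElse(c.key, ""))).mkString(",")

  // Wrap in double quotes, doubling any embedded quote (RFC 4180 style).
  private def quote(v: String): String = "\"" + v.replace("\"", "\"\"") + "\""
}
```

Any filtering of fields, like the logic in ExportUtil, would then be applied to the column list once, before either file is written.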

ansell commented 7 years ago

The affected archives all seem to date from 2014, not from the recent updates. I didn't notice the file timestamps when I was previously looking through them, as they otherwise look identical in their content. Closing this as not currently a bug.