AtlasOfLivingAustralia / data-management

Data management issue tracking

Consolidate NatureMapr data #791

Closed peggynewman closed 4 months ago

peggynewman commented 2 years ago

Consolidate all of the NatureMapr data resources. Create a recurring job that pulls a full refresh from the API into a Darwin Core archive.
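For illustration only, the packaging half of such a job could look roughly like the sketch below. It assumes the API records have already been fetched and flattened into dictionaries; the column list `FIELDS` and the function names are hypothetical, since the actual NatureMapr field mapping and the scheduling mechanism are not described in this thread.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical column choice: the real mapping from the API is not given here.
FIELDS = ["occurrenceID", "scientificName", "eventDate",
          "decimalLatitude", "decimalLongitude"]


def build_meta_xml(fields):
    """Return a minimal meta.xml describing a tab-separated occurrence core."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        f' ignoreHeaderLines="1" rowType="{DWC}Occurrence">',
        "    <files><location>occurrence.txt</location></files>",
        '    <id index="0"/>',
    ]
    # One <field> element per column, in the same order as the data file.
    for i, name in enumerate(fields):
        lines.append(f'    <field index="{i}" term="{DWC}{name}"/>')
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines) + "\n"


def write_archive(records, path):
    """Write records (a list of dicts keyed by FIELDS) to a zip at path."""
    buf = io.StringIO()
    # No quoting; embedded tabs or newlines are backslash-escaped instead.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n",
                        quoting=csv.QUOTE_NONE, escapechar="\\")
    writer.writerow(FIELDS)  # header row, skipped via ignoreHeaderLines="1"
    for rec in records:
        writer.writerow([rec.get(f, "") for f in FIELDS])
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", buf.getvalue())
        zf.writestr("meta.xml", build_meta_xml(FIELDS))
```

Because each run rewrites the whole archive, a full refresh simply replaces the previous file rather than merging into it.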

peggynewman commented 2 years ago

Progress update: the data is in production, but we're waiting

peggynewman commented 1 year ago

Still working on this - new data is in play but old DRs need to be removed.

peggynewman commented 1 year ago

To do for @cha801p

  1. The metadata has disappeared from the DR and the DR name is misspelled. The DR name has propagated to each occurrence record. Fix the metadata and wait for an index rebuild.

Metadata:

Name: NatureMapr

Short description: A citizen science platform to upload plant and animal sightings to contribute to real world outcomes across Australia.

Long description: NatureMapr seeks to ensure every important plant and animal is known to the people charged in positions of power that can directly influence its protection, management or eradication. Anybody can report a plant or animal sighting in under a minute anywhere across Australia and:

  - Promptly receive an expert identification of their record
  - Be assured that the information will be received by the government organisations and research institutions that need to know about it
  - Develop increased awareness and knowledge of important species through the sharing of knowledge within a thriving community

  2. These DRs need to be deleted:

  - dr14081 - 724 Records - Albury Wodonga Nature Map
  - dr702 - 31970 Records - Atlas of Life in the Coastal Wilderness
  - dr14021 - 9125 Records - Budawang Coast Nature Map
  - dr1947 - 90014 Records - Canberra Nature Map
  - dr736 - 4575 Records - Frogwatch ACT and Region
  - dr15273 - 1191 Records - Southern Highlands Nature Map

  3. Their corresponding datasets in GBIF need to be dealt with:

The DR (or "dataset") record can't be totally deleted in GBIF because it has a DOI associated with it. GBIF (helpdesk@gbif.org) says:

"We could link the old datasets to the new one before deleting the associated occurrences. The pages, DOI and citations will be preserved and they will link to the new dataset, see this example: https://www.gbif.org/dataset/84aa5ee4-f762-11e1-a439-00145eb45e9a

If so, you will need to publish the new dataset and send us its link. We will then make the changes necessary."

Find the corresponding datasets in GBIF (click on the DOI link on the DR collectory page) and compile an email to GBIF saying that the datasets are to be deleted and replaced by the main one: https://www.gbif.org/dataset/7ebef267-9d72-4c21-a276-cc84281a8590

cha801p commented 1 year ago

Datasets have been deleted from GBIF.

  1. https://www.gbif.org/dataset/9431f690-2478-4e27-ad13-843efb77ac79
  2. https://www.gbif.org/dataset/e3e48696-f083-4c3a-8c5b-efb1589c54db
  3. https://www.gbif.org/dataset/c19a8f6e-c368-494d-942e-10108d8867a7
  4. https://www.gbif.org/dataset/4b75c0ac-f7c4-4883-942d-d1dfe6939754
  5. https://www.gbif.org/dataset/a79a2ab2-5de8-4ac6-a0de-3e994cd8f7f5

Patricia is writing a DAG to delete a DR from prod, after which the additional DRs will be removed from prod: https://github.com/AtlasOfLivingAustralia/data-management/issues/890

djtfmartin commented 1 year ago

The NatureMapr website reports this at the bottom:

2,091,367 sightings of 18,686 species in 5,482 locations from 9,618 contributors

so it looks like the data feeds are only serving a fraction of the data. Is this right?

peggynewman commented 1 year ago

When we first brought data in, they sent 350K records through their API, which included a bunch of eBird and BioNet records, and likely records from other museums etc. They've since removed them from the API. I'm guessing that in their front end they've included a bunch of other datasets. I can't confirm this.

cha801p commented 1 year ago

Datasets have been deleted from biocache: dr702, dr14021, dr1947, dr736, dr15273, dr14081.
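As a quick sanity check, the deleted list can be reconciled against the DRs that were marked for deletion earlier in this thread. The sketch below uses the record counts as they were quoted in the to-do list (they may have changed by the time of deletion); the `reconcile` helper is only illustrative.

```python
# Record counts for the DRs listed for deletion earlier in this thread.
planned = {
    "dr14081": 724,    # Albury Wodonga Nature Map
    "dr702": 31970,    # Atlas of Life in the Coastal Wilderness
    "dr14021": 9125,   # Budawang Coast Nature Map
    "dr1947": 90014,   # Canberra Nature Map
    "dr736": 4575,     # Frogwatch ACT and Region
    "dr15273": 1191,   # Southern Highlands Nature Map
}
# DRs reported as deleted from biocache.
deleted = ["dr702", "dr14021", "dr1947", "dr736", "dr15273", "dr14081"]


def reconcile(planned, deleted):
    """Return (planned but not deleted, deleted but not planned, records removed)."""
    still_present = sorted(set(planned) - set(deleted))
    unexpected = sorted(set(deleted) - set(planned))
    records_removed = sum(planned[dr] for dr in deleted if dr in planned)
    return still_present, unexpected, records_removed


still_present, unexpected, records_removed = reconcile(planned, deleted)
print(still_present, unexpected, records_removed)  # [] [] 137599
```

All six planned DRs appear in the deleted list, accounting for 137,599 records at the counts quoted above.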