AtlasOfLivingAustralia / data-management

Data management issue tracking
Reef Life Survey event core data load #882

Open peggynewman opened 1 year ago

peggynewman commented 1 year ago

Summary

Work with marine.csiro.ipt to create a regular harvest of Reef Life Survey data into the ALA (including the Events system). Doug's data transform for the events data is here: https://github.com/AtlasOfLivingAustralia/databox/tree/master/reef-life-survey. We also need to find and replace the old RLS data resource.

Email trail

From RLS: Datasets 5 and 6 below are not species records and are not needed here. You could use 1-4 as the basis for then filtering if you like, but it may be safest to start with 1 and 3 to represent the update of what you had, as we are only just starting with correcting a large backlog of errors, and 2 and 4 have more of these and more taxonomic uncertainties. It’s up to you though, and probably best if you do have a system for updating the records on a routine basis anyway – as our correction of errors does often include effectively removing some species observations and replacing them with a different (correct) one.

Either way, you’ll still need to filter on ‘program’ to get Reef Life Survey, as these new layers also have the ATRC and Parks Vic datasets, which we spoke about last time we talked (and which are not something that I can talk to re putting onto the ALA, GBIF and OBIS).
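Editor's note: the 'program' filter described above could be sketched roughly as follows. This is a minimal illustration, not the databox transform. The column name `program` comes from the email; the value `"RLS"`, the other column names and the sample rows are made up for the example and would need checking against the real AODN export.

```python
import csv
import io


def filter_by_program(rows, program="RLS", column="program"):
    """Keep only rows whose `column` equals `program` (trimmed, case-insensitive)."""
    wanted = program.strip().lower()
    return [r for r in rows if (r.get(column) or "").strip().lower() == wanted]


# Made-up sample rows; a real export will have different columns and values.
sample = io.StringIO(
    "survey_id,program,species_name\n"
    "1,RLS,Species A\n"
    "2,ATRC,Species B\n"
    "3,Parks Vic,Species C\n"
    "4,rls ,Species D\n"
)

rows = list(csv.DictReader(sample))
rls_rows = filter_by_program(rows)

# Only the rows attributed to the assumed "RLS" program value remain.
print([r["survey_id"] for r in rls_rows])
```

Matching on a trimmed, lower-cased value is a defensive choice in case the program label is not spelled consistently across the four layers.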

From Peggy: We have applied an older RLS dataset into our pilot events system as we discussed last year, and are now looking at updating occurrence records in the ALA. We sourced this data from here: https://metadata.imas.utas.edu.au/geonetwork/srv/eng/catalog.search#/metadata/9c766140-9e72-4bfb-8f04-d51038355c59

This seems to be made up of two datasets,

We'd like to update the ALA now (and set up regular updates), and I think we spoke on the phone about how to filter datasets from the NRM data in the AODN to pull those that you were comfortable with going on to the ALA (and subsequently GBIF and OBIS), but I haven't got this documented anywhere. When I search on "RLS" in AODN, I get these results:

  1. IMOS - National Reef Monitoring Network Sub-Facility - Global reef fish abundance and biomass
  2. IMOS - National Reef Monitoring Network Sub-Facility - Global cryptobenthic fish abundance
  3. IMOS - National Reef Monitoring Network Sub-Facility - Global off-transect species observations
  4. IMOS - National Reef Monitoring Network Sub-Facility - Global mobile macroinvertebrate abundance
  5. IMOS - National Reef Monitoring Network Sub-Facility - Survey metadata
  6. IMOS - National Reef Monitoring Network Sub-Facility - Site information

Can we upload each of these? Or just a subset of them (e.g. 1 and 3, as per the older dataset)?

From Dave: This image is of the 5 datasets from the IMOS National Reef Monitoring Network Sub-Facility in our internal dataset catalogue. This data import is dated from March 2023, and I have downloaded today's data, which contains more rows. This I can convert and load into DwC and publish to you and OBIS via IPT (another 6 million records). The conversion code is done and dusted and can be applied immediately.

Image

cha801p commented 4 months ago

Ticket Update: April 29, 2024 (7 PM)

Issue: Load sample data (https://collections-test.ala.org.au/dataResource/show/dr22666)

Solution: Successfully loaded the new dataset on biocache-test/events-test

Actions Taken:

Issues Encountered:

  File "/Users/cha801/PycharmProjects/databox/data-resources/dr21639-test-Reef-life-survey-to-event-core/main.py", line 371, in <module>
    events_table = dwca.Table(eventsframe, "http://rs.tdwg.org/dwc/terms/Event", "event.csv", "eventID")
AttributeError: 'DwCA' object has no attribute 'Table'
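Editor's note: the cause of this AttributeError is not established in this thread. One possibility (an assumption, not a diagnosis) is that `Table` is exposed somewhere other than on the `DwCA` instance, for example at module level, or was renamed between library versions. A quick way to check is to list what the object actually exposes. The sketch below uses hypothetical stand-in classes (`Table`, `DwCA`, `fake_module`), not the real library used in main.py.

```python
import types


def public_names(obj):
    """List the public attribute names an object exposes."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def locate(name, *candidates):
    """Return the candidates that actually have an attribute called `name`."""
    return [c for c in candidates if hasattr(c, name)]


# Hypothetical stand-ins, only so the sketch runs on its own.
class Table:
    pass


class DwCA:
    def write(self):
        pass


fake_module = types.SimpleNamespace(Table=Table, DwCA=DwCA)
dwca = DwCA()

# The instance exposes `write` but no `Table`, so `dwca.Table(...)` would fail.
print(public_names(dwca))

# `Table` is found on the (stand-in) module rather than on the instance.
owners = locate("Table", dwca, fake_module)
print(owners == [fake_module])
```

Running `public_names` against the real `dwca` object and its module in main.py would show which of the two, if either, provides `Table` in the installed version.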

Status: Sample data loaded on biocache-test/events-test. Further investigate code / rewrite code. Waiting for review from Peggy.

peggynewman commented 3 months ago

Need to check this out with Sachit, who I think has already done some of this work. Please book a meeting with us all to talk about Reef Life Survey.