AtlasOfLivingAustralia / data-management

Data management issue tracking

dr344-QM New Vernon Endpoint #877

Closed rosemaryjoconnor closed 4 months ago

rosemaryjoconnor commented 1 year ago

Mtg 27/03/2023

rosemaryjoconnor commented 1 year ago

Information emailed after mtg:

The harvesting would be via the /opacobjects endpoint. By default this just returns the ids, with a subsequent call to each record being used to retrieve the individual records. However, this endpoint can be changed to retrieve full records.

Here is an example from our demo site (mostly social history and art objects), retrieving multiple full records with each request. The demo site is set to return 12 records per page, but this is configurable:

https://browser.vernonsystems.com/api/v3/opacobjects?view=detail

Subsequent pages are retrieved using the offset parameter, where the offset is a multiple of the page size.

e.g. Page 2 on our demo site, with 12 records per page is https://browser.vernonsystems.com/api/v3/opacobjects?view=detail&offset=12
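The offset scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `page_url`, `iter_records` and the `fetch_page` callable are hypothetical names, the response shape is not documented in this thread (so `fetch_page` is left to the caller), and stopping on a short or empty page is an assumed termination rule rather than documented API behaviour.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://browser.vernonsystems.com/api/v3/opacobjects"
PAGE_SIZE = 12  # demo-site value from the email; configurable per site


def page_url(page: int, page_size: int = PAGE_SIZE, base_url: str = BASE_URL) -> str:
    """Build the URL for a 1-indexed page of full records."""
    params = {"view": "detail"}
    offset = (page - 1) * page_size  # offset is a multiple of the page size
    if offset:
        params["offset"] = offset
    return f"{base_url}?{urlencode(params)}"


def iter_records(fetch_page: Callable[[str], list],
                 page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield records page by page.

    fetch_page is a hypothetical callable: it takes a URL and returns the
    list of records on that page. Assumption: a page shorter than
    page_size (or empty) means there are no further pages.
    """
    page = 1
    while True:
        records = fetch_page(page_url(page, page_size))
        yield from records
        if len(records) < page_size:
            break
        page += 1
```

With the demo-site settings, `page_url(2)` produces the same URL as the page 2 example given in the email.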

rosemaryjoconnor commented 1 year ago

QM fetcher updated and tested locally.

rosemaryjoconnor commented 1 year ago

Waiting on databox

rosemaryjoconnor commented 1 year ago

dr344 latest update complete. Fetcher to be scheduled weekly. New fetcher requires data endpoints to be provided.

rosemaryjoconnor commented 1 year ago

Details of new API have been provided. 13/07/2023

rosemaryjoconnor commented 1 year ago

Code for new endpoint looking good. Issue with extracting data for fields from API: the JSON string contains a mix of single and double quotes, so json.loads raises errors.
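One possible workaround for this kind of payload, sketched below, is to try strict JSON first and fall back to `ast.literal_eval`. This is not necessarily how the fetcher resolved it; `parse_payload` is a hypothetical helper, and the fallback assumes the string is a Python-style literal (for example a dict repr with single-quoted keys). It will not help if the payload also contains JSON-only tokens such as `true`, `false` or `null`.

```python
import ast
import json


def parse_payload(text: str):
    """Parse text as JSON, falling back to a Python-literal parse."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Assumption: the payload is a Python-style literal, e.g. a dict
        # repr that mixes single- and double-quoted strings.
        # literal_eval only evaluates literals, never arbitrary code.
        return ast.literal_eval(text)


# Strict JSON parses directly; the mixed-quote string takes the fallback.
strict = parse_payload('{"collectionCode": "fossil"}')
mixed = parse_payload("{'name': \"O'Brien\", 'id': 5}")
```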

rosemaryjoconnor commented 1 year ago

Data extracted successfully. QM following up on missing collectionCodes

rosemaryjoconnor commented 1 year ago

Problems with collectionCode being absent. This may be an ongoing problem. Need to discuss re new key fields and UUID implications

rosemaryjoconnor commented 1 year ago

900K+ records now available in the database. Running on databox to test records. Will run fetcher on databox-dev also to test. Check of duplicate and missing records to finalise.

rosemaryjoconnor commented 12 months ago

900K+ records need multi-threading to extract. Mahmoud has updated the fetcher ready for testing, but the extract will take over 4 hours. Meeting with QM to be scheduled to discuss.
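For illustration only, a multi-threaded extract over page offsets might look like the sketch below. It is not the updated fetcher: `fetch_all` and `fetch_page` are hypothetical names, the worker count is arbitrary, and it assumes the total record count is known up front so the offsets can be generated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fetch_all(offsets: Iterable[int],
              fetch_page: Callable[[int], list],
              max_workers: int = 8) -> list:
    """Fetch every page concurrently and return the records in offset order.

    fetch_page is a hypothetical callable that takes an offset and returns
    the list of records on that page (e.g. by calling the API).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which request finishes first.
        pages = pool.map(fetch_page, offsets)
        return [record for page in pages for record in page]
```

A call such as `fetch_all(range(0, total, page_size), fetch_page)` would cover every page, where `total` and `page_size` are supplied by the caller. Threads suit this job because the work is dominated by waiting on HTTP responses rather than CPU.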

rosemaryjoconnor commented 10 months ago

Databox:

rosemaryjoconnor commented 9 months ago

As at 01/11/2023 there were 4,936 occurrence records with associated images and over 3000 unique duplicates in the dataset.
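A duplicate check of this kind could be sketched as below. The thread does not say which field identifies a record, so `catalogNumber` is an assumed key, `duplicated_keys` is a hypothetical helper, and counting the distinct key values that occur more than once is only one reading of "unique duplicates".

```python
from collections import Counter
from typing import Iterable


def duplicated_keys(records: Iterable[dict], key: str = "catalogNumber") -> dict:
    """Return {key value: count} for every key value seen more than once.

    Records with no value for the key are ignored here; they would be
    reported separately as missing rather than as duplicates.
    """
    counts = Counter(record.get(key) for record in records)
    return {value: n for value, n in counts.items()
            if value is not None and n > 1}


# Hypothetical sample records, not real QM data.
sample = [
    {"catalogNumber": "A1"},
    {"catalogNumber": "A2"},
    {"catalogNumber": "A1"},
    {"catalogNumber": None},
]
dupes = duplicated_keys(sample)
```

`len(dupes)` then gives the number of distinct duplicated values.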

rosemaryjoconnor commented 9 months ago

Waiting on update of data with new fields to populate otherCatalogNumbers and fieldNotes.

peggynewman commented 7 months ago

Great work on this! Found some unhandled collection codes:

Image

Like this one, institutionCode = QM and collectionCode = 'fossil' ... may need an update to provider maps: https://biocache-test.ala.org.au/occurrences/66409d3c-ef31-4004-a217-ad277e50e757

rosemaryjoconnor commented 7 months ago

Provider maps are in Prod, but will double check again and set up any that are missing. Unable to create in Test:

Create ProviderMap
Property [userLastModified] of class [Collection] cannot be null
Property [Most southern latitude] of class [Collection] cannot be null
Property [Most northern latitude] of class [Collection] cannot be null
Property [Most eastern longitude] of class [Collection] cannot be null
Property [Name] of class [Collection] cannot be null
Property [uid] of class [Collection] cannot be null
Property [Most western longitude] of class [Collection] cannot be null

rosemaryjoconnor commented 6 months ago

09/02/2024 QM Fetcher live