Closed by rosemaryjoconnor 4 months ago
Information emailed after mtg:
The harvesting would be via the /opacobjects endpoint. By default this just returns the ids, with a subsequent call to each record being used to retrieve the individual records. However, this endpoint can be changed to retrieve full records.
Here is an example from our demo site (mostly social history and art objects), retrieving multiple full records with each request. The demo site is set to return 12 records per page, but this is configurable:
https://browser.vernonsystems.com/api/v3/opacobjects?view=detail
Subsequent pages are retrieved using the offset parameter, where the offset is a multiple of the page size.
e.g. Page 2 on our demo site, with 12 records per page is https://browser.vernonsystems.com/api/v3/opacobjects?view=detail&offset=12
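The offset rule above can be sketched as a small helper. This is only an illustration: `page_url` and its parameter names are hypothetical, and the page size of 12 is the demo site's setting, which a production site may configure differently.

```python
from urllib.parse import urlencode

# Demo-site endpoint quoted above; a production site would use its own base URL.
BASE_URL = "https://browser.vernonsystems.com/api/v3/opacobjects"


def page_url(page, page_size=12, base_url=BASE_URL):
    """Build the full-record URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    params = {"view": "detail"}
    # The offset is always a multiple of the page size: 0, 12, 24, ...
    offset = (page - 1) * page_size
    if offset:
        params["offset"] = offset
    return f"{base_url}?{urlencode(params)}"
```

With the demo settings, `page_url(2)` gives the page 2 URL shown above.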
QM fetcher updated and tested locally.
Waiting on databox
dr344 latest update complete. Fetcher to be scheduled weekly. New fetcher requires data endpoints to be provided
Details of new API have been provided. 13/07/2023
Code for new endpoint looking good. Issue with extracting data for fields from API: the JSON string contains a mix of single and double quotes, which causes json.loads errors.
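One possible workaround for the quoting problem, sketched under the assumption that the failing strings are Python-literal style (single-quoted keys and values). This is not necessarily how the fetcher resolved it, and `parse_field` is a hypothetical name.

```python
import ast
import json


def parse_field(raw):
    """Parse a field value, trying strict JSON first (hypothetical helper)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: the string is a Python-style literal such as
        # {'name': "O'Brien"}. literal_eval only evaluates literals and
        # never executes code, but it will still raise on JSON-only
        # tokens such as true, false or null.
        return ast.literal_eval(raw)
```

Strings that are neither valid JSON nor a Python literal still raise an error, so they can be logged and inspected rather than silently dropped.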
Data extracted successfully. QM following up on missing collectionCodes
Problems with collectionCode being absent. This may be an ongoing problem. Need to discuss re new key fields and UUID implications
900K+ records now available in the database. Running on databox to test records. Will run fetcher on databox-dev also to test. Check of duplicate and missing records to finalise.
900K+ records need multi-threading to extract. Mahmoud has updated the fetcher ready for testing, but the extract will take over 4 hours. Meeting with QM to be scheduled to discuss
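A minimal sketch of a multi-threaded extract, assuming pages can be requested independently by offset. It is not the updated fetcher itself: `fetch_all` and the `fetch_page` callable (which would wrap the HTTP request and return one page's records) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(total_records, page_size, fetch_page, max_workers=8):
    """Fetch every page concurrently and return the records in offset order.

    fetch_page(offset) is supplied by the caller and returns a list of records.
    """
    offsets = range(0, total_records, page_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the calls on worker threads but yields results in
        # the same order as the offsets.
        pages = pool.map(fetch_page, offsets)
        return [record for page in pages for record in page]
```

The worker count is a trade-off between extract time and load on the API, so it would need to be agreed with the data provider.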
Databox:
As at 01/11/2023 there were:
4,936 occurrence records with associated images
over 3000 unique duplicates in the dataset
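A check for duplicates and missing collection codes could look like the sketch below. The thread does not say which field defines a duplicate, so `catalogNumber` is an assumption here, and both function names are hypothetical.

```python
from collections import Counter


def find_duplicates(records, key="catalogNumber"):
    """Return {value: count} for key values that occur more than once."""
    counts = Counter(r.get(key) for r in records)
    return {k: n for k, n in counts.items() if k is not None and n > 1}


def missing_field(records, key="collectionCode"):
    """Return the records where the given field is absent or empty."""
    return [r for r in records if not r.get(key)]
```

Both functions take a list of dicts, one per extracted record.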
Waiting on update of data with new fields to populate otherCatalogNumbers and fieldNotes
Great work on this! Found some unhandled collection codes:
Like this one, institutionCode = QM and collectionCode = 'fossil' ... may need an update to provider maps: https://biocache-test.ala.org.au/occurrences/66409d3c-ef31-4004-a217-ad277e50e757
Provider maps are in Prod, but will double check again and set up any that are missing.
Unable to create in Test:
Create ProviderMap
Property [userLastModified] of class [Collection] cannot be null
Property [Most southern latitude] of class [Collection] cannot be null
Property [Most northern latitude] of class [Collection] cannot be null
Property [Most eastern longitude] of class [Collection] cannot be null
Property [Name] of class [Collection] cannot be null
Property [uid] of class [Collection] cannot be null
Property [Most western longitude] of class [Collection] cannot be null
09/02/2024 QM Fetcher live
[ ] QM data to be extracted from new endpoint
[ ] Implementation of OAI-PMH on Vernon Browser as an upgrade to sustain the mobilisation of QM data to the ALA.
Mtg 27/03/2023