Closed: tkohr closed this issue.
Related to the breaking change introduced in https://github.com/geonetwork/core-geonetwork/pull/6677
I'm definitely not an ODS expert, but it looks like the dataset ID path differs depending on whether you request API version 1 or version 2.
In the PR you're pointing at, ODS API v2 support was added (https://github.com/geonetwork/core-geonetwork/pull/6677/commits/a3db440527f72767d02db326e48cec2324e55d78), and it should have preserved compatibility with the version 1 API.
So before 4.2.3, harvesting from the ODS API v2 did not work.
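To make the v1/v2 difference concrete, here is a minimal sketch using Jackson (assumed to be on the classpath, since the harvester code quoted later in this thread calls JsonNode methods). The two payloads are hypothetical and heavily trimmed; only their nesting follows the loopElement and recordIdPath values in the configs further down ("/datasets" plus "/datasetid" for v1, "/datasets" plus "/dataset/dataset_id" for v2). It also assumes the harvester resolves recordIdPath relative to each loop element, as the quoted line suggests.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;

public class OdsIdPaths {
    // Hypothetical, trimmed payloads: only the nesting mirrors the harvester configs.
    static final String V1 = "{\"nhits\": 1, \"datasets\": [{\"datasetid\": \"example-v1\"}]}";
    static final String V2 = "{\"nhits\": 1, \"datasets\": [{\"dataset\": {\"dataset_id\": \"example-v2\"}}]}";

    /** Collects the id of every element under loopElement, resolving recordIdPath relative to each element. */
    static List<String> ids(String json, String loopElement, String recordIdPath) throws Exception {
        JsonNode root = new ObjectMapper().readTree(json);
        List<String> out = new ArrayList<>();
        for (JsonNode record : root.at(loopElement)) {
            out.add(record.at(recordIdPath).asText());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ids(V1, "/datasets", "/datasetid"));          // [example-v1]
        System.out.println(ids(V2, "/datasets", "/dataset/dataset_id")); // [example-v2]
        // The v1 pointer does not exist in a v2 record: at() yields a MissingNode whose asText() is "".
        String wrong = ids(V2, "/datasets", "/datasetid").get(0);
        System.out.println("v1 pointer on v2 record: '" + wrong + "'"); // v1 pointer on v2 record: ''
    }
}
```

The same loop works for both API versions; only the pointer changes, which is why each harvester config needs its own recordIdPath.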
Running the following harvester config on main
{"@id":"228","@type":"simpleurl","owner":["1"],"ownerGroup":["2"],"ownerUser":["undefined"],"site":{"name":"6962","uuid":"d3e54543-097d-4bb8-bfe6-0fa9c04bb73d","account":{"use":false,"username":[],"password":[]},"url":"https://opendata.lillemetropole.fr/api/datasets/1.0/search?refine.publisher=M%C3%A9tropole+Europ%C3%A9enne+de+Lille&start=0&rows=20","icon":"blank.png","loopElement":"/datasets","numberOfRecordPath":"/nhits","recordIdPath":"/datasetid","pageSizeParam":"rows","pageFromParam":"start","toISOConversion":"schema:iso19115-3.2018:convert/fromJsonOpenDataSoft"},"content":{"validate":"NOVALIDATION","importxslt":"none","batchEdits":"[]"},"options":{"every":"0 0 0 ? * *","oneRunOnly":false,"overrideUuid":"SKIP","status":"active"},"privileges":[{"@id":"1","operation":[{"@name":"view"},{"@name":"dynamic"},{"@name":"download"}]}],"ifRecordExistAppendPrivileges":false,"info":{"lastRun":"2023-05-05T05:25:19.923Z","running":false,"result":{"added":"224","atomicDatasetRecords":"0","badFormat":"0","collectionDatasetRecords":"0","datasetUuidExist":"0","privilegesAppendedOnExistingRecord":"0","doesNotValidate":"0","xpathFilterExcluded":"0","duplicatedResource":"0","fragmentsMatched":"0","fragmentsReturned":"0","fragmentsUnknownSchema":"0","incompatible":"0","recordsBuilt":"0","recordsUpdated":"0","removed":"0","serviceRecords":"0","subtemplatesAdded":"0","subtemplatesRemoved":"0","subtemplatesUpdated":"0","total":"224","unchanged":"0","unknownSchema":"0","unretrievable":"0","updated":"0","thumbnails":"0","thumbnailsFailed":"0"}}}
for the v1 API collects 224 records,
and running
{"@id":"373","@type":"simpleurl","owner":["1"],"ownerGroup":["2"],"ownerUser":["undefined"],"site":{"name":"6962 v2","uuid":"cc6c2ae1-34a8-4ac6-bd19-8df33098f61b","account":{"use":false,"username":[],"password":[]},"url":"https://opendata.lillemetropole.fr/api/explore/v2.0/catalog/datasets?rows=100","icon":"blank.png","loopElement":"/datasets","numberOfRecordPath":"/nhits","recordIdPath":"/dataset/dataset_id","pageSizeParam":"rows","pageFromParam":"start","toISOConversion":"schema:iso19115-3.2018:convert/fromJsonOpenDataSoft"},"content":{"validate":"NOVALIDATION","importxslt":"none","batchEdits":"[]"},"options":{"every":"0 0 0 ? * *","oneRunOnly":false,"overrideUuid":"SKIP","status":"active"},"privileges":[{"@id":"1","operation":[{"@name":"view"},{"@name":"dynamic"},{"@name":"download"}]}],"ifRecordExistAppendPrivileges":false,"info":{"lastRun":"2023-05-05T05:46:25.882Z","running":false,"result":{"added":"10","atomicDatasetRecords":"0","badFormat":"0","collectionDatasetRecords":"0","datasetUuidExist":"0","privilegesAppendedOnExistingRecord":"0","doesNotValidate":"0","xpathFilterExcluded":"0","duplicatedResource":"0","fragmentsMatched":"0","fragmentsReturned":"0","fragmentsUnknownSchema":"0","incompatible":"0","recordsBuilt":"0","recordsUpdated":"0","removed":"1","serviceRecords":"0","subtemplatesAdded":"0","subtemplatesRemoved":"0","subtemplatesUpdated":"0","total":"10","unchanged":"0","unknownSchema":"0","unretrievable":"0","updated":"0","thumbnails":"0","thumbnailsFailed":"0"}}}
for the v2 API collects 100 records.
So this seems fine to me, no?
So your issue was related to this line:
String uuid = this.extractUuidFromIdentifier(record.get(params.recordIdPath).asText());
which only works if the property you need is a direct property of the loopElement node. That is not the case for every JSON source, and not for ODS API v2. So it was changed to:
String uuid = this.extractUuidFromIdentifier(record.at(params.recordIdPath).asText());
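A small sketch of the difference between those two lines, again assuming jackson-databind on the classpath and a hypothetical, trimmed record shaped like one element of the ODS v2 "/datasets" array: get() looks up a direct child field by its literal name, while at() evaluates a JSON Pointer (RFC 6901) and can descend into nested objects.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class GetVersusAt {
    public static void main(String[] args) throws Exception {
        // Hypothetical record shaped like one element of the ODS v2 "/datasets" array.
        JsonNode record = new ObjectMapper().readTree("{\"dataset\": {\"dataset_id\": \"example-v2\"}}");

        // get() treats its argument as a literal field name: there is no field called "/dataset/dataset_id",
        // so it returns null, and calling .asText() on that result would throw a NullPointerException.
        System.out.println(record.get("/dataset/dataset_id")); // null

        // at() parses the argument as a JSON Pointer and walks dataset -> dataset_id.
        System.out.println(record.at("/dataset/dataset_id").asText()); // example-v2

        // A pointer to a missing path yields a MissingNode instead of null.
        System.out.println(record.at("/nope").isMissingNode()); // true
    }
}
```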
This explains why your 4.2.2 config no longer works in 4.2.3. By the way, a fairly clear error is reported in the harvester log:
2023-05-05T13:42:40,976 ERROR [geonetwork.harvester] -
Failed to collect record UUID at path datasetid.
Error is: Invalid input:
JSON Pointer expression must start with '/': "datasetid"
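That error comes from Jackson's JSON Pointer parsing, which rejects any expression that does not start with "/". The sketch below (assuming jackson-databind on the classpath; the helper name errorFor is hypothetical) reproduces it with a 4.2.2-style bare key and shows that the pointer form of the same key is accepted.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BarePathError {
    /** Hypothetical helper: returns the exception message raised by at(), or null if none is raised. */
    static String errorFor(String recordIdPath) throws Exception {
        JsonNode record = new ObjectMapper().readTree("{\"datasetid\": \"example-v1\"}");
        try {
            record.at(recordIdPath);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // 4.2.2-style value: a bare property name, which is not a valid JSON Pointer.
        // Expected to print the same message that appears in the harvester log.
        System.out.println(errorFor("datasetid"));
        // Post-4.2.2 value: the same property written as a pointer resolves without error.
        System.out.println(errorFor("/datasetid")); // null
    }
}
```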
Thanks for looking into this @fxprunayre. Indeed, in the end, it's just the missing "/" that breaks the ODS config when moving from GN 4.2.2 to later versions.
I hadn't noticed that the mentioned PR was using ODS v2, which has a different hierarchy and a different key (dataset_id instead of datasetid in v1). That obscured the problem a little, despite the rather clear error message.
Just to clarify, this had nothing to do with ODS API v2 (which we don't use). It was an error on our side; it does work with the new format for the record ID pointer.
Thanks @fxprunayre
FYI, I opened https://github.com/geonetwork/doc/pull/240 regarding this.
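For anyone migrating an existing 4.2.2 harvester config, the conversion of a bare key into a JSON Pointer is mechanical. The helper below is a hypothetical sketch, not part of GeoNetwork: it prefixes "/" and applies RFC 6901 escaping, and it assumes any value already starting with "/" is a pointer and leaves it unchanged.

```java
public class RecordIdPathMigration {
    /**
     * Hypothetical helper (not part of GeoNetwork): converts a 4.2.2-style bare property name
     * into an equivalent JSON Pointer. Values already starting with '/' are assumed to be
     * pointers and are returned unchanged; null or empty values are also left as-is.
     */
    static String toPointer(String recordIdPath) {
        if (recordIdPath == null || recordIdPath.isEmpty() || recordIdPath.startsWith("/")) {
            return recordIdPath;
        }
        // RFC 6901 escaping: '~' becomes "~0" first, then '/' becomes "~1".
        String escaped = recordIdPath.replace("~", "~0").replace("/", "~1");
        return "/" + escaped;
    }

    public static void main(String[] args) {
        System.out.println(toPointer("datasetid"));           // /datasetid
        System.out.println(toPointer("/dataset/dataset_id")); // /dataset/dataset_id
        System.out.println(toPointer("a~b"));                 // /a~0b
    }
}
```

Escaping "~" before "/" matters: doing it the other way round would turn the "~1" produced for "/" into "~01".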
Describe the bug
Harvesting the following ODS catalog via the simple URL harvester (which works on version 4.2.2) no longer seems to work. I have the feeling this is related to the change whereby the recordIdPath input now expects a path such as /datasets/datasetid (from the document root?). Or am I just specifying the wrong path? In version 4.2.2, only the property key datasetid is specified here.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Harvest ~208 records from the catalog.
Log file: harvester_simpleUrl_MEL_ODS_GN_main_202303301528.log