Esri / geoportal-server-catalog

Esri Geoportal Server is a next generation open-source metadata catalog and editor, based on elasticsearch.
https://www.esri.com/en-us/arcgis/products/geoportal-server/overview
Apache License 2.0

Item identifier vs fileIdentifier #473

Closed kep1n closed 1 year ago

kep1n commented 1 year ago

Hi there,

This is more of a question than an issue, but we are having trouble with the internal identifier that Geoportal assigns when harvesting an existing CSW from an old GeoNetwork instance to populate a new Geoportal. Everything seems to work flawlessly, but the identifier set by Geoportal looks like a Portal for ArcGIS item id and does not match the fileIdentifier of the harvested record. From what we can see in other CSW services, the identifier and the fileIdentifier tend to match, which makes GetRecordById requests easier.

In this case we must perform a GetRecords request before a GetRecordById. Is there a way to force Geoportal to use the fileIdentifier as the identifier in the harvesting/uploading/creating process?

Thanks,

[Image: capture]
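
For readers following along, here is a minimal sketch of the GetRecordById request discussed above, using the standard CSW 2.0.2 key-value parameters. The endpoint URL and the function name are hypothetical and not taken from this thread. The point is that the `id` value must be the identifier the catalog exposes, so knowing only the fileIdentifier is not enough when the two differ.

```javascript
// Hypothetical CSW endpoint; replace with the real service URL.
const CSW_URL = "https://example.com/geoportal/csw";

// Build a CSW 2.0.2 GetRecordById request in KVP encoding.
// `id` must be the identifier the catalog knows the record by.
function getRecordByIdUrl(baseUrl, id) {
  const params = new URLSearchParams({
    service: "CSW",
    version: "2.0.2",
    request: "GetRecordById",
    id: id,
    elementSetName: "full",
  });
  return baseUrl + "?" + params.toString();
}

// Works directly only when the catalog identifier equals the value passed in.
console.log(getRecordByIdUrl(CSW_URL, "d1b6fed3e06d4960af263970ad7b0bd6"));
```

If the catalog identifier and the fileIdentifier differ, a GetRecords search is needed first to find the catalog identifier, which is the extra step described in the question.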

mhogeweg commented 1 year ago

Hi. Yes, there is a way to change which field in the metadata Geoportal Server uses for the identifier. I suggest the Evaluators topic on the wiki as a start: https://github.com/Esri/geoportal-server-catalog/wiki/Evaluators. Evaluators pick elements from the configured metadata types and store them in common fields in the Elasticsearch index. One of those elements is the identifier.

The key is that this identifier must be unique and persistent. This is a bit of a challenge with ISO metadata, where the fileIdentifier is optional and often not used. While an identifier may be some URI/URL, we have seen cases of recurring values, meaning the fileIdentifier is not always unique either.

If this situation does not apply to your case, you could change the evaluator for your metadata and pull the fileIdentifier to use as the main identifier for a record.

kep1n commented 1 year ago

Thank you for your quick feedback! I've been fiddling with the Evaluators, but I lack development background, and I don't think this is as easy as adding a line like G.evalProp(task,item,root,"_id","gmd:fileIdentifier/*/text()"); because I'm getting a 400 error and then a 500 error. Is there any further information about Evaluators, apart from the GitHub wiki, to dig deeper into this subject?

mhogeweg commented 1 year ago

Ah, wait, that _id cannot be modified. It is an id assigned by Elasticsearch. What you can change is everything in the _source element. There you will find the fileid field, which for ISO metadata is based on the file identifier. See this example: https://gpt.geocloud.com/geoportal2/rest/metadata/item/d1b6fed3e06d4960af263970ad7b0bd6?pretty=true
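
To make the distinction concrete, here is a small sketch that separates the Elasticsearch-assigned `_id` from the `fileid` field inside `_source`. It assumes the item JSON has the usual Elasticsearch document shape with `_id` and `_source`; the function name and the sample `fileid` and `title` values are invented for illustration.

```javascript
// Pull both identifiers out of an item document.
// Assumes the usual Elasticsearch shape: { _id, _source: { ... } }.
function describeIdentifiers(item) {
  const source = item._source || {};
  const fileId = source.fileid !== undefined ? source.fileid : null;
  return {
    elasticId: item._id,            // id used to address the item
    fileId: fileId,                 // value derived from the metadata file identifier
    matches: item._id === fileId,   // true only when both ids coincide
  };
}

// Hypothetical sample document; the fileid and title values are made up.
const sample = {
  _id: "d1b6fed3e06d4960af263970ad7b0bd6",
  _source: { fileid: "example-file-identifier", title: "Example record" },
};

console.log(describeIdentifiers(sample));
```

When `matches` is false, a client that only knows the fileIdentifier has to search for the item first to learn its `_id`.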

kep1n commented 1 year ago

I'm focusing on the _id because I see that it is the same id used to identify the resource in Geoportal when you perform a GetRecords request (as you can see in the image). The same happens for a GetRecordById request. That _id value is mandatory to get a valid response. I'm looking for a way, if it's feasible, to make the dc:identifier value match the fileIdentifier when harvesting/uploading new resources, so I can directly perform a GetRecordById knowing just the fileIdentifier.

[Image: Identifier image]

kep1n commented 1 year ago

Hi again,

We finally managed to set the fileIdentifier as the internal _id, but only within the Geoportal environment, by modifying MetadataEditor.js and UploadMetadata.js, which control the Create Metadata and Upload Metadata options respectively. Everything seems to work properly: we can delete individual items, search for them, etc.

Now, could you please tell us where we should look to apply a similar approach on the Harvester side? As far as we can see from DevTools, it requests the process, triggers and execute links through the REST API, but we cannot get any further without digging into the source code.

Edited:

Found this in ElasticContext.java. I believe this is it :smile:

[Image: Snipaste_2022-10-28_11-11-46]

mhogeweg commented 1 year ago

Using the allowFileId setting would indeed achieve your goal. Sorry, I totally missed this setting, which is described in: https://github.com/Esri/geoportal-server-catalog/wiki/Elasticsearch-configuration

"If set to "true", you can use the metadata file id as Elasticsearch id, the id will not be used if the id contains forward slash (e.g. in urls)."

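The rule quoted from the wiki can be sketched as a small decision function. This is only an illustration of the described behavior, not the actual code in ElasticContext.java. The function name, the `generateId` callback, and the handling of a missing or empty file id are assumptions.

```javascript
// Decide which id to store a record under, following the quoted rule:
// use the metadata file id only when allowFileId is true and the id has no "/".
// Falling back on a missing or empty file id is an assumption of this sketch.
function chooseElasticId(allowFileId, fileId, generateId) {
  const usable =
    allowFileId === true &&
    typeof fileId === "string" &&
    fileId.length > 0 &&
    !fileId.includes("/");
  return usable ? fileId : generateId();
}

// Hypothetical generator standing in for whatever id the catalog would assign.
const generated = () => "generated-id";

console.log(chooseElasticId(true, "abc-123", generated));
console.log(chooseElasticId(true, "http://example.com/md/1", generated));
console.log(chooseElasticId(false, "abc-123", generated));
```

This also reflects the earlier caveat in the thread: the approach is only safe when file identifiers are present and unique across the harvested records.
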
kep1n commented 1 year ago

I've already made the change in the source code and it works flawlessly. Thank you very much.