OP-TED / ted-rdf-mapping-eforms

TED-RDF Mapping Suites for eForms Notices
European Union Public License 1.2

Question about external hashing service #29

Status: Open. cristianvasquez opened this issue 2 months ago.

cristianvasquez commented 2 months ago

Part of each URI is built by hashing the path to an entity.

Currently, the hash is computed through HTTP calls to: https://digest-api.ted-data.eu/api/v1/hashing/fn/uuid/

The drawbacks concern performance (an HTTP call for every subject is expensive) and resilience (what happens when the HTTP service is unavailable?).
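For comparison, a local, deterministic hash would address both drawbacks: no network round trip per subject and no dependency on the service being reachable. The Python sketch below assumes that a name-based UUID (UUIDv5, i.e. SHA-1 over a namespace plus the path string) is acceptable. The namespace UUID is a hypothetical placeholder, and nothing here reproduces the algorithm behind digest-api.ted-data.eu, so its output will not match the service's.

```python
import uuid
from functools import lru_cache

# Hypothetical namespace UUID for this mapping. Any fixed UUID works,
# as long as it never changes between runs (otherwise every URI changes).
MAPPING_NAMESPACE = uuid.UUID("6f1c1e2a-3b4d-4c5e-8f60-718293a4b5c6")


@lru_cache(maxsize=None)
def local_path_uuid(path: str) -> str:
    """Derive a UUID from an XPath path string, offline and deterministically.

    The same input always gives the same UUIDv5, so repeated subjects cost a
    cache lookup instead of an HTTP call, and there is no service to go down.
    """
    return str(uuid.uuid5(MAPPING_NAMESPACE, path))


print(local_path_uuid("/Q{urn:example}Notice[1]"))  # hypothetical path
```

The trade-off is that the namespace and the algorithm become part of the mapping itself, so they would have to be fixed once and never changed.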

Question: what would be the main reason to use an external hashing service?

There is also an alternative:

Generate an ID with the XPath function generate-id(.)

I'm asking because I see the XPath function as sufficient for our use cases.

schivmeister commented 1 month ago

This is indeed a valid concern. The XPath function generate-id() did not appear to be as foolproof as a UUID with respect to possible collisions (could two elements in different notices, located at the same XPath and therefore at the same depth, coincidentally generate the same ID?), hence this online approach. Did you manage to evaluate it with that in mind?
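To make the collision question concrete: whichever hash function is used (a remote UUID service or a local hash), if its only input is the path, the same element position in two different notices gives the same input and therefore the same output. The sketch below assumes a hypothetical notice identifier is available when the mapping runs, and shows how including it in the hashed string separates the two. The namespace, path and notice IDs are all illustrative.

```python
import uuid

# Hypothetical fixed namespace UUID; it only has to stay constant across runs.
NS = uuid.UUID("6f1c1e2a-3b4d-4c5e-8f60-718293a4b5c6")


def path_only_id(path: str) -> uuid.UUID:
    # The hashed input is the path alone, so the same element position
    # in two different notices always yields the same UUID.
    return uuid.uuid5(NS, path)


def scoped_id(notice_id: str, path: str) -> uuid.UUID:
    # The hashed input also carries a (hypothetical) notice identifier.
    # Assuming notice IDs never contain "|", the first "|" unambiguously
    # separates the two parts.
    return uuid.uuid5(NS, f"{notice_id}|{path}")


path = "/Q{urn:example}Notice[1]/Q{urn:example}Lot[1]"  # hypothetical path
ids_by_notice = {
    notice: (path_only_id(path), scoped_id(notice, path))
    for notice in ("notice-A", "notice-B")  # hypothetical notice IDs
}
(a_plain, a_scoped), (b_plain, b_scoped) = ids_by_notice.values()
print(a_plain == b_plain)    # True: path-only IDs collide across notices
print(a_scoped == b_scoped)  # False: scoped IDs differ
```

In this sketch, what goes into the hash decides whether IDs collide; switching to a UUID does not help if the input is the same.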

cristianvasquez commented 1 month ago

Hi!

I was only referring to the choice between an online call and a local one, and for that I need to better understand the objective of such a service. So would you say it is mainly to avoid collisions?

All the service calls I inspected were of the form:

unparsed-text('https://digest-api.ted-data.eu/api/v1/hashing/fn/uuid/' || encode-for-uri(path()) || '?response_type=raw')
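For reference, the expression above can be mirrored in Python to see exactly what each request carries: one GET per subject, with the encoded path as the only variable part. This sketch only builds the URL and does not call the service. It assumes urllib.parse.quote with safe="" behaves like encode-for-uri, which holds because both leave only the unreserved characters A-Z, a-z, 0-9, "-", "_", "." and "~" unescaped and percent-encode everything else as UTF-8. The sample path is hypothetical.

```python
from urllib.parse import quote

# Endpoint as it appears in the mapping rules.
DIGEST_ENDPOINT = "https://digest-api.ted-data.eu/api/v1/hashing/fn/uuid/"


def digest_url(path: str) -> str:
    """Build the URL produced by
    ENDPOINT || encode-for-uri(path()) || '?response_type=raw'.

    quote(..., safe="") percent-encodes everything except the unreserved set,
    including "/", which is what encode-for-uri does.
    """
    return DIGEST_ENDPOINT + quote(path, safe="") + "?response_type=raw"


print(digest_url("/Q{urn:example}Notice[1]"))
# → https://digest-api.ted-data.eu/api/v1/hashing/fn/uuid/%2FQ%7Burn%3Aexample%7DNotice%5B1%5D?response_type=raw
```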

This led me to think that the service hashes the path, producing in the end something similar to generate-id(.).
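This reasoning can be checked with a rough Python approximation of path() for element nodes, following the XPath 3.1 fn:path() format (one Q{namespace}local[position] step per ancestor). It is only a sketch: attributes, text and comment nodes are not handled, and the two sample notices are hypothetical. Corresponding elements in two notices with different content get the same path, so, because the request sends nothing but the encoded path, any function of path() alone returns the same value for both.

```python
import xml.etree.ElementTree as ET


def xpath_like_path(root: ET.Element, target: ET.Element) -> str:
    """Approximate XPath 3.1 fn:path() for an element node.

    Each step is Q{namespace-uri}local-name[n], where n is the 1-based position
    among sibling elements with the same expanded name. Elements only.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            position = 1  # the document element is the document's only element child
        else:
            same_name = [c for c in parent if c.tag == node.tag]
            position = same_name.index(node) + 1
        # ElementTree writes namespaced tags as "{uri}local"; add "{}" otherwise.
        tag = node.tag if node.tag.startswith("{") else "{}" + node.tag
        steps.append(f"Q{tag}[{position}]")
        node = parent
    return "/" + "/".join(reversed(steps))


# Two hypothetical notices with the same shape but different content.
notice_a = ET.fromstring('<n:Notice xmlns:n="urn:example"><n:Lot>A1</n:Lot><n:Lot>A2</n:Lot></n:Notice>')
notice_b = ET.fromstring('<n:Notice xmlns:n="urn:example"><n:Lot>B1</n:Lot><n:Lot>B2</n:Lot></n:Notice>')

print(xpath_like_path(notice_a, notice_a[1]))  # /Q{urn:example}Notice[1]/Q{urn:example}Lot[2]
print(xpath_like_path(notice_a, notice_a[1]) == xpath_like_path(notice_b, notice_b[1]))  # True
```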