InTaVia / InTaVia-Backend

API Backend for InTaVia project
MIT License

Some entity ids are NOT unique #208

Open samuelbeck opened 8 months ago

samuelbeck commented 8 months ago

Some entity ids returned by the REST API are not unique. Querying the /api/entities/{entity_id} endpoint returns different results in (at least) the following case:

https://intavia.acdh-dev.oeaw.ac.at/entities/aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8zNzU3

This should return Gustav Klimt, but it sometimes returns Ales Mikulas (both appear under the URI http://www.intavia.eu/provided_person/3757).
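
The entity id in the URL above appears to be the Base64 encoding of the entity URI. That is an inference from the two values quoted in this report, not a documented guarantee of the API. A minimal Python sketch for decoding an id to see which URI it refers to (`decode_entity_id` is a hypothetical helper name):

```python
import base64


def decode_entity_id(entity_id: str) -> str:
    """Decode a Base64 entity id (URL-safe alphabet) into the URI it encodes."""
    # Assumption: ids may be unpadded, so restore "=" padding before decoding.
    padded = entity_id + "=" * (-len(entity_id) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


entity_id = "aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8zNzU3"
print(decode_entity_id(entity_id))
# prints: http://www.intavia.eu/provided_person/3757
```

Decoding the id first makes it easier to compare what the frontend requests with what the SPARQL endpoint holds for that URI.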

yoge1 commented 8 months ago

Hmm, quickly looking at the data in the SPARQL endpoint, I see the URI http://www.intavia.eu/provided_person/3757 only being used for Ales Mikulas, not for Gustav Klimt. Klimt's URI is http://www.intavia.eu/provided_person/30123.

Also, when I query https://intavia.acdh-dev.oeaw.ac.at/entities/aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8zNzU3, it returns only Ales Mikulas (I ran it a couple of times).

Are you able to reproduce the bug via a certain use sequence?

What I do know is that if we ingest new versions of the source datasets and then run the Prefect flow that generates the Provided_Person instances, the persons' URIs can change relative to the previous state of the knowledge graph (issue: https://github.com/InTaVia/prefect-flows/issues/22). If that is what happened, the same URI could have been used for both Klimt and Mikulas, but not in the same knowledge graph state.

samuelbeck commented 8 months ago

I'm able to reproduce the bug in the following way:

  1. Search for Gustav Klimt in the frontend (https://intavia.acdh-dev.oeaw.ac.at/search?q=gustav+klimt)
  2. Click on Gustav Klimt to open the detail page -> Gustav Klimt will be shown
  3. Delete local storage of the application (tutorial)
  4. Reload detail page of Gustav Klimt (https://intavia.acdh-dev.oeaw.ac.at/entities/aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8zNzU3) -> Mikulas Ales will be shown

Maybe either the entity search or entityById endpoint is returning out-of-date data?

yoge1 commented 8 months ago

Thanks. I don't have any new insight to offer here, but will report my findings:

Is it possible that our browsers' local storage had Gustav Klimt's older ID (if such IDs are stored in local storage)?

samuelbeck commented 8 months ago

Thanks for looking into this! Yes, entities and their IDs are stored in local storage, so that's a possibility.
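
If stale local storage is the cause, one way to confirm it would be to compare each cached entity with what the backend currently returns for the same id. The sketch below is only an illustration in Python, not the frontend's actual code: `find_stale_entries` is a hypothetical helper, the cache is modelled as a plain dict of id to label, and `fetch_label` stands in for a call to the entity-by-id endpoint.

```python
from typing import Callable, Dict, List, Optional


def find_stale_entries(
    cached_labels: Dict[str, str],
    fetch_label: Callable[[str], Optional[str]],
) -> List[str]:
    """Return the ids whose cached label differs from the freshly fetched one."""
    stale = []
    for entity_id, cached_label in cached_labels.items():
        # fetch_label is a stand-in for querying the backend by entity id.
        fresh_label = fetch_label(entity_id)
        if fresh_label != cached_label:
            stale.append(entity_id)
    return stale


# Values taken from this thread: the cache says Klimt, the backend says Mikulas.
ENTITY_ID = "aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8zNzU3"
cached = {ENTITY_ID: "Gustav Klimt"}
backend = {ENTITY_ID: "Ales Mikulas"}  # stub in place of a real HTTP request

stale = find_stale_entries(cached, backend.get)
print(stale)
```

Any id reported here would point to a cache entry written against an earlier knowledge graph state.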

yoge1 commented 8 months ago

Ok, then I think this is indeed caused by a new run of the Prefect flow that generates the Provided_Person instances, which doesn't generate persistent identifiers (issue: https://github.com/InTaVia/prefect-flows/issues/22).
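
To illustrate how non-persistent identifiers can produce exactly this symptom, here is a toy Python sketch. It makes strong assumptions: it does not reflect how the Prefect flow actually assigns URIs (this thread does not say), the source keys are invented placeholders, and hashing a stable source key is just one common technique for persistent ids, named here only for comparison.

```python
import hashlib

BASE = "http://www.intavia.eu/provided_person/"


def counter_uris(source_keys):
    """Position-based ids: adding a record can renumber all later ones."""
    return {key: f"{BASE}{i}" for i, key in enumerate(sorted(source_keys))}


def hashed_uris(source_keys):
    """Ids derived from a stable source key: unaffected by other records."""
    return {
        key: BASE + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        for key in source_keys
    }


# Hypothetical source keys; run 2 ingests one additional record.
run1 = ["src-a/klimt", "src-b/mikulas"]
run2 = ["src-a/aaa-new-person", "src-a/klimt", "src-b/mikulas"]

# With counters, Klimt's URI in run 2 equals Mikulas's URI from run 1.
print(counter_uris(run1)["src-b/mikulas"] == counter_uris(run2)["src-a/klimt"])
# With hashed keys, Klimt keeps the same URI across both runs.
print(hashed_uris(run1)["src-a/klimt"] == hashed_uris(run2)["src-a/klimt"])
```

Both comparisons print `True`: the counter scheme reuses one URI for two different people across runs, while the key-derived scheme keeps each person's URI fixed.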