Prez link generation is slow

edmondchuc commented 1 year ago

Currently, the /object endpoint takes an IRI and loads an object. It's slow because of an N+1 query pattern: for each IRI it finds within the resource, it issues a separate database query to generate that IRI's prez link. So for a concept scheme with hundreds of top concepts, it queries the database once per top concept to generate its prez link. As you have mentioned in the past, this is the naive solution: slow but correct.
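To make the N+1 pattern concrete, here is a minimal, self-contained sketch. `CountingStore`, `links_naive`, `links_batched` and the `/object?uri=...` link shape are hypothetical names chosen for illustration, not Prez's actual API. The fake store only counts round trips; the "+1" (the initial describe query) is not modelled.

```python
from __future__ import annotations

from urllib.parse import quote


class CountingStore:
    """Hypothetical stand-in for a triplestore that counts round trips."""

    def __init__(self, classes: dict[str, str]) -> None:
        self._classes = classes  # IRI -> class, assumed already loaded
        self.queries = 0

    def class_of(self, iri: str) -> str | None:
        # One round trip per IRI: this is the "N" in N+1.
        self.queries += 1
        return self._classes.get(iri)

    def classes_of(self, iris: list[str]) -> dict[str, str]:
        # One round trip for the whole batch (e.g. a SPARQL VALUES block).
        self.queries += 1
        return {i: self._classes[i] for i in iris if i in self._classes}


def make_link(iri: str) -> str:
    # Assumed link shape, for illustration only.
    return "/object?uri=" + quote(iri, safe="")


def links_naive(store: CountingStore, iris: list[str]) -> dict[str, str]:
    # Looks up each IRI separately: len(iris) queries.
    return {i: make_link(i) for i in iris if store.class_of(i) is not None}


def links_batched(store: CountingStore, iris: list[str]) -> dict[str, str]:
    # Looks up all IRIs at once: one query regardless of len(iris).
    found = store.classes_of(iris)
    return {i: make_link(i) for i in iris if i in found}
```

For a scheme with 500 top concepts, both functions return the same links, but the naive version makes 500 round trips and the batched one makes a single round trip.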

Is there any way we can use some of the info in the profiles to explicitly ignore relationships we don't want included during prez link generation? This is similar to the custom queries we now use for the progressive loading of large vocabs on the VocPrez endpoints: there, we needed to ensure we loaded only the proximate information needed to render the UI, not a description of the entire vocab (as a SPARQL DESCRIBE query would).

I'm raising this because in the BGS case, they have very large vocabularies with hundreds or thousands of concepts, and rendering them on the /object endpoint is extremely slow. For example, https://data-uat.bgs.ac.uk/object?uri=http://data.bgs.ac.uk/ref/Geochronology. If we can somehow instruct Prez not to include certain relationships, such as skos:hasTopConcept/skos:topConceptOf, then I think this vocab will load very quickly on the /object endpoint.
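One way to express this is a query that drops the listed predicates up front, so their objects never reach link generation. The sketch below only builds a SPARQL 1.1 CONSTRUCT string; the exclusion list and `build_filtered_construct` are hypothetical (Prez profiles do not currently define such a setting), while the SKOS IRIs are the standard ones. It covers outgoing triples of the focus node only.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical per-profile exclusion list (not an existing Prez setting).
EXCLUDED_PREDICATES = [SKOS + "hasTopConcept", SKOS + "topConceptOf"]


def build_filtered_construct(focus_iri: str, excluded: list[str]) -> str:
    """Return a CONSTRUCT query for the focus node's outgoing triples,
    minus any triple whose predicate is in `excluded`."""
    query = (
        f"CONSTRUCT {{ <{focus_iri}> ?p ?o }}\n"
        f"WHERE {{\n  <{focus_iri}> ?p ?o .\n"
    )
    if excluded:
        # SPARQL 1.1 NOT IN drops the unwanted relationships server-side.
        not_in = ", ".join(f"<{p}>" for p in excluded)
        query += f"  FILTER (?p NOT IN ({not_in}))\n"
    return query + "}"


print(build_filtered_construct("http://data.bgs.ac.uk/ref/Geochronology", EXCLUDED_PREDICATES))
```

Filtering in the query, rather than after retrieval, means the triplestore never returns the thousands of top-concept triples in the first place.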

edmondchuc commented 1 year ago

hjohns commented 12 months ago

We expect caching, once introduced, may help, but the initial request for a large vocab will still be slow. Conditional prez-link generation may be needed, particularly where there is a large number of top concepts. Further investigation required.

hjohns commented 11 months ago

Solved in the current set of changes by using a simple cache; however, the initial request load time is not improved.

Other options to discuss:

- Can we perform conditional link generation? (Co-design session needed)
- Warming requests on startup

Meeting to be scheduled for next Monday.

edmondchuc commented 11 months ago

My current understanding:

- /object requests are slow because, regardless of the class type or the profile used, the endpoint always describes the entire object and processes the IRIs found into prez links.
- Slow generic object query via the normal system paths (/s, /c)
  - This is slow because the generic query gets the entire description of the object and processes the IRIs found into prez links.
  - This does not affect large vocabularies because we are currently using a custom query that intentionally avoids retrieving relationships such as skos:hasTopConcept when rendering a vocab.
  - This kind of "filter out certain predicates" mechanism needs to be supported within the profiles, so that certain predicates can be filtered out for container objects in a general way. This applies to things like dcat:Dataset with many relationships (dcterms:hasPart to dcat:Resource objects, for example).
  - A custom query is used here for vocabs to perform incremental loading of large vocabs. We plan to generalise this query and integrate it back into the generic query used for the /s and /c systems.
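As a sketch of a general, class-driven version of that filter: a mapping from container class to predicates whose objects should not be turned into prez links, applied to the object's triples before link generation. The mapping and function names are hypothetical (in Prez this would presumably be declared in the profile), and triples are plain string tuples to keep the example self-contained.

```python
from __future__ import annotations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCAT = "http://www.w3.org/ns/dcat#"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical per-class exclusions, e.g. declared by a profile.
EXCLUDE_BY_CLASS: dict[str, set[str]] = {
    SKOS + "ConceptScheme": {SKOS + "hasTopConcept"},
    DCAT + "Dataset": {DCTERMS + "hasPart"},
}

Triple = tuple[str, str, str]


def iris_needing_links(focus: str, triples: list[Triple]) -> set[str]:
    """Collect object IRIs of the focus node, skipping excluded predicates."""
    classes = {o for s, p, o in triples if s == focus and p == RDF_TYPE}
    excluded: set[str] = set()
    for cls in classes:
        excluded |= EXCLUDE_BY_CLASS.get(cls, set())
    return {
        o
        for s, p, o in triples
        # Crude IRI check, good enough for this string-based sketch.
        if s == focus and p not in excluded and o.startswith("http")
    }
```

For a concept scheme with 100 `skos:hasTopConcept` triples, only the class IRI and other non-excluded IRIs (such as a creator) would be sent on for link generation.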

@recalcitrantsupplant please check this out and let's have a discussion on this on Monday in our design session, thanks.

recalcitrantsupplant commented 11 months ago

I ran the link generation through a debugger last night, and the RDFLib query linked below is slow: it takes seconds. I'm not sure whether its performance regressed after some changes I made to it, but regardless, it shouldn't take that long. I switched it out for PyOxigraph and it now runs in milliseconds. I think this will resolve most of the issues.

https://github.com/RDFLib/prez/blob/c905c6e965a48912606779d41393a94794c693ae/prez/services/link_generation.py#L47

edmondchuc commented 11 months ago

In general, we think we need a way to filter out relationships that are not vital to rendering an object. For example, concept schemes don't need their skos:hasTopConcept values included; if they are included, Prez performs the expensive processing to generate their prez links.
This may be solved once the new RDFrame backend is implemented. Once this is done, we should be able to tailor each object type's rendering based on the profile and the custom SPARQL query in the endpoint definition. @recalcitrantsupplant will verify this by testing the /object endpoint with this resource: https://bgs.dev.kurrawong.ai/v/vocab/rf:Lexicon.

recalcitrantsupplant commented 9 months ago

Closing as completed

RDFLib / prez

Prez link generation is slow #178