RDFLib / prez

Prez is a data-configurable Linked Data API framework that delivers profiles of Knowledge Graph data according to the Content Negotiation by Profile standard.
BSD 3-Clause "New" or "Revised" License

Prez link generation is slow #178

Closed edmondchuc closed 4 months ago

edmondchuc commented 7 months ago

Currently, the /object endpoint takes an IRI and loads an object. It's slow because link generation follows an N+1 query pattern: after loading the resource, it issues a separate database query for each IRI it finds within the resource in order to generate the prez links. So for a concept scheme with hundreds of top concepts, it will query the database once per concept and attempt to generate a prez link. As you have mentioned in the past, this is the naive solution: slow but correct.
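To make the cost concrete, here is a minimal sketch of the pattern described above. It is not Prez code: `CountingStore`, `link_for` and the link shape are hypothetical stand-ins, and the "store" is just a dictionary that counts round trips. It contrasts one lookup per IRI with a single batched lookup (for example, a SPARQL query with a VALUES clause).

```python
from typing import Dict, Iterable, List


class CountingStore:
    """Hypothetical stand-in for a triplestore that counts round trips."""

    def __init__(self, classes: Dict[str, str]) -> None:
        self._classes = classes
        self.queries = 0

    def class_of(self, iri: str) -> str:
        # One round trip per IRI.
        self.queries += 1
        return self._classes[iri]

    def classes_of(self, iris: Iterable[str]) -> Dict[str, str]:
        # One round trip for the whole batch.
        self.queries += 1
        return {iri: self._classes[iri] for iri in iris}


def link_for(iri: str, cls: str) -> str:
    # Hypothetical link shape, for illustration only.
    return f"/{cls}/{iri.rsplit('/', 1)[-1]}"


def links_one_by_one(store: CountingStore, iris: List[str]) -> List[str]:
    # The slow pattern: N IRIs cost N queries.
    return [link_for(iri, store.class_of(iri)) for iri in iris]


def links_batched(store: CountingStore, iris: List[str]) -> List[str]:
    # The batched pattern: N IRIs cost one query.
    classes = store.classes_of(iris)
    return [link_for(iri, classes[iri]) for iri in iris]
```

Both functions return the same links; only the number of round trips differs, which is where the time goes when each round trip is a real database query.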

Is there any way we can use some of the info in the profiles to explicitly ignore relationships we don't want included during prez link generation? This kind of handling is similar to the custom queries we have now for the VocPrez endpoints, which handle progressive loading of large vocabs. We needed those to ensure we were only loading the proximate information needed to render the UI, and not a description of the entire vocab (like a SPARQL DESCRIBE query would return).

I'm raising this because in the BGS case, they have very large vocabularies with hundreds or thousands of concepts, and a render on the /object endpoint is extremely slow. For example, https://data-uat.bgs.ac.uk/object?uri=http://data.bgs.ac.uk/ref/Geochronology. If we can somehow instruct Prez not to include certain properties on these routes, such as skos:hasTopConcept/skos:topConceptOf, then the loading of this vocab on the /object endpoint will be very fast, I think.
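A rough sketch of what that exclusion could look like, assuming the profile yields a set of predicates to skip. Everything here is illustrative: triples are plain string tuples, "is an IRI" is approximated by an `http` prefix check, and `EXCLUDED_PREDICATES` and `iris_needing_links` are hypothetical names, not part of Prez.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumption: a profile could declare predicates to skip during link generation.
EXCLUDED_PREDICATES: Set[str] = {SKOS + "hasTopConcept", SKOS + "topConceptOf"}


def iris_needing_links(triples: Iterable[Triple], excluded: Set[str]) -> List[str]:
    """Return distinct object IRIs to generate links for, skipping excluded predicates."""
    seen: Set[str] = set()
    result: List[str] = []
    for _subject, predicate, obj in triples:
        if predicate in excluded:
            continue  # e.g. do not follow hundreds of top concepts
        # Simplification: treat any value starting with "http" as an IRI.
        if obj.startswith("http") and obj not in seen:
            seen.add(obj)
            result.append(obj)
    return result
```

With a concept scheme that has hundreds of skos:hasTopConcept triples, only the remaining object IRIs (creator, publisher and so on) would go on to link generation.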

edmondchuc commented 7 months ago

Related issue: https://github.com/RDFLib/prez/issues/166

hjohns commented 7 months ago

I expect caching will help once it is introduced, but for large vocabs the initial request will still be slow. Conditional prez-link generation may be needed. The issue potentially arises wherever there is a large number of top concepts. Further investigation is required.

hjohns commented 7 months ago

Solved in the current set of changes by using a simple cache; however, the initial request load time is not improved.
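As an illustration of why a simple cache leaves the first request unchanged, here is a minimal sketch. `LinkCache` and its `generate` callable are hypothetical and not the cache used in Prez; the point is only that every IRI is a miss the first time it is seen.

```python
from typing import Callable, Dict


class LinkCache:
    """Hypothetical in-memory cache in front of an expensive link generator."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._links: Dict[str, str] = {}
        self.misses = 0

    def get(self, iri: str) -> str:
        if iri not in self._links:
            # Cache miss: pay the full generation cost (e.g. a database query).
            self.misses += 1
            self._links[iri] = self._generate(iri)
        return self._links[iri]
```

Rendering a vocab with N uncached IRIs still costs N misses on the first request; only repeat requests are served from memory.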

Other options to discuss:

Meeting to be scheduled for next Monday.

edmondchuc commented 7 months ago

My current understanding:

@recalcitrantsupplant please check this out and let's have a discussion on this on Monday in our design session, thanks.

recalcitrantsupplant commented 7 months ago

I ran the link generation through a debugger last night and the RDFLib query linked below is slow: it takes seconds. I'm not sure whether the performance regressed after some changes I made to it, but regardless it shouldn't take that long. I switched it out for PyOxigraph and it now runs in milliseconds. I think this will resolve most of the issues.

https://github.com/RDFLib/prez/blob/c905c6e965a48912606779d41393a94794c693ae/prez/services/link_generation.py#L47
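For anyone wanting to check a seconds-versus-milliseconds difference like this outside a debugger, a small timing helper is enough. This is a generic sketch using only the standard library; `time_call` is a hypothetical helper, and the callables you pass in (one wrapping each query implementation) are up to you.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the best of several runs reduces noise from one-off effects such as imports or cold caches.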

edmondchuc commented 7 months ago

recalcitrantsupplant commented 4 months ago

Closing as completed