Swirrl / ook

Structural search engine
https://search-prototype.gss-data.org.uk/
Eclipse Public License 1.0

Large codelists load too slowly #77

Robsteranium closed 3 years ago

Robsteranium commented 3 years ago

Paraphrasing Kira...

The incremental loading of codelists does help but there are still individual codelists that are too big to load all at once.

We can reduce the load by fetching each individual branch on demand.

This adds complexity in situations where the expansion/selection state does not come from user events (e.g. from a permalink or a search).

There’s a decent chunk of work to do to make a real drilldown UI.

Robsteranium commented 3 years ago

Perhaps having a ready-made tree would accelerate retrieval (see #49). This isn't an either/or choice: we can have two indices for codes, one where each doc is a codelist (tree) and another where each doc is a code (as now).

kiramclean commented 3 years ago

This could be useful for sure, assuming retrieving a given tree is fast. So far I've started breaking down the queries to fetch trees into more granular steps, to retrieve each level on demand rather than fetching the whole tree at once the first time the codelist is expanded. This keeps each query fast and should help reduce the amount of stuff we keep in memory, too, but if there's a simple way to just fetch a whole tree at once that could be useful, too.
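For illustration, here is a minimal sketch of the level-on-demand idea described above, written in Python rather than the project's own code. The `LazyTree` class, the `fake_fetch` function and the sample URIs are all hypothetical stand-ins; the real queries and data shapes are not shown in this thread.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the index: parent URI -> child URIs.
FAKE_INDEX: Dict[str, List[str]] = {
    "UK": ["UKC", "UKD"],
    "UKC": ["UKC1", "UKC2"],
    "UKD": [],
}


def fake_fetch(uri: str) -> List[str]:
    """Pretend database call returning the direct children of one code."""
    return list(FAKE_INDEX.get(uri, []))


class LazyTree:
    """Fetches one level of children only when a node is first expanded."""

    def __init__(self, fetch_children: Callable[[str], List[str]]):
        self._fetch_children = fetch_children
        self._children: Dict[str, List[str]] = {}  # cache of expanded nodes
        self.fetch_count = 0  # number of backend calls made so far

    def expand(self, uri: str) -> List[str]:
        # Only hit the backend the first time a node is expanded.
        if uri not in self._children:
            self.fetch_count += 1
            self._children[uri] = self._fetch_children(uri)
        return self._children[uri]

    def expand_path(self, path: List[str]) -> None:
        """Expand each ancestor in order, e.g. to restore state that did not
        come from user clicks (a permalink or a search result)."""
        for uri in path:
            self.expand(uri)
```

Each expansion costs one small query and repeat expansions are served from the cache. The `expand_path` helper hints at the extra work needed when the expansion state comes from a permalink or search: every ancestor has to be fetched before the target code can be shown.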

Robsteranium commented 3 years ago

I've revised the build-concept-tree method to load all of the scheme's codes into memory up front. This is passed as a lookup to build-sub-tree, obviating a database call per parent.

The approach looks to be at least an order of magnitude quicker for the big trees (e.g. NUTS 2016).

I noticed along the way that the NUTS 2016 scheme was failing the spec as some children were missing from the lookup. This can happen because the index stores the union of (skos:narrower) across all schemes. A given child code might not be in the current scheme (because it was retired, for example). I've ignored any child-uris that aren't in the current scheme.
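As a rough illustration of the lookup-based approach (not the actual patch), the Python sketch below builds a tree from codes already held in memory and skips any child URI that is absent from the current scheme. The function names mirror build-concept-tree and build-sub-tree, but the signatures, the `uri`/`narrower` keys and the sample data are assumptions made for this example. It also assumes the narrower relations contain no cycles.

```python
from typing import Dict, List


def build_sub_tree(uri: str, lookup: Dict[str, dict]) -> dict:
    """Build the subtree rooted at `uri` using only the in-memory lookup."""
    code = lookup[uri]
    children = [
        build_sub_tree(child_uri, lookup)
        for child_uri in code["narrower"]
        # Children from other schemes (e.g. retired codes) are not in the
        # lookup, so they are ignored rather than treated as errors.
        if child_uri in lookup
    ]
    return {"uri": uri, "children": children}


def build_concept_tree(top_uris: List[str], codes: List[dict]) -> List[dict]:
    """Index all of the scheme's codes once, then assemble the tree from it."""
    lookup = {code["uri"]: code for code in codes}
    return [build_sub_tree(uri, lookup) for uri in top_uris if uri in lookup]


# Hypothetical scheme: "RETIRED" is listed as narrower than "A" in the index
# but is not one of the current scheme's codes.
SAMPLE_CODES = [
    {"uri": "A", "narrower": ["B", "RETIRED"]},
    {"uri": "B", "narrower": []},
]
SAMPLE_TREE = build_concept_tree(["A"], SAMPLE_CODES)
```

Here `SAMPLE_TREE` contains `A` with the single child `B`; the out-of-scheme child is dropped. The whole tree is assembled after one bulk load instead of one query per parent.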

This patch also appears to have resolved #104. This may be because the response is quicker or because missing children are removed.

Robsteranium commented 3 years ago

Resolved with #107.