Swirrl / ook

Structural search engine
https://search-prototype.gss-data.org.uk/
Eclipse Public License 1.0

Unused codes shouldn't be selectable #45

Closed Robsteranium closed 3 years ago

Robsteranium commented 3 years ago

Two thirds of the codes don't have any observations. To avoid leading users into dead-ends we should prevent them from selecting codes with no data. These codes still serve to help users navigate so we should keep them in the index and show them in the hierarchy.

The UI should be changed so that unused codes either don't have select boxes, or have disabled select boxes.

To extract the data we can use an EXISTS clause as per this query in the ook data validation report.

We'll need to coin a new property in the ook vocab (for which we ought to serialise an ontology, I suppose) and the code index mapping will need a boolean field adding.

Robsteranium commented 3 years ago

I've added a "used" field to the code index in c1f8d19. Sadly this is a string, "true" or "false", rather than a boolean: I can't seem to convince the json-ld library to honour the JSON-LD 1.0 spec for type conversion. We can just parse this into a boolean for now.
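
As a rough illustration of the "parse this into a boolean" workaround, here is a minimal Python sketch. The function name `parse_used` is hypothetical and not taken from the ook codebase; it only assumes that the index returns the strings "true" or "false" as described above.

```python
def parse_used(value):
    """Coerce the index's "used" value to a bool (hypothetical helper).

    Real booleans pass through unchanged, so the helper keeps working
    if the JSON-LD type conversion is fixed later.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    # Fail loudly rather than silently treating unknown values as unused.
    raise ValueError(f"unexpected value for 'used': {value!r}")
```

Raising on anything unexpected is a design choice for this sketch: it avoids the trap where a non-empty string such as "false" is treated as truthy.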

kiramclean commented 3 years ago

This is implemented in the mega branch for code selection (#63), but I'm leaving it open for now since the data is suspicious. Currently it's marking all codes as used, which may be true, but worth checking out before closing this I think.

Robsteranium commented 3 years ago

Indeed the query is catching false-positive uses, sorry. The pattern [] ?p ?code could include stuff like ?parent skos:broader ?code. No wonder it terminated more quickly!

I've fixed it in cc5ba47 and then commented it out in 796f7a5 (setting used to true for now) as it runs too slowly.
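
To illustrate the false-positive problem, here is a small Python sketch over an in-memory list of triples. The data and the function names are made up for illustration, and the stricter check assumes that "used" means "referenced by a qb:Observation"; the actual fix in cc5ba47 is not shown in this thread and may differ.

```python
RDF_TYPE = "rdf:type"
QB_OBSERVATION = "qb:Observation"
SKOS_BROADER = "skos:broader"

# Hypothetical sample data: one observation plus a small code hierarchy.
triples = [
    ("obs1", RDF_TYPE, QB_OBSERVATION),
    ("obs1", "dim:area", "code:england"),
    ("code:london", SKOS_BROADER, "code:england"),
    ("code:camden", SKOS_BROADER, "code:london"),
]

def used_loose(code, triples):
    # Equivalent of `[] ?p ?code`: any triple with the code as its object.
    return any(o == code for _, _, o in triples)

def used_strict(code, triples):
    # Only count triples whose subject is typed as an observation.
    observations = {
        s for s, p, o in triples if p == RDF_TYPE and o == QB_OBSERVATION
    }
    return any(s in observations and o == code for s, _, o in triples)
```

With this data, `used_loose("code:london", triples)` is true only because of the skos:broader link from code:camden, whereas `used_strict("code:london", triples)` is false since no observation refers to it.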

Will return to this later...

Robsteranium commented 3 years ago

Having a separate codes-used-pipeline that upserts just this field with resource batches of 100 appears to work within timeouts.
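
The batching idea can be sketched in Python as follows. This is not the codes-used-pipeline itself: `batches` and `used_updates` are hypothetical names, and the shape of the partial-update documents is an assumption about what an upsert of a single field might look like.

```python
from itertools import islice

def batches(items, size=100):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def used_updates(batch, used_codes):
    """Build partial-update documents that touch only the "used" field.

    `used_codes` is assumed to be the set of code URIs the query found
    to be used for this batch.
    """
    return [{"id": uri, "doc": {"used": uri in used_codes}} for uri in batch]
```

Each batch would then be queried and upserted on its own, so that no single request has to cover the whole code index and run into the timeout.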