alphagov / datagovuk_find

Beta version of Find Data

Format filter should only include valid formats for datafiles of datasets that are included in the search results #1336

Closed deborahchua closed 1 week ago

deborahchua commented 1 month ago

When a user has entered a search term and/or selected filters, the format filter should list only the formats of datafiles that belong to datasets in the search results.

We can get this information by adding faceting to the Solr query: specify the res_format field as a facet and return only values with a count of at least 1 (facet.mincount=1).
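As a rough sketch of that approach: the helpers below add the facet parameters to a search query and read the counts back out of the response. The function names and the sample response are made up for illustration and are not taken from the datagovuk_find code; the parsing assumes Solr's default JSON layout, where facet_fields is a flat list that alternates value and count.

```python
def build_format_facet_params(query="*:*", filter_queries=()):
    """Return Solr parameters that request res_format facet counts.

    Hypothetical helper: the real search code would merge these into
    whatever parameters it already sends (rows, sort, and so on).
    """
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", "res_format"),
        ("facet.sort", "count"),
        ("facet.mincount", 1),  # skip formats with no matching datasets
    ]
    # Each selected filter becomes its own fq, so facet counts reflect it.
    params.extend(("fq", fq) for fq in filter_queries)
    return params


def parse_facet_counts(response, field="res_format"):
    """Convert the flat [value, count, value, count, ...] list to a dict."""
    facet_fields = response.get("facet_counts", {}).get("facet_fields", {})
    flat = facet_fields.get(field, [])
    return dict(zip(flat[0::2], flat[1::2]))


# Illustrative response only; the counts are invented.
sample_response = {
    "facet_counts": {
        "facet_fields": {"res_format": ["CSV", 120, "HTML", 45, "csv ", 3]}
    }
}
format_counts = parse_facet_counts(sample_response)
```

Because the facet is computed against the same q and fq values as the search itself, the returned formats are limited to those present in the current result set.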

The facet data returned includes a number of formats that aren't valid or display friendly (users can enter formats manually, or they arrive as part of a data import). We need to clean up this data so that it maps to valid formats, and enforce some validation so that users can't add custom formats in future.


This query lists all available formats and a count for each in the Solr UI: /solr/#/ckan/query?q=*:*&q.op=OR&indent=true&facet=true&facet.field=res_format&facet.sort=count&facet.mincount=1&fl=res_format&rows=1

See WIP - https://github.com/alphagov/datagovuk_find/commit/1f9ee3c1e7db347c3bcf5f4ed1cb831016b7f9aa on branch topic-format-filters

For more information on faceting see https://solr.apache.org/guide/8_11/faceting.html

deborahchua commented 1 month ago

We agreed that the best workaround to start with is to map any format that isn't in the default list to "other" - we can always review the list and do a cleanup of the formats later.
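A minimal sketch of that workaround, for discussion. The DEFAULT_FORMATS values, the "Other" label and both function names are placeholders rather than the agreed list or existing code, and the trim-and-uppercase step is an extra assumption about how raw values should be matched.

```python
from collections import Counter

# Placeholder list for illustration; swap in the agreed default formats.
DEFAULT_FORMATS = ("CSV", "HTML", "JSON", "PDF", "XML", "ZIP")
OTHER = "Other"  # assumed display label for everything else


def normalise_format(raw):
    """Return the matching default format, or OTHER if there is none."""
    cleaned = (raw or "").strip().upper()
    return cleaned if cleaned in DEFAULT_FORMATS else OTHER


def group_format_counts(facet_counts):
    """Merge raw res_format facet counts into the display formats."""
    grouped = Counter()
    for raw, count in facet_counts.items():
        grouped[normalise_format(raw)] += count
    return dict(grouped)


# Invented counts, shaped like parsed res_format facet data.
raw_counts = {"CSV": 10, "csv ": 2, ".xlsx": 3, "": 1}
display_counts = group_format_counts(raw_counts)
```

Note that the summed counts are approximate: a dataset with several datafiles whose formats all map to the same display value would be counted more than once, so the totals suit deciding which options to show rather than reporting exact dataset numbers.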