alphagov / datagovuk_find

Beta version of Find Data

[EPIC] Add Solr gem to DGU Find #1293

Open kentsanggds opened 2 months ago

kentsanggds commented 2 months ago

Add the Solr gem to Find as a feature, so that we can switch between OpenSearch and Solr.
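
One way the switch between OpenSearch and Solr might be wired up is sketched below. This is only an illustration: the `SEARCH_BACKEND` environment variable and both backend class names are hypothetical and are not taken from the Find codebase.

```ruby
# Hypothetical sketch: pick the search backend from an environment variable.
# SEARCH_BACKEND, OpensearchBackend and SolrBackend are illustrative names only.

class OpensearchBackend
  def name
    "opensearch"
  end
end

class SolrBackend
  def name
    "solr"
  end
end

BACKENDS = {
  "opensearch" => OpensearchBackend,
  "solr" => SolrBackend,
}.freeze

# Returns a backend instance. Falls back to OpenSearch when the variable is
# unset, and raises on an unknown value so that a typo is caught early.
def search_backend(env = ENV)
  key = env.fetch("SEARCH_BACKEND", "opensearch").downcase
  backend_class = BACKENDS.fetch(key) do
    raise ArgumentError, "Unknown search backend: #{key}"
  end
  backend_class.new
end
```

Keeping the choice behind a single lookup like this would let the rest of the application stay unaware of which backend is active.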

### Tasks
- [ ] Investigate indexing Publisher, Format and Topics in ckanext-datagovuk to be run as a cronjob
### Search options
- [x] User input - to check sanitization etc
- [x] Publisher
- [x] Topic
- [x] Format
- [x] OGL
- [x] Results ranking - best match
- [x] Results ranking - most recent
- [ ] Add tests
### Search results page
- [x] Title
- [x] Organisation
- [x] Last updated
- [x] Summary
- [x] Filter by publisher (currently returns "organizations" slugs taken from the CKAN API /organization_list endpoint; to look into Solr facets and aggregation for distinct values)
- [x] Filter by topic (currently hardcoded as this seems definitive)
- [x] Filter by format (currently hardcoded as this seems definitive)
- [x] Filter by OGL
- [x] Clicking on a result link goes to dataset page (to point to /solr)
- [x] Availability (shows "not released" if dataset does not have datafiles)
- [ ] Search results pagination
- [x] Sort results by best match (default sort option)
- [x] Sort results by most recent - sort using public_updated_at and order desc
- [ ] Update Publisher to use solr index
- [ ] Update Topic to use solr index
- [ ] Update Format to use solr index
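
The sorting, pagination and facet items above map onto standard Solr request parameters (`sort`, `start`, `rows`, `facet`, `facet.field`). A minimal sketch of building them follows. `public_updated_at` comes from the checklist; the helper names and the `organization` facet field in the usage note are assumptions for illustration only. Escaping of user input is not handled here.

```ruby
require "uri"

# Sort options from the checklist: best match (Solr's default relevance order)
# and most recent (public_updated_at, descending).
SORTS = {
  "best" => "score desc",
  "recent" => "public_updated_at desc",
}.freeze

# Builds a hash of Solr request parameters.
# NOTE: the query text is passed through as-is; sanitising or escaping
# user input is out of scope for this sketch.
def solr_params(query:, sort: "best", page: 1, per_page: 20, facet_fields: [])
  page = [page.to_i, 1].max
  text = query.to_s.strip
  params = {
    "q" => text.empty? ? "*:*" : text,        # match everything when no query is given
    "sort" => SORTS.fetch(sort, SORTS["best"]),
    "start" => (page - 1) * per_page,          # zero-based offset of the first result
    "rows" => per_page,
  }
  unless facet_fields.empty?
    params["facet"] = "true"
    params["facet.field"] = facet_fields       # array values become repeated parameters
  end
  params
end

# Encodes the parameters as a URL query string.
def solr_query_string(params)
  URI.encode_www_form(params)
end
```

For example, `solr_params(query: "flood", sort: "recent", page: 2, facet_fields: ["organization"])` would request the second page of results ordered by `public_updated_at` and ask Solr for the distinct values of the (hypothetical) `organization` field, which is one way the publisher filter could stop depending on the CKAN /organization_list endpoint.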
### Individual dataset page
- [x] Breadcrumb - to add /search and look into @referer_query and @referer
- [x] Header Metadata
- [x] Metadata box - availability, license_id (unpublished, __other__, ogl and other-at should display the text "None"); topic (currently a slug) still left to do
- [x] Link to show more datasets from publisher
- [x] Search box
- [x] Get data from "location" field and how it is used in the view(?) - to confirm it's not being used
- [x] List data links (non time series files) if present
- [ ] List data links (timeseries data files?) if present - groups files by year, haven't found example of this yet. Looks like it was used for legacy datasets so we won't implement this for now (to add to a decision log)
- [x] Show supporting docs if available
- [x] Show additional information if present
- [ ] Show contact details if publisher has contact email - inconsistent data for contact-email and foi-email, manually passed to organisation model?
- [x] Show licence information (review raw custom licence / other )
- [x] Show option to log in to edit the CKAN dataset if it has not been harvested
- [ ] Return a 404 if the dataset is empty or does not exist
- [ ] Option to get dataset by legacy name?
- [ ] Show related datasets if present - part of search functionality, see https://solr.apache.org/guide/8_11/morelikethis.html
- [ ] Add tests
- [ ] Datafile (datafile["created"]) is not always available
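
The metadata box rules in the list above could be expressed as small helpers along these lines. This is a sketch of one reading of the checklist, not the app's actual code: the helper names are hypothetical, and treating a missing license_id as "None" is an assumption.

```ruby
# license_id values that, per the checklist, should display the text "None".
NONE_LICENCE_IDS = %w[unpublished __other__ ogl other-at].freeze

# Returns the text to show for a licence.
# ASSUMPTION: a nil or empty license_id is also shown as "None".
def licence_text(license_id)
  id = license_id.to_s
  return "None" if id.empty? || NONE_LICENCE_IDS.include?(id)

  id
end

# Returns "Not released" when the dataset has no datafiles, as described for
# the availability field; returns nil otherwise, meaning there is no label to show.
def availability_text(datafiles)
  return "Not released" if datafiles.nil? || datafiles.empty?

  nil
end
```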
### Pull requests
- [ ] https://github.com/alphagov/datagovuk_find/pull/1313
- [ ] https://github.com/alphagov/govuk-dgu-charts/pull/309
- [ ] https://github.com/alphagov/datagovuk_find/pull/1314
- [ ] https://github.com/alphagov/datagovuk_find/pull/1315
- [ ] https://github.com/alphagov/datagovuk_find/tree/add-pagination-controls
### GH Issues
- [ ] https://github.com/alphagov/datagovuk_find/issues/1318
- [ ] https://github.com/alphagov/datagovuk_find/issues/1316
- [ ] https://github.com/alphagov/datagovuk_find/issues/1317

kentsanggds commented 3 weeks ago

We had a chat about how to progress and review the work in smaller chunks.

The proposal is to create another branch, main-solr, into which the other Solr update branches will be merged. This will ensure that smaller PRs can be reviewed and merged into it without affecting any deployments to production.

The integration cluster can be updated to point to commits on this branch specifically when there is a need to deploy things for testing on EKS.

As there is generally very little development on the Find application, I don't foresee any issues around the branch drifting from the main branch.