bcgov / ckan-ui

CKAN UI - VueJS
GNU Affero General Public License v3.0

SOLR not Re-indexing when Org Name changes (DDS-1069) #505

Open annikaLiving opened 3 years ago

annikaLiving commented 3 years ago

Solr needs to be configured so that when an organization's name changes, the datasets in that organization are re-indexed.

An org name was changed. Here is where the change does and doesn't show up:

  1. Old name on the list of datasets when searching
  2. New name on the dataset/package change
  3. Old name: results show zero records associated with it
  4. New name: results show zero records associated with it
  5. New name shows up in the Org page list

The change was made over 24 hours ago and this is still the case.

Changes were made, but will be overwritten with the new data import.
Old: data.gov.bc.ca/organization/labour-market-insights-evaluation-and-outreach
New: Labour Market Analytics, Forecasting & Information, data.gov.bc.ca/organization/labour-market-analytics-forecasting-and-information
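The behaviour requested in this issue, re-indexing every dataset in an organization when that organization is renamed, can be sketched as a small hook. This is a minimal, hypothetical sketch in plain Python: `Dataset`, `datasets_to_reindex`, `on_organization_renamed` and the `reindex` callback are illustrative names, not CKAN or ckan-ui APIs. A real implementation would presumably pass a callback that calls into CKAN's search indexing and would attach the hook to whatever event fires on an organization update; both of those are assumptions not confirmed in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Dataset:
    # Hypothetical minimal dataset record: only the fields this sketch needs.
    id: str
    owner_org: str  # id of the owning organization


def datasets_to_reindex(org_id: str, datasets: Iterable[Dataset]) -> List[str]:
    """Return the ids of every dataset owned by the renamed organization."""
    return [d.id for d in datasets if d.owner_org == org_id]


def on_organization_renamed(
    org_id: str,
    datasets: Iterable[Dataset],
    reindex: Callable[[str], None],
) -> List[str]:
    """Hypothetical hook: re-index each affected dataset so its search
    document picks up the new organization name. `reindex` is a placeholder
    for the real indexing call."""
    ids = datasets_to_reindex(org_id, datasets)
    for dataset_id in ids:
        reindex(dataset_id)
    return ids


if __name__ == "__main__":
    catalogue = [
        Dataset("a", "org-1"),
        Dataset("b", "org-2"),
        Dataset("c", "org-1"),
    ]
    reindexed: List[str] = []
    on_organization_renamed("org-1", catalogue, reindexed.append)
    print(reindexed)  # ['a', 'c']
```

Only datasets owned by the renamed organization are touched, so the cost of a rename scales with the size of that organization rather than the whole catalogue.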

joe-taylor commented 3 years ago

I believe that is what this core CKAN ticket is about: ckan/ckan#5842. There is a pull request marked WIP for version 2.8. Perhaps when that's complete we can backport it to 2.7; otherwise, we can assess how often organization names change and whether a different catalogue indexing strategy would suffice, e.g. a nightly full Solr reindex.
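A nightly full reindex fixes stale organization names because each search document holds a copy of the organization's title from the moment the dataset was indexed; rebuilding every document from the database overwrites those copies. The sketch below models this with plain dictionaries standing in for the database and the Solr index. All names (`build_doc`, `full_reindex`, the `organization_title` field) are hypothetical, and the document shape is a simplified assumption rather than CKAN's actual Solr schema.

```python
from typing import Dict

# Hypothetical source of truth: organization id -> current title.
Orgs = Dict[str, str]
# Hypothetical dataset table: dataset id -> owning organization id.
Datasets = Dict[str, str]
# Hypothetical search index: dataset id -> indexed document.
Index = Dict[str, Dict[str, str]]


def build_doc(dataset_id: str, datasets: Datasets, orgs: Orgs) -> Dict[str, str]:
    """Denormalize: copy the organization's *current* title into the doc."""
    org_id = datasets[dataset_id]
    return {"id": dataset_id, "owner_org": org_id, "organization_title": orgs[org_id]}


def full_reindex(datasets: Datasets, orgs: Orgs) -> Index:
    """Rebuild every document from the source of truth, discarding stale copies."""
    return {dataset_id: build_doc(dataset_id, datasets, orgs) for dataset_id in datasets}


if __name__ == "__main__":
    orgs = {"org-1": "Business Management Services"}
    datasets = {"ds-1": "org-1", "ds-2": "org-1"}
    index = full_reindex(datasets, orgs)

    # Renaming the org updates the database but not the existing index docs.
    orgs["org-1"] = "Transformation Services"
    print(index["ds-1"]["organization_title"])  # Business Management Services

    # The nightly full rebuild picks up the new title for every dataset.
    index = full_reindex(datasets, orgs)
    print(index["ds-1"]["organization_title"])  # Transformation Services
```

The trade-off is that every dataset is rewritten every night, and a rename can stay stale in search for up to a day, compared with re-indexing only the affected datasets at rename time.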

joe-taylor commented 3 years ago

Nightly reindexes may already be running in production; I recommend revisiting this after launch.

annikaLiving commented 3 years ago

Here is a current example of a package that did not change to the new org name while other packages did.

Business Management Services was renamed Transformation Services at least a week before this comment.

[screenshot] Historical DriveBC Events: https://catalogue.data.gov.bc.ca/dataset/cdf6ab31-fa03-479a-b6e0-f9a0c71edf91

joe-taylor commented 3 years ago

That particular record was last indexed by Solr around April 28th, whereas all other records in that organization were indexed on September 21st at 4:10 UTC, fractions of a second apart. Only 13 records were last indexed on September 21st, so this does not look like a full site reindex.
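The diagnosis above comes down to comparing each record's last-indexed time with the time of the rename: anything indexed before the rename still carries the old organization name. A minimal sketch of that check, assuming each indexed record exposes a last-indexed timestamp; the `indexed_at` key and the `stale_records` helper are hypothetical, and the timestamps in the example are illustrative, not the actual production values.

```python
from datetime import datetime, timezone
from typing import Dict, List


def stale_records(records: List[Dict], renamed_at: datetime) -> List[str]:
    """Return ids of records whose last index time predates the org rename.

    Such records still carry the old organization name in the search index.
    Each record is assumed to be a dict with hypothetical 'id' and
    'indexed_at' (timezone-aware datetime) keys.
    """
    return [r["id"] for r in records if r["indexed_at"] < renamed_at]


if __name__ == "__main__":
    utc = timezone.utc
    # Illustrative values only: one record indexed long before the rename,
    # one indexed just after it.
    records = [
        {"id": "dataset-a", "indexed_at": datetime(2021, 4, 28, tzinfo=utc)},
        {"id": "dataset-b", "indexed_at": datetime(2021, 9, 21, 4, 10, tzinfo=utc)},
    ]
    rename_time = datetime(2021, 9, 21, 4, 0, tzinfo=utc)
    print(stale_records(records, rename_time))  # ['dataset-a']
```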

It is possible that something in our legacy code is configured to automatically reindex datasets when an organization name changes. It's also possible a script triggered the reindex, or that the records were somehow reindexed by hand, such as by simultaneously clicking a save button somewhere.

annikaLiving commented 3 years ago

Sept 21 lines up with when the org name changed. I know of one action that forces a package to reindex: changing its state does so, but it then notifies all editors (not something we really want). I'm not sure why this one wasn't triggered when all the others were.

annikaLiving commented 3 years ago

On Oct 28, 2021 the following organization was renamed, and it now indicates no resources. I'm interested to see what triggers the number of resources to be listed: https://catalogue.data.gov.bc.ca/organization/2c6528e4-194e-4c8e-bca6-91d358943126

joe-taylor commented 3 years ago

The number is updated overnight as part of our nightly search reindex, which is old but has been reinstated. More could be done to improve the organization pages so that a reindex is not required on rebuild, but for now that should at least help.
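One way to see why the organization page can show no resources until the overnight reindex: if the count is derived from the search index (for example, a facet count over the organization name stored in each document), documents that still carry the old name are not counted under the new one. A minimal sketch with a hypothetical `organization` field and `dataset_counts` helper; whether ckan-ui computes the count exactly this way is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable

OLD = "labour-market-insights-evaluation-and-outreach"
NEW = "labour-market-analytics-forecasting-and-information"


def dataset_counts(index_docs: Iterable[Dict[str, str]]) -> Counter:
    """Count indexed datasets per organization name, like a search facet."""
    return Counter(doc["organization"] for doc in index_docs)


if __name__ == "__main__":
    # Index docs written before the rename still hold the old org name.
    docs = [
        {"id": "ds-1", "organization": OLD},
        {"id": "ds-2", "organization": OLD},
    ]
    # Counter returns 0 for a missing key, so the renamed org shows no datasets.
    print(dataset_counts(docs)[NEW])  # 0

    # After the nightly reindex rewrites each doc with the new name:
    for doc in docs:
        doc["organization"] = NEW
    print(dataset_counts(docs)[NEW])  # 2
```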