geobtaa / geoblacklight_admin

Database/Solr - Audit task to ensure these are sync'd appropriately #104

Open ewlarson opened 1 month ago

ewlarson commented 1 month ago

A long-standing issue with Kithe and GeoBlacklight Admin (and GEOMG before it) is that you can save an object to the database, but the background process that indexes that record into Solr can fail, usually because of locn_geometry parsing issues. When that happens, the record exists in the database but is missing from Solr.

We need to write a small task that audits the database entries against the Solr entries and produces a diff of the IDs that appear in one but not the other.

Solr's /export handler looks promising: https://solr.apache.org/guide/solr/latest/query-guide/exporting-result-sets.html

This will require a Solr schema config change to copy geomg_id_s into a docValues field, because /export only works with fields that have docValues enabled. Afterwards you can run: http://localhost:8983/solr/blacklight-core/export?q=*:*&sort=geomg_id_sdv+asc&fl=geomg_id_sdv

and results will be JSON like this:

{
  "responseHeader":{"status":0},
  "response":{
    "numFound":23568,
    "docs":[{
        "geomg_id_sdv":"000894F6-E513-4D7C-BF72-1CB52D29D5B1"}
      ,{
        "geomg_id_sdv":"00090357-0df0-4e33-9bc8-aa3ce425ef09"}
      ,{
        "geomg_id_sdv":"000bf346-0aa5-40f2-8bb4-291197264a5e"}
      ,{
        "geomg_id_sdv":"00168679-2f35-4e6d-94d6-9b63bbefe685"}
      ,{
        "geomg_id_sdv":"00203c7f-b08b-46bb-a650-7c6e7925a554"}
      ,{
        "geomg_id_sdv":"00343406b1164a4690f23c307c25d679_3"}
      ,{
        "geomg_id_sdv":"0035018d-63a8-4682-95e5-d1c3d4104a7d"}
      ,{
        "geomg_id_sdv":"003a2c591c554cf3a116a113aa3c134a_0"}
      ,{
        "geomg_id_sdv":"003aa8db-4594-44bc-90f6-ebbac01d40de"}
      ,{
        "geomg_id_sdv":"003e5438-86a7-4cef-8a94-af364e25fd97"}
      ,{
        "geomg_id_sdv":"004156e4-173b-45b8-b9c4-5106a7deffbb"}
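A minimal Ruby sketch of how a task might build that /export request and pull the IDs out of a response like the one above. The helper names `export_uri` and `ids_from_export` are hypothetical, not part of GeoBlacklight Admin, and the sketch assumes the docValues copy field is named geomg_id_sdv as in the URL above. It only builds the URL and parses a response body; the HTTP call itself is left out.

```ruby
require "json"
require "uri"

# Hypothetical helper: builds the /export URL for a Solr core.
# base_url must end with a trailing slash so URI.join appends "export"
# instead of replacing the core name.
def export_uri(base_url, field: "geomg_id_sdv")
  uri = URI.join(base_url, "export")
  uri.query = URI.encode_www_form(q: "*:*", sort: "#{field} asc", fl: field)
  uri
end

# Hypothetical helper: extracts the ID values from an /export JSON body.
# Docs without the field are skipped.
def ids_from_export(json_body, field: "geomg_id_sdv")
  docs = JSON.parse(json_body).dig("response", "docs") || []
  docs.map { |doc| doc[field] }.compact
end

uri = export_uri("http://localhost:8983/solr/blacklight-core/")
puts uri.path
```

The body passed to `ids_from_export` would come from whatever HTTP client the task uses; for a large index it may be worth streaming the response rather than loading it all at once.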

Similarly, we'll want to produce a sorted list of the matching IDs from PostgreSQL for comparison. The query below selects friendlier_id, which should correspond to geomg_id_s in Solr.

-- Sample only: the LIMIT would need to be dropped for a full audit.
SELECT friendlier_id
FROM kithe_models
WHERE kithe_models.type = 'Document'
ORDER BY friendlier_id ASC
LIMIT 1000;

Which returns...

000894F6-E513-4D7C-BF72-1CB52D29D5B1
00090357-0df0-4e33-9bc8-aa3ce425ef09
000a642584aa4b5c9485b6f17dc977a1_0
000b18d6-63b9-4314-9d4f-17c945ea09b7
000bf346-0aa5-40f2-8bb4-291197264a5e
0011D7A3-0EC0-4B1D-AF20-C055274B6DAE
00138f08-1327-4ae1-9b9e-9794140059eb
00140c37d72141eea4917e22817fe364_10
0015a391-7b23-4e4a-9b21-fff782c96e01
...
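A minimal Ruby sketch of the diff step, assuming both ID lists have already been loaded into arrays. The helper name `audit_ids` is hypothetical. It compares the lists as sets rather than walking two sorted lists in step, so the result does not depend on PostgreSQL and Solr agreeing on sort order (for example, how uppercase and lowercase IDs are collated). The sample data is the first few IDs from the two truncated listings in this issue, cut off at the same point.

```ruby
require "set"

# Hypothetical helper: reports IDs that exist on one side only.
def audit_ids(db_ids, solr_ids)
  db = Set.new(db_ids)
  solr = Set.new(solr_ids)
  {
    missing_from_solr: (db - solr).sort, # saved to the database, never indexed
    missing_from_db: (solr - db).sort    # indexed, but no database row
  }
end

# First few IDs from the PostgreSQL sample above.
db_ids = %w[
  000894F6-E513-4D7C-BF72-1CB52D29D5B1
  00090357-0df0-4e33-9bc8-aa3ce425ef09
  000a642584aa4b5c9485b6f17dc977a1_0
  000b18d6-63b9-4314-9d4f-17c945ea09b7
  000bf346-0aa5-40f2-8bb4-291197264a5e
]

# IDs from the Solr sample above, up to the same point.
solr_ids = %w[
  000894F6-E513-4D7C-BF72-1CB52D29D5B1
  00090357-0df0-4e33-9bc8-aa3ce425ef09
  000bf346-0aa5-40f2-8bb4-291197264a5e
]

report = audit_ids(db_ids, solr_ids)
puts "Missing from Solr:"
puts report[:missing_from_solr]
puts "Missing from database:"
puts report[:missing_from_db]
```

On these samples, 000a642584aa4b5c9485b6f17dc977a1_0 and 000b18d6-63b9-4314-9d4f-17c945ea09b7 are reported as missing from Solr. In the application this could be wrapped in a rake task, with the database side read from the Document model rather than hard-coded.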