Closed aazaff closed 4 years ago
The first thing we need to do is correct the characters in the public.collection_groups table.
-- To correct bad characters in collection_group 5
update
public.collection_groups
set
collection_group_name = 'Troy L. Péwé Environmental Geology Collection'
where
collection_group_id = 5;
Next, we need to fix any collections that contain these invalid characters. Outside of the metadata in the json_data column of metadata.azgs, we only reference collection_group via fk relationships to collection_group_id in public.collections. When someone requests the collection for download, the azgs.json file is generated on-the-fly from this metadata. So, I think that the json_data column is the only place we need to fix things. I have tested the following sql locally.
-- Also correct any bad metadata that has resulted from this error
update
metadata.azgs
set
json_data = jsonb_set(
json_data,
'{collection_group, name}',
'"Troy L. Péwé Environmental Geology Collection"',
false
)
where
json_data->'collection_group'->'name' = '"Troy L. PŽwŽ Environmental Geology Collection"'
Because this sql must be applied to all collections, I want more eyes to review it before I take it beyond my local machine. Also, if anyone is aware of anywhere else these characters might be, please let me know.
Makes me wonder if there should be some kind of automated mechanism to re-generate the json column in case if we make any batch changes like this?
I'm not sure what that would look like. The json column is the canonical source of most of what it contains. We could regenerate the files list. But there's nowhere else to look for the other stuff.
I really don't like mucking around with this json column en masse like this. Makes me tense.
I went ahead and ran those sql statements on dev.
Only three records were effected. I verified each of them before and after.
This is in ed4a01cd1ddcfc5069e058a3f001a6a8e032cb10
This update is running on prod (as of 8/19/2020).
The
collection_group_name
forcollection_group
5 reads asTroy L. PŽwŽ Environmental Geology Collection
. It should read as theTroy L. Péwé Environmental Geology Collection
. I am not sure what the best procedure will be to fix this in such way that it introduces no breaking changes to our metadata.