azgs / azlibrary_database

1 stars 1 forks source link

Remove defunct directory field form identifiers object in metadata #46

Closed aazaff closed 4 years ago

aazaff commented 4 years ago

Is this a good idea or should we rename some of these.

I think we should consolidate the "old" links - e.g., Library URL - into the identifiers object. What do you think @NoisyFlowers ?

aazaff commented 4 years ago

I am stupid... we actually already cleaned this up... it is just the actual json haven't been updated/cleaned in the database... which is my job to do. I will attempt to do this as soon as I get back.

aazaff commented 4 years ago
UPDATE metadata.azgs SET json_data = json_data - 'identifiers'

I made a copy of the metadata.azgs table and tested this. It worked fine. When I tried to apply it to the actual metadata.azgs table I got a billion WARNING: out of shared memory warnings, and a suggestion to increase max_locks_per_transaction.

I will investigate further.

aazaff commented 4 years ago

I actually we believe @snolan1 and I ran into a similar error for a previous postgres database int he past, and we learned that it was a common suggestion for postgres databases to go into the config file and increase max locks per transaction. Steve do you remember when we did this a couple years ago?

aazaff commented 4 years ago

We realized that we DO want to keep the 'identifiers' field, but we want to remove, in the long-run, the 'directory'

@NoisyFlowers also wants to investigate and verify that he is not still using that field when making new items.

NoisyFlowers commented 4 years ago

Using a copy of metadata.azgs on dev:

I created a GIN index.

create index tmp_azgs_idx on metadata.tmp_azgs using gin (json_data);

Then, I ran this update.

update
    metadata.tmp_azgs
set
    json_data = jsonb_set(
        json_data,
        '{identifiers}',
        (json_data->'identifiers') - 'directory',
        false
    )
where
    json_data->'identifiers' ? 'directory';

Note: Max locks was set to 1024, but I'm not sure if that's what it was when you tried it. I didn't change it.

The metadata looks good to me. Do you want to take a look before I try this on the actual azgs table on dev?

NoisyFlowers commented 4 years ago

Well, this is nuts.

When I ran this on a copy of metadata.azgs, I had no problem with shared memory. Now that I try to run in on metadata.azgs itself, it's blowing up with the error described.

I tried running it on a subset (metadata_id < 1000). No good.

I tried disabling all the indexes first. No good.

Based on my research, 1024 for max_locks and 100 for max_connections are pretty high values. I'm not sure increasing them will help and could lead to other side effects.

Any ideas how to proceed?

(BTW: This is all on dev.)

aazaff commented 4 years ago

That is wild... I don't know how to fix it. A few weird workarounds come to mind off the top of my head, but it might be best to keep researching and see if there is any known cause. Want to talk just about this on Monday?

NoisyFlowers commented 4 years ago

I also tried running it without the where clause. No good.

We can talk Monday afternoon, but I don't know if I'll have any more to go on. Should we bring Stephen into the discussion?

aazaff commented 4 years ago

Gary would be the best person, but I can't take him away from his project right now. Does it let you edit a single record at a time? If so, then you could try a for loop that goes one by one instead.

NoisyFlowers commented 4 years ago

I think I figured out what's happening. It's all the triggers on metadata.azgs for keeping the text search columns up to date. They each create a temporary table for loading the new json into prior to querying it. If I disable them before running the update and reenable them after, it sails through. I think this should be ok, since removing that directory field isn't relevant to any of the text searches we support. Here's the sql:

create index azgs_json_idx on metadata.azgs using gin (json_data);

alter table metadata.azgs disable trigger all;

update
    metadata.azgs
set
    json_data = jsonb_set(
        json_data,
        '{identifiers}',
        (json_data->'identifiers') - 'directory',
        false
    )
where
    json_data->'identifiers' ? 'directory';

alter table metadata.azgs enable trigger all;

I'll also add that it's probably a good idea to restart postgres before this, as there appears to be some cumulative shared memory that is then freed up. I'd probably do this after creating the index.

NoisyFlowers commented 4 years ago

This update has been made to prod. Actually yesterday (8/19/2020).