DROP INDEX may hang for a minute if idle worker attachments hold locks in their own metadata cache

ilya071294 commented 1 month ago

Steps to reproduce are kind of similar to #7783 but they show a completely different problem.

Set parameters in firebird.conf:

MaxParallelWorkers = 8
ParallelWorkers = 8

Execute the script in isql: index_drop_hang_fb_5_0.txt

Execution hangs for a minute at the last line drop index t1_idx1. This happens because worker attachments have been creating t1_idx2 index recently, and after loading T1 table metadata they still have P1 procedure in the metadata cache. The statement of the procedure holds SR lock on t1_idx1 index. A minute later the timer thread releases worker attachments, and drop index t1_idx1 is finally completed. A simple solution is to call MET_clear_cache before putting a worker attachment into m_idleAtts array. I've tried it and it works. On the one hand, clearing the cache can be useful because in a general case the worker attachment can perform totally different tasks. On the other hand, different tasks may use the same objects. Any opinions?

hvlad commented 1 month ago

Consider also: how it is differs from user attachment's metadata cache ? why worker atts should be special in this regards ?

PS bug, really ?

ilya071294 commented 1 month ago

PS bug, really ?

Not so far from a deadlock, only the timer helps. If a script is executed when there are no concurrent user attachments, I would expect that there won't be any suspicious hang-ups. It's not critical for this script but for a larger one it can be. A typical case - recreating a bunch of indexes. The worker attachment generally is not special but its role is still different. At least it basically helps to perform a task created by the user attachment. And I wouldn't consider it right that it can impede actions in user attachments this way.

hvlad commented 1 month ago

You found a special case and going to fix it with very inefficient, but simple, way. It will not work in general. It is not necessary in most use cases. There should be a way to clear MET cache on purpose, like in DSQL statements cache. It should be used when required, not in advance as in DSQL statements cache.

PS DSQL statements cache also better to be cleared when really required, not in advance as currently, IMHO.

aafemt commented 1 month ago

Is it possible that the problem is already solved in @AlexPeshkoff's shared metadata cache branch?

ilya071294 commented 1 month ago

You found a special case and going to fix it with very inefficient, but simple, way.

I wonder how you came to this conclusion. Both parts are false actually :) I mean it wasn't me who found it (the case is practical, not theoretical) and I'm not against finding a better solution. Otherwise I wouldn't ask.

There should be a way to clear MET cache on purpose, like in DSQL statements cache. It should be used when required, not in advance as in DSQL statements cache.

I agree. Then we need to find all places that require to clear the MET cache. Currently in FB5 I see at least 2 places: delete_index and delete_relation. MET_clear_cache is called there to clear the cache of a user attachment. Will it be OK to add here similar calls for worker attachments? I guess this is also not perfect because a worker attachment has a chance to lock the index again before our lock attempt (but it should be possible only if there are other user attachments executing parallel tasks).

AlexPeshkoff commented 1 month ago

On 10/18/24 11:10, Dimitry Sibiryakov wrote:

Is it possible that the problem is already solved in @AlexPeshkoff https://github.com/AlexPeshkoff's shared metadata cache branch?

Right now not ready to give reliable answer - I'm finishing (fingers crossed) with indices, currently code is broken. As far as I understand there will not be such problem with new code but no 100% guarantees.

hvlad commented 1 month ago

Then we need to find all places that require to clear the MET cache. Currently in FB5 I see at least 2 places: delete_index and delete_relation.

Yes

MET_clear_cache is called there to clear the cache of a user attachment. Will it be OK to add here similar calls for worker attachments?

The point was to use "lock and AST" way to call MET_clear_cache in all attachments. For start (not detailed and may be not fully correct):

Jrd::Statement() and TRA_post_resources() should take lock in SR mode (allow to put objects into cache),
DFW routines mentioned above should take lock in EX mode when necessary only (signal other attachments to clear the cache and prevent new objects in cache) and convert lock back into SR when finished
lock's AST should call MET_clear_cache() and release the lock

I guess this is also not perfect because a worker attachment has a chance to lock the index again before our lock attempt (but it should be possible only if there are other user attachments executing parallel tasks).

It should not be possible with a scheme above

PS Next time you want to ask, use fb-devel, pls - it is better suited for such kind of exchange of opinions.

FirebirdSQL / firebird

DROP INDEX may hang for a minute if idle worker attachments hold locks in their own metadata cache #8287