Closed diegosalvi closed 8 months ago
As we have a trace for the deadlock, would it be possible to add a test case that reproduces the problem?
@eolivelli that was my first concern, but I haven't been able to reproduce it in a test until now
Do not integrate! I found some strange behaviours while attempting to reproduce the deadlock.
Added a test case that reproduces the deadlock
Now can be merged
The patch is not trivial, I will take a look this week
Thanks
@eolivelli Yes, I had to change many things to make it work. There were many bugs in page and lock handling. Biggest changes are:
It would be great to run the long-running performance tests with this change before cutting a release. Nasty things may happen in the long run.
I did a first quick look.
I am surprised that we are touching so many methods but adding only one test case. Is there any corner case we are touching that is not covered by tests?
I am asking because we are touching a critical part of the system
There is only one test case to force the old deadlock situation. Other cases are covered by existing tests (which failed many times during development). The biggest change, the checkpoint order change, is covered by a whole test suite about checkpointing (testing metadata ordering and the produced tree) and restoring/restarting.
@dmercuriali @aluccaroni @diegosalvi we can merge this patch. My point is about not cutting a release before running appropriate load testing.
Currently I'm running a test instance with some load; let's see over the next few days whether any errors show up
With @hamadodene we have run a working instance for a week with load without any issue.
Great work!
This PR fixes #820. It changes how BRIN pages are loaded/unloaded by deferring the unload until the currently handled BRIN block has been unlocked; this way we don't keep the current block locked until another block has been fully unloaded (which needs a lock too). Block loading during block merges changed too: we no longer load into page memory blocks that aren't needed anymore because they will be deleted at the end of the merge. The net result is that there is one more page in thread memory, which will be discarded/cleaned at the end.
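The deferred-unload idea described above can be illustrated with a minimal sketch. It is not the code from this PR: the class `DeferredUnloadSketch`, the `Block` type, and the method `loadWithDeferredUnload` are hypothetical names, and the eviction policy is reduced to "unload every other loaded block". The only point it shows is the ordering: victims are recorded while the current block is locked, and their own locks are taken only after the current lock has been released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not HerdDB's actual classes or API.
class DeferredUnloadSketch {

    /** A block guarded by its own lock; 'loaded' tells whether its page is in memory. */
    static final class Block {
        final int id;
        final ReentrantLock lock = new ReentrantLock();
        volatile boolean loaded;

        Block(int id) {
            this.id = id;
        }

        /** Unloading a block requires that block's own lock. */
        void unload() {
            lock.lock();
            try {
                loaded = false;
            } finally {
                lock.unlock();
            }
        }
    }

    /**
     * Loads the current block while holding its lock. Blocks that must be
     * evicted are only recorded here; they are unloaded after the current
     * lock is released, so this thread never holds two block locks at once.
     * Returns the ids of the blocks that were unloaded.
     */
    static List<Integer> loadWithDeferredUnload(Block current, List<Block> evictionCandidates) {
        List<Block> pendingUnloads = new ArrayList<>();

        current.lock.lock();
        try {
            current.loaded = true;
            // Record the victims instead of unloading them under the current lock.
            for (Block victim : evictionCandidates) {
                if (victim != current && victim.loaded) {
                    pendingUnloads.add(victim);
                }
            }
        } finally {
            current.lock.unlock();
        }

        // The current block is unlocked now; each unload takes only the victim's lock.
        List<Integer> unloaded = new ArrayList<>();
        for (Block victim : pendingUnloads) {
            victim.unload();
            unloaded.add(victim.id);
        }
        return unloaded;
    }
}
```

With this ordering, two threads that each handle one block and need to evict the other's page cannot end up waiting on each other's lock, which is the kind of cycle the description attributes to the old behaviour. The cost, as noted above, is that the evicted page stays in memory slightly longer, until the deferred unload runs.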
Following this checklist to help us incorporate your contribution quickly and easily:
Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically. To make clear that you license your contribution under the Apache License Version 2.0, January 2004, you have to acknowledge this by using the following check-box.