Expensify / Bedrock

Rock solid distributed database specializing in active/active automatic failover and WAN replication
https://bedrockdb.com
GNU Lesser General Public License v3.0
1.08k stars 82 forks source link

Time out SUBSCRIBE query instead of forking. #1783

Closed tylerkaraszewski closed 2 months ago

tylerkaraszewski commented 2 months ago

Details

See issue, it's pretty detailed.

Fixed Issues

Fixes https://github.com/Expensify/Expensify/issues/406697

Tests

Testing on commit e1460937 (before rolling back the test hacks):

We successfully time out the query:

2024-06-21T19:53:05.538431+00:00 expensidev2004 bedrock10001: xxxxxx (libstuff.cpp:151) SException [sync] [info] Throwing exception with message: 'synchronization query timeout' from sqlitecluster/SQLiteNode.cpp:2154

This kills the peer connection:

2024-06-21T19:53:05.538542+00:00 expensidev2004 bedrock10001: xxxxxx (SQLiteNode.cpp:1785) _onMESSAGE [sync] [warn] {cluster_node_0/LEADING} ->{cluster_node_2} Error processing message 'SUBSCRIBE' (synchronization query timeout), reconnecting.

Follower reconnects, switches to searching:

2024-06-21T19:53:05.538636+00:00 expensidev2004 bedrock10007: xxxxxx (SQLitePeer.cpp:90) postPoll [sync] [hmmm] {cluster_node_0} Lost peer connection after 101ms, reconnecting in 739ms
2024-06-21T19:53:05.538800+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'SEARCHING'

Follower runs through the state machine, succeeds on second try:

2024-06-21T19:53:05.578719+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SEARCHING} Switching from 'SEARCHING' to 'SYNCHRONIZING'
2024-06-21T19:53:05.582020+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SYNCHRONIZING} Switching from 'SYNCHRONIZING' to 'WAITING'
2024-06-21T19:53:06.273311+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/WAITING} Switching from 'WAITING' to 'SUBSCRIBING'
2024-06-21T19:53:06.291824+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'FOLLOWING'

Auth built and tested against this as well.


Internal Testing Reminder: when changing bedrock, please compile auth against your new changes

tylerkaraszewski commented 2 months ago

@iwiznia @cead22 updated!

I haven't re-run the tests yet (which are fairly manual). I will do that before merging, but feel free to re-review/approve.

tylerkaraszewski commented 2 months ago

Updated testing on the latest change showing the same behavior as the original testing:


2024-06-24T18:43:54.249843+00:00 expensidev2004 bedrock10001: xxxxxx (libstuff.cpp:151) SException [sync] [info] Throwing exception with message: 'synchronization query timeout' from sqlitecluster/SQLiteNode.cpp:2153
2024-06-24T18:43:54.250186+00:00 expensidev2004 bedrock10001: xxxxxx (SQLiteNode.cpp:1788) _onMESSAGE [sync] [warn] {cluster_node_0/LEADING} ->{cluster_node_2} Error processing message 'SUBSCRIBE' (synchronization query timeout), reconnecting.
2024-06-24T18:43:54.250287+00:00 expensidev2004 bedrock10007: xxxxxx (SQLitePeer.cpp:90) postPoll [sync] [hmmm] {cluster_node_0} Lost peer connection after 3ms, reconnecting in 629ms
2024-06-24T18:43:54.250333+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'SEARCHING'
2024-06-24T18:43:54.250513+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/SEARCHING} Switching from 'SEARCHING' to 'WAITING'
2024-06-24T18:43:54.728064+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/WAITING} Switching from 'WAITING' to 'SUBSCRIBING'
2024-06-24T18:43:54.750226+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'FOLLOWING'