apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

SQLite in StorageServer deadlocked after the node was disconnected and resumed. #11578

Open DuanChangfeng0708 opened 3 months ago

DuanChangfeng0708 commented 3 months ago

I have a 3-node, 3-replica FoundationDB cluster, version 7.1.27. The steps were: Node A was disconnected from the network for 10 minutes and then restored. While data movement was still in progress, Node B was disconnected from the network for 10 minutes and then restored. After that, the StorageServer on Node B was stuck in an infinite loop. The pstack output is as follows: (image: pstack)

The backtrace printed in Net2RunLoopTrace is as follows: (image: 20240815-172850)

I attached gdb and found that the process could not get a lock from SQLite, and then it retried indefinitely. The code is at https://github.com/apple/foundationdb/blob/main/contrib/sqlite/sqlite3.amalgamation.c#L37717 (image: screenshot-20240815-173116)

The rc is SQLITE_BUSY, lockIdx is 4, and n is 4.
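To make the reported symptom concrete, here is a minimal Python sketch of a busy-retry loop of the kind described above. It is only a model, not SQLite's or FoundationDB's actual code: `busy_lock`, `make_stuck_lock`, and `make_bounded_handler` are hypothetical names, and the "stuck" lock is an assumption standing in for a lock whose holder never releases it. The only real values are SQLite's published result codes (`SQLITE_OK` is 0, `SQLITE_BUSY` is 5) and the `lockIdx`/`n` values from the report.

```python
SQLITE_OK = 0    # SQLite's success result code
SQLITE_BUSY = 5  # SQLite's "lock is held elsewhere" result code


def busy_lock(try_lock, busy_handler, lock_idx, n):
    """Retry try_lock(lock_idx, n) for as long as it reports SQLITE_BUSY.

    The loop ends only when the lock is acquired, a non-BUSY result is
    returned, or busy_handler() returns False to give up.
    """
    while True:
        rc = try_lock(lock_idx, n)
        if rc != SQLITE_BUSY:
            return rc
        if busy_handler is None or not busy_handler():
            return rc


def make_stuck_lock():
    """Hypothetical lock whose holder never releases it (assumption)."""
    calls = {"count": 0}

    def try_lock(lock_idx, n):
        calls["count"] += 1
        return SQLITE_BUSY

    return try_lock, calls


def make_bounded_handler(max_retries):
    """Busy handler that permits at most max_retries retries."""
    state = {"retries": 0}

    def handler():
        state["retries"] += 1
        return state["retries"] <= max_retries

    return handler


# Values from the report: lockIdx is 4 and n is 4.
try_lock, calls = make_stuck_lock()
rc = busy_lock(try_lock, make_bounded_handler(3), lock_idx=4, n=4)
print(rc, calls["count"])
```

With the bounded handler the call gives up and returns `SQLITE_BUSY` after one initial attempt plus three retries. If the handler always returned `True`, the same loop would never exit against this stuck lock, which is consistent with the infinite retry seen in gdb. Whether FoundationDB's handler behaves that way is not established by this sketch.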

My CPU: HUAWEI Kunpeng 920 5220. My OS: openEuler 22.03.

giorgiozoppi commented 3 months ago

SQLite is famous for its concurrency problems. Would it be a problem if we removed support for it?

DuanChangfeng0708 commented 3 months ago

> SQLite is famous for its concurrency problems. Would it be a problem if we removed support for it?

Sorry, I didn't understand what you meant. Are you saying that this issue is caused by SQLite?

giorgiozoppi commented 3 months ago

Yes. At work we tried to use it for a PersistentQueue, had a lot of headaches, and moved to RocksDB.