OpenAtomFoundation / pikiwidb

a high-performance, large-capacity, multi-tenant, data-persistent, strong data consistency based on raft, Redis-compatible elastic KV data storage system based on RocksDB
BSD 3-Clause "New" or "Revised" License
194 stars 63 forks

bug: exit pikiwidb with error #293

Closed dingxiaoshuai123 closed 3 months ago

dingxiaoshuai123 commented 4 months ago

Is this a regression?

Yes

Description

After exiting with Ctrl+C, an error is reported. (screenshot attached)

Please provide a link to a minimal reproduction of the bug

No response

Screenshots or videos

images

Please provide the version you discovered this bug in (check about page for version information)

No response

Anything else?

No response

AlexStocks commented 4 months ago

@Tangruilin

Tangruilin commented 4 months ago

/assigned

Tangruilin commented 3 months ago

feat: transform kv and hash command using blackwidow without cache (#101)

Confirmed: this is the patch that introduced the problem.

Tangruilin commented 3 months ago

(screenshot) I modified the files one by one to confirm: the failure comes from the switch from LevelDB to RocksDB, where the logic for dumping files was removed.

My current suspicion is that RocksDB is not shut down cleanly on exit. pikiwidb opens multiple RocksDB instances, and RocksDB fails while exiting.

store.h store.cc

These are the two files that introduced the bug.

https://github.com/facebook/rocksdb/issues/11349

One hypothesis:

RocksDB runs in a child thread, and nothing dumps RocksDB when that thread exits. After pikiwidb exits, the RocksDB database is destroyed before the data has finished dumping, hence this error.

Tangruilin commented 3 months ago

(screenshot) As shown, the destructor of the Redis class is never reached.

That destructor is where the RocksDB instances are shut down.

panlei-coder commented 3 months ago

I took a core dump and found that the lock rocksdb::DBImpl::CancelAllBackgroundWork tries to acquire has already been destroyed. (two screenshots)

panlei-coder commented 3 months ago

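(screenshot) I reproduced it with gdb breakpoints. Judging from the output above, it looks like rocksdb::Timer::~Timer is called when the main thread's function exits, which in turn triggers rocksdb::port::Mutex::~Mutex.

One thing is odd, though: this destructor runs before Redis::~Redis, yet after entering Redis::~Redis the first two lock acquisitions succeed and only the third one fails (the lock has been destroyed). No lock destruction happens in between. (screenshot)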

panlei-coder commented 3 months ago

I moved the CancelAllBackgroundWork call into the close operation and invoked it before PikiwiDB::Run() returns. With that change the process exits normally. (three screenshots)
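A minimal sketch of the ordering described above: stop background work explicitly before Run() returns, instead of relying on destructors that run later during process teardown. FakeStore, FakeServer and ShutdownDemo are hypothetical names for illustration, not pikiwidb code, and a plain std::thread stands in for RocksDB's background work. With real RocksDB, the corresponding call would be rocksdb::CancelAllBackgroundWork(db, /*wait=*/true) from rocksdb/convenience.h.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for one storage instance that owns background work.
class FakeStore {
 public:
  FakeStore() : worker_([this] { BackgroundLoop(); }) {}

  // Safety net only; the explicit Close() below should already have run.
  ~FakeStore() { Close(); }

  // Idempotent: cancel the background work and wait until it has stopped.
  void Close() {
    if (closed_.exchange(true)) return;  // already closed
    stop_.store(true);
    if (worker_.joinable()) worker_.join();
  }

  bool closed() const { return closed_.load(); }

 private:
  void BackgroundLoop() {
    while (!stop_.load()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> stop_{false};
  std::atomic<bool> closed_{false};
  std::thread worker_;  // declared last so the flags exist before it starts
};

// Hypothetical stand-in for the server that owns several store instances.
class FakeServer {
 public:
  explicit FakeServer(int instances) {
    for (int i = 0; i < instances; ++i) {
      stores_.push_back(std::make_unique<FakeStore>());
    }
  }

  void Run() {
    // ... the event loop would block here until Ctrl+C requests a stop ...
    CloseStores();        // explicit shutdown while everything is still alive
    assert(AllClosed());  // nothing is left running when Run() returns
  }

  bool AllClosed() const {
    for (const auto& s : stores_) {
      if (!s->closed()) return false;
    }
    return true;
  }

 private:
  void CloseStores() {
    for (auto& s : stores_) s->Close();
  }

  std::vector<std::unique_ptr<FakeStore>> stores_;
};

// Runs the server once and reports whether every store was closed by Run().
bool ShutdownDemo() {
  FakeServer server(3);
  server.Run();
  return server.AllClosed();
}
```

Because Close() is idempotent, the destructor can still call it as a fallback without stopping the worker twice.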

panlei-coder commented 3 months ago

Hou Shengxin: this may be the bug described in https://github.com/facebook/rocksdb/issues/11440