Closed koldat closed 5 years ago
Just curious: would fsync before ingestion fix the issue on NFS? See https://github.com/facebook/rocksdb/commit/2730fe693edf306aad11a48491cfe3be4c178a47
Thanks @yiwu-arbug, it is a very nice change! I have merged it into our production build; hopefully it will help. We are using SSTFileWriters, which flush at the end. To be sure, I have also set use_fsync=true for the writer. I will keep you posted.
Your change may fix the root cause, but the crash happens because iterators are not used properly in later parts of RocksDB. It took me a while to retrieve a crash dump from production (we are running in k8s) and to compile RocksJNI with separate debug symbols. I think this will help many people; a crash is the worst thing that can happen.
There is an issue where, under some conditions, ingest file tries to erase an uninitialized iterator:
Why does it happen? On some NFS-backed databases we hit an OS-level race condition where the ingest file move succeeds, but the compaction job that starts immediately afterwards does not see the file yet. It looks like this:
You can see the file was ingested, but then it is not readable. Compaction fails with IO_ERROR, which is mapped to kFatalError when paranoid checks are enabled. The error handler then stops the DB, which is still expected. But the next call to ingest does:
So the iterator is never initialized, and erasing it causes the crash. It should not crash even if the database switches to read-only mode.
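To illustrate the failure mode described above, here is a minimal, self-contained C++ sketch. It is not RocksDB code: `PendingOutputs`, `Ingest`, and the `db_stopped` flag are hypothetical names chosen for the example. It assumes the crash pattern is a `std::list` iterator that is only assigned on the healthy path but erased unconditionally, and it shows one possible way to guard against that by wrapping the iterator in `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>

// Hypothetical stand-in for the bookkeeping of in-flight file numbers.
class PendingOutputs {
 public:
  using Handle = std::list<uint64_t>::iterator;

  // Registers a file number and returns an iterator to its entry.
  Handle Capture(uint64_t file_number) {
    return files_.insert(files_.end(), file_number);
  }

  // Erases the entry. Passing a default-constructed (never assigned)
  // iterator here is undefined behavior and can crash the process.
  void Release(Handle handle) { files_.erase(handle); }

  std::size_t size() const { return files_.size(); }

 private:
  std::list<uint64_t> files_;
};

// Hypothetical ingest routine. Returns false when the DB is stopped.
bool Ingest(PendingOutputs& pending, bool db_stopped, uint64_t file_number) {
  // The handle stays empty unless we actually captured an entry.
  std::optional<PendingOutputs::Handle> handle;
  if (!db_stopped) {
    handle = pending.Capture(file_number);
  }

  // ... the actual ingestion work would happen here ...

  // Release only what was captured, so the stopped-DB path never
  // erases an uninitialized iterator.
  if (handle.has_value()) {
    pending.Release(*handle);
  }
  return !db_stopped;
}
```

Calling `Ingest(pending, /*db_stopped=*/true, 7)` in this sketch returns `false` and leaves the list untouched instead of crashing. An early return before any iterator is declared would work equally well; the `std::optional` variant just keeps a single cleanup path.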
Expected behavior
Process should not crash.
Actual behavior
Crash callstack:
Steps to reproduce the behavior
I was not able to reproduce it on a local drive. It happens about once a day on NFS. We use ingest file heavily, and when an ingest triggers a compaction there is a high chance of this happening.
I have a fix and will create a pull request.