Open jsteemann opened 2 years ago
Cc @anand1976
This bug is still causing us a lot of operational issues and a lot of "unnecessary" support cases, e.g.
Is there any way forward to get out of this bad situation at some point? I would love to see this fixed or at least mitigated somehow. Thanks!
Any idea if/when this issue can be investigated or even fixed? Thanks!
@jsteemann Sorry for the delay! I'll get back to you early next week.
@anand1976 Is there any progress on this? It is bothering us a lot with customer databases in ArangoDB.
The error recovery behavior when 2pc is enabled seems overly restrictive - https://github.com/facebook/rocksdb/blob/main/db/error_handler.cc#L513. For the short-term, we could relax it so certain types of errors, such as errors when writing an SST file or MANIFEST can be recovered from.
For WAL write errors when allow_2pc is true, recovery may be a bit more tricky. Cc @riversand963 for his opinion on how it should be handled. How should we handle transactions that have prepare records in the WAL but haven't committed yet, when we cannot guarantee durability of the WAL?
Hi everyone, it has been a while, but the issue is still unresolved/uncommented. We would really appreciate if this could be addressed for one of the upcoming RocksDB releases. Thanks!
Hey guys circling back on this issue, it's just a painful and fixable issue I would think. Is there any planned resolution and ETA on this?
Hi everyone, is there anything that can be done about this issue? It is still causing us lots of problems that auto-recovery does not work with PessimisticTransactionDB even in WRITE_COMMITTED mode.
Thanks!
Faced same issue, two years and no solution
This issue keeps reoccurring. Is there an update when it can be investigated and hopefully resolved? Thank you!
It looks to me that RocksDB's auto-recovery functionality after hitting "no space left on device" does not work with the PessimisticTransactionDB and write policy WRITE_COMMITTED. When the disk runs full and RocksDB gets an error on a filesystem operation, it tracks that there has been a background error and makes all subsequent write operations fail. That is fine so far. However, RocksDB stays in background error mode forever, even after space is made available on the underlying filesystem. Even calling Resume() on the db object manually does not fix the problem. I have observed this issue in many different versions of RocksDB, including the very latest state of main (which recently had some fixes for auto-recovery). I am very sure that this is not a filesystem issue, as I have observed the problem on many filesystems over years.

It seems to me that the only possible way to convince RocksDB to reset the background error is to remove the allow_2pc flag from the options. With that change made, auto-recovery actually works fine. Unfortunately the allow_2pc flag is hard-coded when using PessimisticTransactionDB, as https://github.com/facebook/rocksdb/blob/main/utilities/transactions/pessimistic_transaction_db.cc#L293 unconditionally sets the flag to true.

Expected behavior
RocksDB should auto-recover after space has been made available in the underlying filesystem.
Actual behavior
It doesn't. Even calling Resume() on the db object manually does not help.

Steps to reproduce the behavior
The issue can be easily reproduced in many versions of RocksDB, including the very latest state of main. To test, I created a 2GB local tmpfs mount and used it as RocksDB's data directory. I also created a few files containing garbage in that directory, which can later be deleted after RocksDB reports the ENOSPC error. That way it can be tested easily whether RocksDB comes back after space has been made available again:

After that, start up RocksDB and use a PessimisticTransactionDB and write policy WRITE_COMMITTED. Write enough data into RocksDB that the 2GB tmpfs directory runs full and RocksDB starts reporting ENOSPC errors. Once that has happened, remove the files tmpfs-dir/garbage1 to tmpfs-dir/garbage2. Check that du -hs tmpfs-dir correctly reports the free space; RocksDB nevertheless keeps reporting the ENOSPC error forever.

Auto-recovery can be fixed by applying the following patch, which removes the hard-coding of the allow_2pc flag:

I don't know if using that patch is actually safe (probably it isn't). For now I was just interested in what caused the problem.
Having a working auto-recovery with the PessimisticTransactionDB would be great, because it is very confusing to see RocksDB report ENOSPC errors when there is actually a lot of free disk space. The current behavior has caused lots of operational issues for us over the past few years, so getting rid of it would be a great step forward!
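
For anyone who wants to retry the reproduction steps, here is a minimal shell sketch of the disk-full setup. It is not the set of commands from the original report: the mount command, the helper names make_garbage and free_garbage, the use of zero-filled files, and the file sizes are all assumptions for illustration. Only the tmpfs-dir directory and the garbage file names are taken from the steps above.

```shell
#!/bin/sh
# Sketch of the disk-full setup described in the reproduction steps.
# Helper names, file contents, and sizes are illustrative assumptions.

# One-time setup on Linux (needs root), assuming a 2GB tmpfs as in the report:
#   mkdir -p tmpfs-dir && sudo mount -t tmpfs -o size=2G tmpfs tmpfs-dir

# make_garbage DIR COUNT SIZE_MB
# Creates DIR/garbage1 .. DIR/garbageCOUNT, each SIZE_MB MiB of zero bytes.
make_garbage() {
    dir=$1
    count=$2
    size_mb=$3
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/zero of="$dir/garbage$i" bs=1048576 count="$size_mb" \
            2>/dev/null || return 1
        i=$((i + 1))
    done
}

# free_garbage DIR
# Deletes the garbage files, then prints the directory's usage (du) and the
# free space of the filesystem that holds it (df).
free_garbage() {
    dir=$1
    rm -f "$dir"/garbage*
    du -hs "$dir"
    df -h "$dir"
}
```

With these helpers, one would run something like `make_garbage tmpfs-dir 2 256` before starting RocksDB, and `free_garbage tmpfs-dir` once ENOSPC errors appear. The df output shows the free space directly, whereas du only shows how much the directory still uses.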