Congyuwang / RocksDict

Python fast on-disk dictionary / RocksDB & SpeeDB Python binding
https://congyuwang.github.io/RocksDict/rocksdict.html
MIT License

Exception: Result incomplete: Write stall #51

Closed by MichelangeloConserva 1 year ago

MichelangeloConserva commented 1 year ago

Hello,

Calling db.write with a WriteBatch object throws the error "Exception: Result incomplete: Write stall". I'm having a hard time debugging the issue.

My setup involves one process (the gatherer) that generates key-value pairs and puts them in a Queue. Another process (the writer) gets a batch of key-value pairs from the Queue, checks whether each key is already in the database, and, if it is not, puts the pair into a WriteBatch object. Once the entire batch has been processed, I call the db.write method.

Do you have any suggestions on how I can investigate this issue?
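For readers following along, the writer side described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: `write_new_pairs`, `writer_loop`, and the `None` sentinel are hypothetical names and conventions, and the sketch assumes that `Rdict` supports `key in db`, that `WriteBatch.put(key, value)` stages a pair, and that `db.write(batch)` commits the batch.

```python
from typing import Any, Callable, Iterable, Tuple


def write_new_pairs(db: Any, make_batch: Callable[[], Any],
                    pairs: Iterable[Tuple[Any, Any]]) -> int:
    """Stage pairs whose keys are absent from db, then commit them in one write.

    Returns the number of pairs that were written.
    """
    batch = make_batch()
    staged = set()
    for key, value in pairs:
        # Skip keys already stored, or already staged earlier in this batch.
        if key in db or key in staged:
            continue
        batch.put(key, value)
        staged.add(key)
    if staged:
        db.write(batch)
    return len(staged)


def writer_loop(path: str, queue: Any) -> None:
    """Hypothetical body of the writer process.

    The database is opened inside this process rather than inherited from
    the parent; that choice is an assumption, not something the thread confirms.
    """
    from rocksdict import Rdict, WriteBatch  # imported lazily, only when run

    db = Rdict(path)
    try:
        while True:
            pairs = queue.get()
            if pairs is None:  # sentinel (an assumption) meaning "no more data"
                break
            write_new_pairs(db, WriteBatch, pairs)
    finally:
        db.close()
```

Keeping the filtering logic in `write_new_pairs` separate from the process loop makes it possible to exercise it against an in-memory stand-in, which can help when trying to isolate whether the error comes from the database options or from the multiprocessing setup.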

Thanks, Michelangelo

MichelangeloConserva commented 1 year ago

From further analysis, it seems the issue is related to the fact that the database writing was handled inside a sub-process. The issue doesn't appear if I move the database writing into the main process.

Congyuwang commented 1 year ago

Are the batch size and everything else totally identical in the two cases (sub-process vs. main process)? Hmm, curious.

Congyuwang commented 1 year ago

A write stall may suggest that each write batch is too large, but I am unsure about this too.
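If batch size is the suspect, one way to test the idea is to split the incoming pairs into smaller batches and commit each one separately. The sketch below is only an illustration under that assumption: `chunked` and `write_in_chunks` are hypothetical helpers, the chunk size of 10,000 is arbitrary, and nothing in this thread confirms that smaller batches resolve the error.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Tuple


def chunked(pairs: Iterable[Tuple[Any, Any]], size: int) -> Iterator[List[Tuple[Any, Any]]]:
    """Yield consecutive lists of at most `size` pairs."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(pairs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_in_chunks(db: Any, pairs: Iterable[Tuple[Any, Any]], size: int = 10_000) -> int:
    """Commit pairs in several smaller WriteBatch objects; returns the batch count."""
    from rocksdict import WriteBatch  # imported lazily, only when run

    count = 0
    for chunk in chunked(pairs, size):
        batch = WriteBatch()
        for key, value in chunk:
            batch.put(key, value)
        db.write(batch)
        count += 1
    return count
```

Note that splitting one batch into several gives up the all-or-nothing guarantee of a single `db.write` call, so this is more of a diagnostic step than a drop-in replacement.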

Congyuwang commented 1 year ago

There are several reasons that might cause a write stall.

Maybe this link is helpful too: https://github.com/EighteenZi/rocksdb_wiki/blob/master/RocksDB-Tuning-Guide.md#flushing-options.

In your case, I conjecture that your write batch fills up all the memtables and causes writes to stop. So, you might try increasing write_buffer_size or max_write_buffer_number and see if it helps.
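A minimal sketch of that suggestion, for illustration only: it raises both options above the RocksDB defaults (64 MiB for write_buffer_size and 2 for max_write_buffer_number) when opening the database. The helper names are hypothetical, the values 256 MiB and 4 are arbitrary starting points rather than recommendations, and the size comparison is a rough heuristic for getting a sense of scale, not a rule that RocksDB applies.

```python
from typing import Any, Iterable, Tuple

MIB = 1024 * 1024


def memtable_budget(write_buffer_size: int, max_write_buffer_number: int) -> int:
    """Approximate bytes that memtables may hold per column family."""
    return write_buffer_size * max_write_buffer_number


def estimate_batch_bytes(pairs: Iterable[Tuple[bytes, bytes]]) -> int:
    """Rough payload size of (bytes, bytes) pairs, ignoring per-entry overhead."""
    return sum(len(key) + len(value) for key, value in pairs)


def open_with_larger_buffers(path: str,
                             write_buffer_size: int = 256 * MIB,
                             max_write_buffer_number: int = 4) -> Any:
    """Open an Rdict with larger memtable settings (illustrative values)."""
    from rocksdict import Options, Rdict  # imported lazily, only when run

    opt = Options()
    opt.create_if_missing(True)
    opt.set_write_buffer_size(write_buffer_size)
    opt.set_max_write_buffer_number(max_write_buffer_number)
    return Rdict(path, options=opt)
```

Comparing `estimate_batch_bytes(pairs)` against `memtable_budget(64 * MIB, 2)` gives a quick indication of whether a single batch is anywhere near the default memtable capacity, which may help decide whether these options are worth changing at all.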

Congyuwang commented 1 year ago

Can you provide a minimal example, so that I can try to reproduce it?

MichelangeloConserva commented 1 year ago

I've been trying, unsuccessfully, to reproduce the issue with a simplified version of the code. It seems like your conjecture is correct: the issue is caused by some problem with the db options rather than by the multiprocessing.

Further, I was spawning processes from an IPython console. This may also be part of the issue.

Thank you very much!