Congyuwang / RocksDict

Python fast on-disk dictionary / RocksDB & SpeeDB Python binding
https://congyuwang.github.io/RocksDict/rocksdict.html
MIT License

Invalid argument: value is too large #65

Closed by sepine 1 year ago

sepine commented 1 year ago

Hi,

When I save a large piece of data (about 80 GB), I get the error "Invalid argument: value is too large". How can I address this?

Thanks for your help!

@Congyuwang @JiangDonglai98

Congyuwang commented 1 year ago

Is that 80GB the size of a single value?

sepine commented 1 year ago

Yes, I am trying to serialize two dicts, each of which holds about 40 GB of data. When I put each one under a separate key, I get this error.

sepine commented 1 year ago

I just use the 'put' method.

Congyuwang commented 1 year ago

Can you try to print out which key-value pair in particular causes this error? This might give us a clue about the cause.
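One way to narrow this down is to measure the serialized size of each value before calling put. The sketch below is only illustrative: `find_oversized` and `limit_bytes` are hypothetical names, and the 4 GiB default is an assumption about where the limit might sit, not a constant taken from RocksDict.

```python
import json


def find_oversized(items, limit_bytes=2**32):
    """Return (key, size) for every value whose JSON encoding exceeds limit_bytes.

    limit_bytes is an assumed threshold, not a value taken from RocksDict.
    """
    oversized = []
    for key, value in items.items():
        # Measure the size of the value as it would be stored (UTF-8 encoded JSON).
        size = len(json.dumps(value).encode("utf-8"))
        if size > limit_bytes:
            oversized.append((key, size))
    return oversized


# Tiny demonstration with an artificially small limit.
sample = {"all_infos": {"a": "x" * 100}, "latest_infos": {"a": "y"}}
print(find_oversized(sample, limit_bytes=50))
```

Running a check like this over the data before writing should show which key carries the value that is rejected.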

Congyuwang commented 1 year ago

Do you mean you are storing them under 2 keys, or storing them using the inner key-value pairs of the dicts?

Congyuwang commented 1 year ago

If you are storing the 2 dicts under only 2 keys, that is probably what causes the error. I'd recommend storing the dicts in rocksdict using their inner keys (rocksdict is better at storing smaller key-value pairs).
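A minimal sketch of that idea, assuming the inner values are JSON-serializable: each inner key becomes its own entry instead of the whole dict being one value. The helper `store_entries` and the sample data are hypothetical, and a plain dict stands in for the database so the snippet runs without rocksdict installed.

```python
import json


def store_entries(db, name, data):
    """Write each inner key of `data` as its own entry, keyed as '<name>_<inner key>'."""
    for inner_key, value in data.items():
        db[f"{name}_{inner_key}"] = json.dumps(value)


# A plain dict stands in for the database here. Assumption: a rocksdict Rdict
# opened on a path could be passed instead, since it supports item assignment.
db = {}
store_entries(db, "all_infos", {"user1": {"age": 30}, "user2": {"age": 41}})
print(sorted(db))
```

Each stored value is then only as large as one inner entry, rather than the full 40 GB dict.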

sepine commented 1 year ago

I define two key-value pairs in my code: all_infos -> (serialized JSON data) and latest_infos -> (serialized JSON data).

I iteratively update the two values under these two keys using the new data. Then I store the updated results in the db, using put(all_infos, ) and put(latest_infos, ).

But I got the above error.

sepine commented 1 year ago

Do you mean I should store the data in this format? That is, for

key: {k1: {k2: v}}

I would store (key_k1_k2, v).

Is that right?

Congyuwang commented 1 year ago

Yes, key_k1_k2 would work better. Plus you would be able to query key_k1_k2 immediately without reading and loading the large all_infos. You can also use column families and store the two dicts in two different column families, so you won't need the all_infos prefix.
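The flattening described above can be sketched as follows. This is only an illustration: `flatten` is a hypothetical helper, the "_" separator is taken from the key_k1_k2 example, and a plain dict again stands in for the database.

```python
def flatten(prefix, nested, sep="_"):
    """Yield (flat_key, value) pairs for every leaf in a nested dict."""
    for key, value in nested.items():
        flat_key = f"{prefix}{sep}{key}"
        if isinstance(value, dict):
            # Recurse so that {k1: {k2: v}} becomes prefix_k1_k2 -> v.
            yield from flatten(flat_key, value, sep)
        else:
            yield flat_key, value


db = {}
for flat_key, value in flatten("key", {"k1": {"k2": "v"}, "k3": {"k4": "w"}}):
    db[flat_key] = value

# A single leaf can now be read without loading the whole nested structure.
print(db["key_k1_k2"])
```

If the inner keys can themselves contain the separator, a different separator or an escaping scheme would be needed to keep the flat keys unambiguous.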

sepine commented 1 year ago

Many thanks for your help; it has made me more familiar with KV databases.

Thank you!