Performance considerations

wasm-forge commented 3 months ago

Hello,

I was wondering about the expected performance overheads when I am using the stable BTreeMap structure.

I wanted to split a buffer of 100Mb into small 4K chunks and store those in the stable map. The performance changes radically when I try to store directly into virtual stable memory vs using the BTreeMap. I've tried three methods:

(method store_memory) Store 100Mb into memory using memory.write, costs: 124_417_565 instructions
(method store_buffer) Store 100Mb as a single element into the BTreeMap, costs: 924_560_255 instructions (9x slower that direct write)
(method store_chunks) Store 100Mb as 4K blocks into the BTreeMap, costs: 10_231_263_653 instructions (100x slower than direct write)

I was wondering if this is reasonable overhead that we have.

You can check out the branch for testing: https://github.com/wasm-forge/treemap_chunks

Regards, Stan

dsarlis commented 3 months ago

Hi Stan,

What you describe is indeed expected, i.e. a direct write into stable memory will have less overhead than using a StableBTreeMap. We have also had other people reporting an overhead between 10x - 100x as you report in your benchmarks, depending on the exact workload and access pattern.

There are some technical details involved around the design and implementation of the StableBTreeMap that explains some of this. We have some ideas on how this can be improved, e.g. the current theory is that the majority of time is spent navigating through the nodes of the BTreeMap which at the moment involves deserializing the keys stored to be able to navigate for every single access (and every node you go through). This is certainly not super efficient and we believe that with some caching of the node keys on the heap we can reduce this overhead a lot.

I don't have a precise timeline to share with you on when we will be able to address this but it's the next item on the list for stable structures work. It's also becoming important as other folks trying to use the stable structures will likely observe this overhead, so we will definitely prioritize in the short/medium term.

ufoscout commented 3 weeks ago

@dsarlis Is there any progress on this? Is there anything we could do to help with this issue?

dsarlis commented 2 weeks ago

@ufoscout Not yet but I expect that we can look into this in the near future.

ufoscout commented 2 weeks ago

@dsarlis Do you have any advice on how to mitigate this issue while waiting for a fix? We are experiencing this issue in mainnet:

Canister exceeded memory access limits: Exceeded the limit for the number of accessed pages in the stable memory in a single message execution: limit 2097152 KB for regular messages and 1048576 KB for queries..

This happens by reading/writing some thousand entries in a StableBTreeMap, btw, the size of our data is far less than the 2GB limit, so we guess that this is caused by the map that performs too many accesses

dsarlis commented 2 weeks ago

The best you can do for now would be to measure where you do these accesses (e.g. with canbench) and see if you can optimize on your side, by e.g. re-organising your data structure based on the access pattern you expect. I am not entirely sure that the work we were discussing on this issue would necessarily help with the page accesses (perhaps it would to some extent but likely also not in every case).

dfinity / stable-structures

Performance considerations #227