CosmWasm / cosmwasm

Framework for building smart contracts in Wasm for the Cosmos SDK
https://www.cosmwasm.com/
Apache License 2.0

Retry database read when allocated memory is too small #126

Closed webmaster128 closed 4 years ago

webmaster128 commented 4 years ago

See https://github.com/confio/cosmwasm/pull/125#discussion_r369549813 and https://github.com/confio/cosmwasm/blob/v0.6.3/src/imports.rs#L54-L60.

#125 will return an error code specifically for the case where the allocated memory is too small.

reuvenpo commented 4 years ago

Instead of pre-allocating the buffer and then returning an error when it is too small, we can instead expose a function on the rust side that accepts a pointer to go memory and a length:

pub extern "C" fn rust_write_buffer(ptr: *const u8, length: usize) -> Buffer { ... }

This function will treat the pointer and length as a slice, copy it into a vector, and then turn it into the existing Buffer type. Then, when reading the DB on the Go side, we can call this function before returning from Go, and hand back the Buffer instance to the Rust function that requested the call. Then, the Rust function can safely convert this Buffer into a Vec<u8> which will contain the full value.

This technique makes the read operation completely infallible, and returns an arbitrarily sized buffer in one round trip :)
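A minimal sketch of what such a function could look like, assuming a `Buffer` type with `ptr`/`len`/`cap` fields (the field names here are an assumption, not the actual go-cosmwasm definition):

```rust
/// Hypothetical Buffer layout; the real go-cosmwasm type may differ.
#[repr(C)]
pub struct Buffer {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

/// Copy `length` bytes from foreign (Go-owned) memory into a Rust-owned
/// Vec, then hand ownership back to the caller via a Buffer.
#[no_mangle]
pub extern "C" fn rust_write_buffer(ptr: *const u8, length: usize) -> Buffer {
    // Safety: the caller guarantees `ptr` is valid for `length` bytes.
    let slice = unsafe { std::slice::from_raw_parts(ptr, length) };
    let mut vec = slice.to_vec();
    let buffer = Buffer {
        ptr: vec.as_mut_ptr(),
        len: vec.len(),
        cap: vec.capacity(),
    };
    std::mem::forget(vec); // ownership now lives in `buffer`
    buffer
}
```

The Rust side can later reclaim ownership with `Vec::from_raw_parts(buffer.ptr, buffer.len, buffer.cap)`, which is what makes the round trip allocation-exact.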

webmaster128 commented 4 years ago

@reuvenpo do I understand correctly that you're talking about the Go/cosmwasm-vm interface from https://github.com/CosmWasm/go-cosmwasm? If so, can you create a ticket there? The approach makes sense at first glance.

This ticket is about the VM/contract boundary. In this case I don't think the approach would work as it would require that the Wasm contract gets access to the VM's memory, which would break the sandboxed execution.

webmaster128 commented 4 years ago

Moved to Go/cosmwasm-vm discussion to https://github.com/CosmWasm/go-cosmwasm/issues/59. Will hide out of scope comments from this thread.

reuvenpo commented 4 years ago

Thank you for moving my suggestion to the relevant place :) This same technique applies to the wasm-vm interface as well, though. The contracts already export an

fn allocate(size: usize) -> *mut c_void;

function. When returning from the read_db function, the VM can call this function and write the result into the allocated memory. Then you would return this void pointer as the return value of read_db (instead of taking the second value parameter, which points to preallocated memory), which the contract can easily reinterpret as a native Vec<u8>.
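A sketch of the two sides of this handshake, with host code standing in for the VM (the `allocate`/`consume` pairing here is an illustration of the idea, not the actual cosmwasm code):

```rust
use std::ffi::c_void;

/// Contract-side export: hand the VM a buffer of the requested size.
/// Real contracts track the capacity in a Region; here we pass it
/// alongside explicitly to keep the sketch self-contained.
#[no_mangle]
pub extern "C" fn allocate(size: usize) -> *mut c_void {
    let mut buf = Vec::<u8>::with_capacity(size);
    let ptr = buf.as_mut_ptr();
    std::mem::forget(buf); // the VM now owns this memory
    ptr as *mut c_void
}

/// Contract-side: reinterpret the pointer returned by read_db as an
/// owned Vec<u8>, assuming the VM wrote `length` bytes into a buffer
/// obtained from allocate(capacity).
unsafe fn consume(ptr: *mut c_void, length: usize, capacity: usize) -> Vec<u8> {
    Vec::from_raw_parts(ptr as *mut u8, length, capacity)
}
```

The safety contract is that the pointer must come from `allocate` with the matching capacity; the VM enforces this by being the only caller on both sides.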

webmaster128 commented 4 years ago

That's an interesting approach worth looking into. I ran into these issues so far:

  1. When read_db returns a pointer, we lose the ability to report errors. We could collapse all errors into the null pointer or find a different way to return the pair (error code, result pointer).
  2. When the import env.read_db is created as part of the Wasmer module instantiation, the cosmwasm_vm::instance::Instance does not yet exist. This Instance is required to call .allocate(). It would be possible to allocate with the Wasmer instance directly, but that does not exist yet either when the import implementation function is created. Maybe it is possible to call exports using the Wasmer context directly, but even if it works, the code gets ugly.
reuvenpo commented 4 years ago

  1. An error there is irrelevant, since reading the db is an infallible operation :) The only and closest thing to an error condition is a missing key, which can be represented as the null pointer.
  2. Interesting... I'll have to look at the code again and give you a good answer.
webmaster128 commented 4 years ago

Since memory allocation gets much cheaper in 0.8 due to https://github.com/CosmWasm/cosmwasm/pull/187 and https://github.com/CosmWasm/go-cosmwasm/pull/60, we can increase MAX_READ at no (!) gas cost.

For this reason I suggest increasing the limit to 16 KiB for the 0.8 release in the standard library as well as go-cosmwasm. That's an order of magnitude that allows you to store RSA keys and similar stuff with some encoding overhead. It allows us to postpone the general solution for a couple of months at least.

webmaster128 commented 4 years ago

  1. The more we work on this, the more error cases we find that were silently ignored or caused panics before. With #216, Storage.read will most likely become Result<Option<Vec<u8>>>. There will be error handling for sure.
  2. Even before we have cosmwasm_vm::instance::Instance, there is a wasmer_instance: wasmer_runtime_core::instance::Instance. Only the latter is required to execute functions. We can store a pointer to this instance in the ContextData, allowing us to call back into the contract from the read_db implementation, as @reuvenpo suggested. Since the allocate result is a memory location < 2 GB, it should be possible to report the result location in the positive part of read_db's i32 result.
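The proposed i32 encoding from point 2 can be sketched as follows (the specific error code is hypothetical; only the sign convention is from the discussion):

```rust
// Sketch of the proposed read_db result encoding: negative values are
// error codes, 0 means "key not found", and positive values are the
// memory location returned by the contract's allocate(), which is
// guaranteed to be < 2 GiB and therefore fits in the positive i32 range.

const ERR_HYPOTHETICAL: i32 = -1; // placeholder error code, not real

fn encode_read_db_result(result: Result<Option<u32>, i32>) -> i32 {
    match result {
        Err(code) => code,                   // negative: error
        Ok(None) => 0,                       // zero: missing key
        Ok(Some(region_ptr)) => region_ptr as i32, // positive: pointer
    }
}
```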

Given that any kind of retry solution would require getting the same db value from the backend multiple times (which is expensive because of the Merkleized IAVL+ tree), I'd avoid going the retry route.

webmaster128 commented 4 years ago

Blocked by #234

ethanfrey commented 4 years ago

> Even before we have cosmwasm_vm::instance::Instance, there is a wasmer_instance: wasmer_runtime_core::instance::Instance. Only the latter is required to execute functions. We can store a pointer to this instance in the ContextData, allowing us to call back into the contract from the read_db implementation, as @reuvenpo suggested. Since the allocate result is a memory location < 2 GB, it should be possible to report the result location in the positive part of read_db's i32 result.

Interesting. So we no longer preallocate buffers on the Wasm side, but rather let the VM allocate proper space for the return value and then return the pointer. It feels a bit odd storing a pointer to the instance in the ContextData, but I can't say it is unsafe, nor can I think of another way to achieve this callback.

One concern is next_db, which writes back to 2 buffers. We cannot return 2 pointers from a wasm function (only one i32). This is the exact case that motivated the changes to Region.
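This is where the Region indirection helps: a single i32 can point at a struct that describes a sized buffer, so two such pointers can be passed in rather than returned. A sketch of the Region layout (field names and types assumed here, based on the discussion):

```rust
// A Region describes a sized buffer in Wasm linear memory, so that a
// single i32 (the Region's own address) is enough to communicate a
// buffer plus its size. next_db can then fill in two pre-passed
// Regions (key and value) instead of returning two pointers.
#[repr(C)]
pub struct Region {
    pub offset: u32,   // start of the data in Wasm linear memory
    pub capacity: u32, // bytes available at offset
    pub length: u32,   // bytes actually used
}

impl Region {
    /// Bytes still writable in this region.
    pub fn free(&self) -> u32 {
        self.capacity - self.length
    }
}
```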

ethanfrey commented 4 years ago

What about removing the callback into instance (which is a major change) and do:

  1. retry with multiple calls to real storage
  2. once that passes, cache "too big" results in ContextData until the next call

I agree that we want to avoid multiple calls into the blockchain storage, and some items like .next() cannot be called multiple times safely (not idempotent).


Update: actually, caching seems to be a bigger hack. Another idea:

This means the only change on the Wasm side is to not allocate prior to the call, but to pass an empty Region for the *mut c_void. The VM side will need to make the changes you suggest above (calling into wasmer_instance), but we can maintain the same function signatures and easily handle the two return Regions (next_db).

webmaster128 commented 4 years ago

Looking at https://github.com/CosmWasm/cosmwasm/pull/239, I don't think it is worth spending more time on this.

We can easily allocate 128 KiB or even 1 MiB upfront, which costs basically nothing and is very little memory compared to the whole chain, considering that all executions are sequential.

ethanfrey commented 4 years ago

Since we now have cheap allocations, I agree with your suggestion.

Let's just set the preallocation to 128 KiB.

We can then later set global memory limits per instance (#81) to something like 2MB, and this will work fine.

ethanfrey commented 4 years ago

Panics on reads greater than 128 KiB are fine, as long as we reject any writes larger than that size as well (on the Wasm side), at least via ExternalStorage. Thus, anyone using the provided wrappers can never write data they cannot read back. And anyone using the FFI calls directly is responsible for the allocations themselves.
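A minimal guard along these lines could look like this (the constant and function name are illustrative, not the actual ExternalStorage code):

```rust
// Hypothetical write-side guard: reject values larger than the read
// limit, so anything written through the wrapper can always be read
// back without hitting the preallocated-buffer cap.
const MAX_VALUE_SIZE: usize = 128 * 1024; // 128 KiB, per the thread

fn check_write_size(value: &[u8]) -> Result<(), String> {
    if value.len() > MAX_VALUE_SIZE {
        return Err(format!(
            "value too large: {} bytes exceeds limit of {}",
            value.len(),
            MAX_VALUE_SIZE
        ));
    }
    Ok(())
}
```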