Supporting smaller memory pages #161

Open axic opened 5 years ago

axic commented 5 years ago

This came up as a discussion at Devcon 4. I think the problem was mentioned by @poemm.

Wasm has a fixed memory page size of 64kb, and a single page is always required due to the nature of how the EEI works. This means every single contract on every invocation will use at least 64kb of memory. Most of the time contracts don't actually need that much memory.

It would be a bad idea to fork the wasm spec (resulting in an incompatible ewasm) or silently change VMs used in Ethereum clients to consider the memory page of smaller size (resulting in an incompatible VM).

I propose a way around this: introduce a custom section, which declares the page size. This has one restriction: the declared page size cannot exceed 64kb.

With this restriction in mind the wasm bytecode is fully compatible with every VM, because in "worst" (default) case, it will execute with 64kb page size.

When a VM understands this custom section, it can reduce the page size. If this proposal is accepted, we'd define a fixed page size in ewasm spec (probably something like 4kb or 8kb).

Analysis tools could use this custom section to validate that contracts do not overread memory. Though in case they do, they would just stop (with all gas consumed) due to an out-of-bounds read - which they can do today anyhow.
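
For illustration, a rough Rust sketch of how a VM could scan for such a section. The section name (`"page_size"`) and its payload layout (a single LEB128 value) are my assumptions; the issue does not pin down an encoding:

```rust
const DEFAULT_PAGE_SIZE: u32 = 64 * 1024;

/// Scan the module's sections for a custom section (id 0) named
/// "page_size" and return its declared value. A real implementation
/// should reject modules declaring more than 64kb instead of clamping.
fn declared_page_size(module: &[u8]) -> u32 {
    let mut pos = 8; // skip the 4-byte magic and 4-byte version
    while pos < module.len() {
        let id = module[pos];
        pos += 1;
        let (size, n) = read_leb_u32(module, pos);
        pos += n;
        let end = pos + size as usize;
        if id == 0 {
            let (name_len, n) = read_leb_u32(module, pos);
            let name_start = pos + n;
            let name_end = name_start + name_len as usize;
            if &module[name_start..name_end] == b"page_size" {
                let (ps, _) = read_leb_u32(module, name_end);
                return ps.min(DEFAULT_PAGE_SIZE);
            }
        }
        pos = end;
    }
    DEFAULT_PAGE_SIZE // no section: the "worst" (default) case
}

/// Minimal unsigned LEB128 decoder: (value, bytes consumed).
/// No bounds or overflow handling; fine for a sketch only.
fn read_leb_u32(buf: &[u8], start: usize) -> (u32, usize) {
    let (mut value, mut shift, mut n) = (0u32, 0, 0);
    loop {
        let byte = buf[start + n];
        n += 1;
        value |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return (value, n);
        }
        shift += 7;
    }
}
```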

jakelang commented 5 years ago

Considering logistics, what changes to existing VM implementations would this require?

axic commented 5 years ago

Considering logistics, what changes to existing VM implementations would this require?

You wouldn't change a general-purpose VM to do this. A specialised VM in a consensus client would have an extra piece of code to check the custom section and size its pages accordingly. I think in a well-designed VM changing the page size shouldn't be more than a few lines.
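
To make the "few lines" concrete, here is a minimal sketch of what that could look like internally, with the page size carried as a field instead of a hard-coded constant. The names are illustrative, not from any particular engine:

```rust
struct MemoryInstance {
    data: Vec<u8>,
    page_size: u32, // 65536 by default; smaller if the custom section says so
    max_pages: u32,
}

impl MemoryInstance {
    fn new(initial_pages: u32, max_pages: u32, page_size: u32) -> Self {
        MemoryInstance {
            // still zero-initialized, just in smaller units
            data: vec![0; (initial_pages * page_size) as usize],
            page_size,
            max_pages,
        }
    }
}
```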

pepyakin commented 5 years ago

What would happen with memory.grow and memory.size? Will they still operate (return / receive) on 64k pages?

I think in a well-designed VM changing the page size shouldn't be more than a few lines.

But this still will entail forking a VM, right?

axic commented 5 years ago

What would happen with memory.grow and memory.size? Will they still operate (return / receive) on 64k pages?

No - all pages, including the units that memory.grow and memory.size operate on, would be the specified size.
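
Continuing the illustrative sketch from above, memory.size and memory.grow would then simply count in units of the configured page size:

```rust
impl MemoryInstance {
    // memory.size: current size in (configured) pages
    fn size(&self) -> u32 {
        self.data.len() as u32 / self.page_size
    }

    // memory.grow: returns the previous size in pages, or -1 on failure
    fn grow(&mut self, delta: u32) -> i32 {
        let old = self.size();
        let new_pages = old + delta;
        if new_pages > self.max_pages {
            return -1;
        }
        self.data.resize((new_pages * self.page_size) as usize, 0);
        old as i32
    }
}
```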

But this still will entail forking a VM, right?

Correct, although I envision that all consensus clients would have specialised VMs and not generic VMs designed for the web.

gballet commented 5 years ago

A summary of what I understand so far: a 64kb area is used once and for all, and every time a contract ends, the amount of data it declared in its custom section is zeroed out. If you don't re-use the same area, then you need an allocation, and zeroing it out would cost less (on top of that, most of that page would definitely be in the cache anyway, given the short execution time of contracts).

If my assumption is correct then:

So I'm just wondering if we should bother doing this at all?

gballet commented 5 years ago

@axic in answer to yesterday's question, it would be fairly easy to change wagon's default size to something lower and not much harder to make it depend on a custom section.

pepyakin commented 5 years ago

IMO, I don't think that it is worth the hassle.

First, I do think that this is effectively forking the specification. The specification states that a memory instance's size is always a multiple of 64kb. The immediate effect is that memory.size and memory.grow operate on multiples of that size. For example, platform-dependent code, allocators in particular, already depends on this kind of invariant. So such a change would introduce fragmentation into the ecosystem, would force developers to audit all their dependencies for compatibility with ewasm, and so on.
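
For example (a hypothetical sketch, not taken from any real allocator), the growth path of a wasm-side allocator typically bakes in the 64kb constant and would silently misbehave if the host shrank pages underneath it. I'm assuming the `core::arch::wasm32::memory_grow` intrinsic here:

```rust
#[cfg(target_arch = "wasm32")]
fn morecore(bytes: usize) -> *mut u8 {
    const PAGE_SIZE: usize = 65536; // fixed by the wasm spec
    let pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    // memory_grow returns the previous size in pages, or usize::MAX on failure
    let prev_pages = core::arch::wasm32::memory_grow::<0>(pages);
    if prev_pages == usize::MAX {
        return core::ptr::null_mut();
    }
    // this address arithmetic is wrong the moment pages are not 64kb
    (prev_pages * PAGE_SIZE) as *mut u8
}
```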

On the other hand, I don't really think that using 64kb memories is much of a problem. This is a negligibly small amount for current hardware, and the processing of contracts is not massively parallel (so you probably won't have much more than a single memory instance at a time).

poemm commented 5 years ago

WebAssembly was designed for web apps where run-time dominates set-up time (including zeroing memory). Unfortunately, benchmarks show that many ewasm contracts are dominated by set-up costs.

Since this discussion is about memory, a quick laptop benchmark shows that it takes 10ms to zero 50 64kB pages. So an upper bound on a 200ms block is ~~500~~ 1000 pages, not counting contract runtime. This hinders scaling goals.

64kB pages are big and awkward. Most OSs use 4kB pages. 1kB would be reasonable for smart contracts. @ehildenb also mentioned curiosity about smaller page sizes, like 1kB or 512 B.

A quick look shows that many implementations set the page size as a variable. Of course, an audit is needed before changing this variable.

Firefox: https://hg.mozilla.org/mozilla-central/file/tip/js/src/wasm/WasmTypes.h#l2260
Chromium: https://github.com/v8/v8/blob/fd334b3216488011b368ec4652819e08c38d0d36/src/wasm/wasm-constants.h#L77

I agree that deviating from the Wasm spec presents new problems. But ignoring benchmarks and major bottlenecks presents problems too. So I consider this a major design decision.
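
A minimal sketch to reproduce that kind of measurement (my construction; the 10ms figure above is from poemm's laptop, so numbers will vary):

```rust
use std::time::Instant;

fn main() {
    const PAGES: usize = 50;
    const PAGE: usize = 64 * 1024;

    // Dirty the buffer first so every page is actually resident and we
    // time the memset itself rather than lazy kernel page allocation.
    let mut mem = vec![0xffu8; PAGES * PAGE];

    let start = Instant::now();
    mem.fill(0);
    println!("zeroed {PAGES} pages ({} bytes) in {:?}", mem.len(), start.elapsed());
    assert!(mem.iter().all(|&b| b == 0));
}
```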

pepyakin commented 5 years ago

Note that I don't argue that changing the page size constant will be hard (indeed, I would expect it to be just a constant in most implementations), just that it would introduce some fragmentation.

Since this discussion is about memory, a quick laptop benchmark shows that it takes 10ms to zero 50 64kB pages. So an upper bound on a 200ms block is ~~500~~ 1000 pages, not counting contract runtime. This hinders scaling goals.

This one implies clearing pages upfront and all at once. I was under the impression that this could be easily solved by some techniques. The simplest one would be doing it lazily, on the first access to the physical (well, physical as in virtual memory : ) ) page. Do you see any problems with doing that?

poemm commented 5 years ago

This one implies clearing pages upfront and all at once. I was under the impression that this could be easily solved by some techniques. The simplest one would be doing it lazily, on the first access to the physical (well, physical as in virtual memory : ) ) page. Do you see any problems with doing that?

Not sure how to answer your question without telling the rest of the story.

This problem is not just about page size. For ewasm contracts, set-up time (instantiation including parsing, memory allocation, importing, etc.) often dominates run-time. A possible solution: maintain instantiated contracts which are ready to call and, at each call, zero the memory and globals.
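
A sketch of what that could look like, with entirely hypothetical types (real code would also have to restore data segments, not just zero the memory):

```rust
use std::collections::HashMap;

struct CachedInstance {
    memory: Vec<u8>,           // the instance's linear memory
    globals: Vec<u64>,         // current global values
    initial_globals: Vec<u64>, // snapshot taken right after instantiation
}

struct InstancePool {
    // keyed by contract address
    instances: HashMap<[u8; 20], CachedInstance>,
}

impl InstancePool {
    // Reset an already-instantiated contract instead of re-parsing and
    // re-instantiating it; the memory zeroing here is the remaining cost.
    fn prepare_call(&mut self, address: [u8; 20]) -> Option<&mut CachedInstance> {
        let inst = self.instances.get_mut(&address)?;
        inst.memory.fill(0);
        inst.globals.copy_from_slice(&inst.initial_globals);
        Some(inst)
    }
}
```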

Some possibilities follow to handle memory between calls.

1. Deviate from the spec and free memory after each call and allocate new pages at each call.
2. Deviate from the spec and shrink the memory back to the original size after each call.
3. Disallow memory.grow and allow each precompile to maintain some bounded amount of memory across calls.

In any case, a smaller page size is better. Smaller pages also reduce the zeroing bottleneck.

Please correct me if I am missing something.

gballet commented 5 years ago

The simplest one would be doing it lazily, on the first access to the physical (well, physical as in virtual memory : ) ) page. Do you see any problems with doing that?

Having access to the page table. Not hard to do with a dedicated driver; I think it would be overkill to save 100ms every 50 contract executions.

gballet commented 5 years ago

Some possibilities follow to handle memory between calls.

1. Deviate from the spec and free memory after each call and allocate new pages at each call.

2. Deviate from the spec and shrink the memory back to the original size after each call.

3. Disallow memory.grow and allow each precompile to maintain some bounded amount of memory across calls.

Like I said above, you can also leave the spec unchanged and punish any call to memory.grow by claiming all of the account's funds. Then zero out a 64k page and reset the stack.

eira-fransham commented 5 years ago

The simplest one would be doing it lazily, on the first access to the physical (well, physical as in virtual memory : ) ) page. Do you see any problems with doing that?

Having access to the page table. Not hard to do with a dedicated driver; I think it would be overkill to save 100ms every 50 contract executions.

You don't need that; you can just use calloc, which internally tells the OS to lazily zero the pages. Rust does this already with vec![0; n].
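
A small demonstration of this effect (my example; exact timings are machine-dependent): `vec![0; n]` lowers to a zeroed allocation, so a large buffer comes back as untouched, lazily-zeroed kernel pages, and the real cost only shows up when the pages are first written:

```rust
use std::time::Instant;

fn main() {
    const N: usize = 256 * 1024 * 1024; // 256 MiB

    // vec![0; n] uses alloc_zeroed: a big allocation comes straight from
    // the OS as lazily-zeroed pages, so this returns almost instantly.
    let t = Instant::now();
    let zeroed = vec![0u8; N];
    println!("vec![0; n]      took {:?}", t.elapsed());

    // Touching each page forces the kernel to actually materialize it;
    // this is where the zeroing cost is really paid.
    let mut touched = vec![0u8; N];
    let t = Instant::now();
    for i in (0..N).step_by(4096) {
        touched[i] = 1; // fault in one page per iteration
    }
    println!("touching pages  took {:?}", t.elapsed());

    assert_eq!(zeroed[N - 1], 0);
    assert_eq!(touched[0], 1);
}
```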

gballet commented 5 years ago

How is the calloc method going to detect the first access to a page to zero it out?

eira-fransham commented 5 years ago

Because it's implemented with magic in the OS.

gballet commented 5 years ago

ah, magic ok ;)

eira-fransham commented 5 years ago

https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc

jakelang commented 5 years ago

@Vurich I am pretty sure calloc initializes memory up front. It would be lazily called by the VM upon memory access.

gballet commented 5 years ago

It depends on the OS and the libc. But even if most Unices tend to allocate the virtual memory with invalid references and wait for a page fault to fill it with a zeroed-out page, it means that you have underlying allocations, which is the opposite of what is being sought.

https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc

Accordingly, calloc is faster in that contrived example because the author doesn't write to the memory. I would expect the results to be much less impressive if they had written something.

pepyakin commented 5 years ago

A possible solution: maintain instantiated contracts which are ready to call, and, at each call, zero the memory and globals.

I'm not sure that I follow you here. Do you mean pre-instantiating wasm modules and then keeping them in memory in the hope that they will be reused at some point?

Smaller pages also reduce the zeroing bottleneck.

But this also implies zeroing wasm pages upfront and all at once, whereas I'm saying that it may be possible to zero each page on first access.

Having access to the page table. Not hard to do with a dedicated driver;

Well, you don't need direct access to the page table. As was mentioned before, calloc should do, although I was thinking about catching page faults in a signal handler (since most wasm engines would contain such a mechanism anyway to implement bounds checking).
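
For concreteness, a minimal Linux-only sketch of that mechanism, using the libc crate (my construction, not taken from any engine; calling mprotect from a signal handler is also not strictly async-signal-safe, so treat this as illustrative only):

```rust
use std::ptr;

const GUEST_BYTES: usize = 64 * 1024; // one wasm page
const HOST_PAGE: usize = 4096; // assumes 4 KiB host pages

static mut GUEST_BASE: usize = 0;

extern "C" fn on_segv(_sig: libc::c_int, info: *mut libc::siginfo_t, _ctx: *mut libc::c_void) {
    unsafe {
        let fault = (*info).si_addr() as usize;
        if fault >= GUEST_BASE && fault < GUEST_BASE + GUEST_BYTES {
            // First touch: make the host page accessible. Anonymous pages
            // arrive zeroed from the kernel, so no explicit memset is needed.
            let page = fault & !(HOST_PAGE - 1);
            libc::mprotect(page as *mut _, HOST_PAGE, libc::PROT_READ | libc::PROT_WRITE);
        } else {
            libc::abort(); // a genuine out-of-bounds access
        }
    }
}

fn main() {
    unsafe {
        // Reserve address space with no access rights: any touch faults.
        GUEST_BASE = libc::mmap(
            ptr::null_mut(),
            GUEST_BYTES,
            libc::PROT_NONE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        ) as usize;

        let mut sa: libc::sigaction = std::mem::zeroed();
        sa.sa_sigaction = on_segv as usize;
        sa.sa_flags = libc::SA_SIGINFO;
        libc::sigaction(libc::SIGSEGV, &sa, ptr::null_mut());

        // The first write traps, the handler maps the page, the write retries.
        let p = GUEST_BASE as *mut u8;
        *p.add(100) = 42;
        assert_eq!(*p.add(100), 42);
    }
}
```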

Accordingly, the calloc is faster in that contrived example, because the issue creator doesn't write to memory. I would expect results to be much less impressive if they had written something.

Yeah, right. But we are not talking about avoiding this cost altogether (since zeroing has to happen anyway), but about doing it lazily, thus spreading the cold-start cost over the running time. Changing the page size to something lower will not avoid this cost altogether either.

edit: also rust doesn't use anything other than jemalloc unless you explicitly use another allocator.

@jakelang Nit: that's not true in latest versions of rustc : p

gballet commented 5 years ago

I was thinking about catching page faults in a signal handler (since most wasm engines would contain such a mechanism anyway to implement bounds checking).

That would make sense, yet none of the engines I have looked at do it. Though it should be easy to implement.

Changing the page size to something lower will not avoid this cost altogether either.

I agree. In fact, depending on the allocator, it would tend to increase the number of page faults, thereby slowing the whole thing down.

lrettig commented 5 years ago

It would be a bad idea to fork the wasm spec (resulting in an incompatible ewasm)

I'd like to revisit this assumption. A lot of the ideas being discussed in this thread involve deviating from the Wasm spec in one way or another, e.g.:

  1. Deviate from the spec and free memory after each call and allocate new pages at each call.

  2. Deviate from the spec and shrink the memory back to the original size after each call.

  3. Disallow memory.grow and allow each precompile to maintain some bounded amount of memory across calls.

Why is forking the spec necessarily a bad thing? Especially if there are inherent incompatibilities between it and our use case, e.g.:

64kB pages are big and awkward. Most OSs use 4kB pages. 1kB would be reasonable for smart contracts. @ehildenb also mentioned curiosity about smaller page sizes, like 1kB or 512 B.

and

For ewasm contracts, set-up time (instantiation including parsing, memory allocation, importing, etc) often dominates run-time

Wasm obviously wasn't designed for resource-constrained, deterministic, on-chain computation. Maybe we shouldn't be afraid to deviate from the reference spec, although of course we would want to consider and discuss the implications.

pepyakin commented 5 years ago

TBH, I don't fully understand the idea.

This problem is not just about page size. For ewasm contracts, set-up time (instantiation including parsing, memory allocation, importing, etc) often dominates run-time. A possible solution: maintain instantiated contracts which are ready to call, and, at each call, zero the memory and globals. Some possibilities follow to handle memory between calls.

I assume that an ethereum call is meant here. I also assume that this talk is mostly about precompiles. Then holding these wasm instances in memory makes more sense, although IMO this won't work for smart contracts.

  1. Deviate from the spec and free memory after each call and allocate new pages at each call.
  2. Deviate from the spec and shrink the memory back to the original size after each call.

Assuming I get the idea right, I'm not sure how this is deviating from the spec. These cases can be implemented by instantiating a module once and then using it as a prototype for either efficient cloning or CoW (again, vmem tricks: map memory to the initial instance as long as it's not changed, then copy or zero pages on a write attempt, depending on the dirtiness of the page in the prototype instance).

From the spec perspective this can be seen just as copying store contents.
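
As a rough Linux-only sketch of that CoW approach (my construction, using the libc crate): snapshot the freshly instantiated memory into a memfd, then give every call a MAP_PRIVATE view of it, so only the pages a call writes to are copied:

```rust
use std::ptr;

// Snapshot the prototype instance's initial memory into an in-memory file.
unsafe fn make_prototype(initial: &[u8]) -> libc::c_int {
    let fd = libc::memfd_create(b"wasm-proto\0".as_ptr().cast(), 0);
    assert!(fd >= 0);
    assert_eq!(libc::ftruncate(fd, initial.len() as libc::off_t), 0);
    let n = libc::write(fd, initial.as_ptr().cast(), initial.len());
    assert_eq!(n, initial.len() as isize);
    fd
}

// Per-call view of the prototype. MAP_PRIVATE: writes land in private
// copies of the touched pages; the prototype itself is never modified.
unsafe fn cow_clone(proto_fd: libc::c_int, len: usize) -> *mut u8 {
    let mem = libc::mmap(
        ptr::null_mut(),
        len,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE,
        proto_fd,
        0,
    );
    assert_ne!(mem, libc::MAP_FAILED);
    mem.cast()
}
```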

  3. Disallow memory.grow and allow each precompile to maintain some bounded amount of memory across calls.

BTW, this doesn't require disallowing memory.grow explicitly. The same can be achieved by putting limits on the memory imported by the precompile.
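
A sketch of that alternative, assuming the wasmtime and anyhow crates (the API usage is mine, not something from the ewasm spec): importing the memory with max equal to min means memory.grow stays legal but can never succeed.

```rust
fn main() -> anyhow::Result<()> {
    // min = 1 page, max = 1 page: any memory.grow inside the module returns -1
    let wat = r#"
        (module
          (import "env" "memory" (memory 1 1)))
    "#;
    let engine = wasmtime::Engine::default();
    let _module = wasmtime::Module::new(&engine, wat)?;
    println!("module with capped imported memory validated");
    Ok(())
}
```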