golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/compile: wasm code causes out of memory error on Chrome and Firefox for Android #27462

Closed termonio closed 5 years ago

termonio commented 6 years ago

I am very excited that Go now ships with WebAssembly support. I ran wasm code generated by Go 1.11 on Chrome and Firefox on desktops (macOS and Linux) and on Chrome, Firefox and Safari on iOS devices. Running Go-generated wasm code on Android devices failed though.

Minimal example

package main

import (
    "fmt"
)

func main() {
    fmt.Println("hello")
}

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width,initial-scale=1.0">
    </head>
    <body>
        <script src="wasm_exec.js"></script>
        <script>
            (async function() {
                const wasmFile = "test.wasm"
                let run
                const go = new Go()
                try {
                    const { instance } = await WebAssembly.instantiateStreaming(fetch(wasmFile), go.importObject)
                    document.querySelector('#info').innerHTML = "ready"
                    run = go.run(instance)
                } catch (err) {
                    document.querySelector('#info').innerHTML = err
                    console.log(err)
                }
            })()
        </script>
        <div id="info"></div>
    </body>
</html>

Expected behavior

Actual behavior

agnivade commented 6 years ago

Since the code is the same for desktop, iOS, and Android devices, I doubt there is much we can do here.

@neelance has some optimizations in mind for 1.12. But unless you have some specific suggestions, this just falls into the category of general optimizations, which will happen anyway.

termonio commented 6 years ago

Given the market share of Android devices, this would be a major drawback ...

thesyncim commented 6 years ago

can you try this?

wasm-opt test.wasm -O -o testO.wasm

agnivade commented 6 years ago

btw @termonio - You should also build with -ldflags='-s -w' for slightly smaller binaries. It most probably will not help with overall memory instantiation, but can help with payload size.

termonio commented 6 years ago

@thesyncim: wasm-opt test.wasm -O -o testO.wasm reduces the file size from 2.4MB to 2.3MB. The optimized file works on Desktops and iOS but still not on Android.

@agnivade: Your hint has a similar effect. The payload is reduced to 2.3MB, but can't be run on Android.

For someone who wants to reproduce this issue: iOS devices need a polyfill (omitted in my minimal code example above).

if (!WebAssembly.instantiateStreaming) {
    WebAssembly.instantiateStreaming = async (resp, importObject) => {
        const source = await (await resp).arrayBuffer()
        return await WebAssembly.instantiate(source, importObject)
    }
}

thesyncim commented 6 years ago

You can try other optimization levels like -O2 or -O4. For binary size, consider using -Os or -Oz.

termonio commented 6 years ago

I did try other optimization levels but got core dumps (not further investigated yet).

thesyncim commented 6 years ago

@termonio try to run with -d (debug option). In my case, running with -d avoids the core dumps. Also, -O4 requires a lot of memory.

termonio commented 6 years ago

@thesyncim: running with -d yields an output and further optimization makes the file sizes shrink down to 1.9MB. Won't run on Android though (tried -O2, -O4, -Os, and -Oz).

thesyncim commented 6 years ago

@termonio sorry to hear that. Just one last thought: did you try to run the optimization on top of @agnivade's suggestion, -ldflags='-s -w'?

termonio commented 6 years ago

@thesyncim: Yes indeed. I tried all 12 combinations (with and without -ldflags='-s -w' and, for each, additionally wasm-opt at optimization levels -O, -O2, -O4, -Os, and -Oz). I don't think this issue can be resolved with general optimization. I guess the limit is how much memory can be allocated for WebAssembly on an Android device, which seems to be smaller than on other platforms: new WebAssembly.Memory({initial: x}) fails on my Android device when x is around 9000 64kB pages.

termonio commented 6 years ago

I looked into the instance that is returned from WebAssembly.instantiateStreaming(fetch(wasmFile), go.importObject) when running on a desktop browser. instance.exports.mem holds a WebAssembly.Memory object. The allocated ArrayBuffer holds 1073741824 bytes = 1GB after instantiation! I am wondering whether that much memory is really needed.
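The figures quoted so far mix page counts and byte counts, so a small conversion helper may make them easier to compare. This is only an illustrative sketch: the function names are made up for this example, and the only fact it relies on is that a WebAssembly memory page is 64 KiB.

```go
package main

import "fmt"

// wasmPageSize is the fixed size of one WebAssembly memory page (64 KiB).
const wasmPageSize = 64 * 1024

// pagesToBytes converts a page count into bytes.
// (Hypothetical helper name, used only for this illustration.)
func pagesToBytes(pages uint64) uint64 {
	return pages * wasmPageSize
}

// bytesToPages returns how many whole pages are needed to hold n bytes.
func bytesToPages(n uint64) uint64 {
	return (n + wasmPageSize - 1) / wasmPageSize
}

func main() {
	// Roughly the page count at which allocation failed on the Android device.
	limit := pagesToBytes(9000)
	fmt.Printf("9000 pages = %d bytes (%.1f MiB)\n", limit, float64(limit)/(1<<20))

	// The buffer size observed on desktop after instantiation.
	fmt.Printf("1073741824 bytes = %d pages\n", bytesToPages(1073741824))
}
```

Read this way, the desktop buffer is noticeably larger than the point at which the Android device refused the allocation.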

neelance commented 6 years ago

The solution most likely depends on https://github.com/WebAssembly/design/blob/master/FutureFeatures.md#finer-grained-control-over-memory. However, even currently it is not necessary for the WebAssembly host and operating system to physically allocate the full amount of memory, since most of it is not used. For example on Chrome on OS X, the operating system reports a much lower memory usage than the 1GB that WebAssembly requests.

termonio commented 6 years ago

Why does WebAssembly allocate this 1GB of (mostly unused) memory in the first place? Is there a way to limit the amount of memory it can request? (WebAssembly.Memory({initial: x, maximum: y}) comes to mind but my attempts to populate the importObject with preallocated memory did not succeed.) Edit: Dumping a wasm file with wasm-dump shows that memory is indeed set to 16384 64kB pages (=1GB): memory[0] pages: initial=16384. I am wondering whether this can be changed to a more reasonable size.
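For readers without wasm-dump at hand, the same number can be read straight from the binary. The following is a minimal sketch, not the tool used above: it walks the module's sections, finds the memory section (ID 5 in the WebAssembly binary format) and decodes the initial page count of the first memory. The function names and the hand-built sample module are assumptions made for this example, and the parser ignores everything it does not need.

```go
package main

import (
	"errors"
	"fmt"
)

// readULEB128 decodes an unsigned LEB128 integer from the start of b and
// returns the value and the number of bytes consumed.
func readULEB128(b []byte) (uint64, int, error) {
	var v uint64
	for i := 0; i < len(b) && i < 10; i++ {
		v |= uint64(b[i]&0x7f) << (7 * uint(i))
		if b[i]&0x80 == 0 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errors.New("truncated or oversized LEB128 value")
}

// initialPages returns the initial page count of the first memory declared
// in the module's memory section. (Hypothetical helper for illustration.)
func initialPages(mod []byte) (uint64, error) {
	if len(mod) < 8 || string(mod[:4]) != "\x00asm" {
		return 0, errors.New("not a wasm module")
	}
	p := mod[8:] // skip the 4-byte magic and 4-byte version
	for len(p) > 0 {
		id := p[0]
		size, n, err := readULEB128(p[1:])
		if err != nil {
			return 0, err
		}
		body := p[1+n:]
		if uint64(len(body)) < size {
			return 0, errors.New("section runs past end of file")
		}
		if id == 5 { // memory section
			sec := body[:size]
			count, used, err := readULEB128(sec)
			if err != nil || count == 0 || len(sec) <= used {
				return 0, errors.New("empty or malformed memory section")
			}
			// sec[used] is the limits flag byte; the minimum follows it.
			initial, _, err := readULEB128(sec[used+1:])
			return initial, err
		}
		p = body[size:]
	}
	return 0, errors.New("no memory section found")
}

func main() {
	// A tiny hand-built module: header, an empty type section, and a memory
	// section declaring one memory with an initial size of 16384 pages.
	mod := []byte{
		0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00,
		0x01, 0x01, 0x00, // type section: zero types
		0x05, 0x05, 0x01, 0x00, 0x80, 0x80, 0x01, // memory section
	}
	pages, err := initialPages(mod)
	fmt.Println(pages, err)
}
```

To inspect a real binary, the byte slice could be loaded with os.ReadFile instead of being written out by hand.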

cherrymui commented 6 years ago

The initial 1 GB memory is controlled by this line:

https://go.googlesource.com/go/+/go1.11/src/cmd/link/internal/wasm/asm.go#310

I don't think we will change this setting in Go 1.11. But you can modify the source and rebuild the toolchain.
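Memory limits in a wasm binary are stored as unsigned LEB128 integers, so changing the initial size comes down to emitting a different encoded page count. The sketch below shows that encoding in isolation; appendULEB128 is a name invented for this example and is not the linker's own function, and the smaller page count is an arbitrary illustration rather than a recommended setting.

```go
package main

import "fmt"

// appendULEB128 appends v to dst in unsigned LEB128 form: seven payload bits
// per byte, least significant group first, with the high bit set on every
// byte except the last. (Hypothetical helper, not the linker's function.)
func appendULEB128(dst []byte, v uint64) []byte {
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v == 0 {
			return append(dst, b)
		}
		dst = append(dst, b|0x80)
	}
}

func main() {
	// 16384 pages is the 1 GB setting discussed above; 256 pages (16 MiB)
	// is an arbitrary smaller value chosen only to show a shorter encoding.
	for _, pages := range []uint64{16384, 256} {
		fmt.Printf("%5d pages -> % x\n", pages, appendULEB128(nil, pages))
	}
}
```

Because the encoded length depends on the value, any tool that rewrites this field in an existing binary also has to update the enclosing section's size.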

agnivade commented 6 years ago

@neelance - I see a TODO there to use a lower initial memory size. I believe the challenge in setting the correct initial memory size is to somehow analyze the code being compiled and come up with the bare minimum memory the code would need?

Is it possible to hoist this from being hardcoded in the binary to being set from the importObject, so that the user has control over the value? Or do we not want to expose more knobs?

termonio commented 6 years ago

I wrote a small tool that can patch the memory section of a .wasm binary. This allows for easy experimenting with smaller initial page counts without building a modified toolchain. It seems as if during instantiation quite a bit of memory is allocated by the WebAssembly runtime. When starting with 4096 pages (256MB) the runtime grows the memory on my desktop machine to 745865216 bytes (about 710MB, more than 11000 pages). As I have trouble allocating more than 7500 pages on my Android devices, this approach alone won't help to make Go-generated .wasm binaries run on Android. I am surprised that the instantiation is that expensive but I can understand now why the initial memory was set to 1GB ...

Yaoir commented 5 years ago

I also ran into this problem. My Go/WebAssembly app may be helpful for analysis or testing bug fixes in the wasm compiler: https://github.com/Yaoir/VideoPoker-Go-WebAssembly

I tried the wams tool, and found it seemed to alter reliability, but if reliability increased in one browser, it decreased or entirely broke the app in another browser or on another device. Overall, nothing was solved.

I worked with the JavaScript glue code that's in the HTML file to load sequentially rather than using the streaming JavaScript calls. I still got "out of memory" errors. I also put console.log() statements between calls to fetch, compile, and load to see where things went wrong. I did not find any solutions, but learned that the "out of memory" error occurs during the linking phase of the fetch/compile/load sequence.

I noticed that if the app isn't entirely broken on a combination of browser and device, it can sometimes be made to work by clearing the browser cache, restarting the browser app, and loading the WebAssembly app fresh, into a browser that has not loaded any other pages already. But reloading the page one or more times may bring up the error, and reloading after getting an error may actually work! For a while, I had Firefox for Android running the app successfully every other time the page was reloaded.

twifkak commented 5 years ago

I hacked on the runtime library to reduce the initial allocation to ~80MB: https://github.com/golang/go/compare/master...twifkak:small

A couple of notes:

Update: It seems this is bad for processes that create a lot of flyweight objects. runtime/mem_js.go needs a free list or some such. Update 2: I wrote a free list. It requires GODEBUG=gcstoptheworld=1.

free1139 commented 5 years ago

It works. Thank you! @twifkak

twifkak commented 5 years ago

I updated my fork to implement a free list, so that Go can reclaim freed memory in wasm. Using this, combined with GODEBUG=gcstoptheworld=1, I was able to run a pretty allocation-intensive workload (parsing a bunch of HTML files) with <124MB. (Using the concurrent GC, it crashes.)

Update: With GOGC=20 (arbitrary first guess), memory is <80MB.
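Where setting environment variables for a wasm module is awkward, the GOGC value can also be applied from inside the program through runtime/debug.SetGCPercent. The sketch below only illustrates that call; the wrapper name is made up, the value 20 is the same arbitrary first guess as above, and it does not replace the GODEBUG=gcstoptheworld=1 setting the fork depends on.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setGCPercent applies a GC target percentage, the in-program counterpart of
// the GOGC environment variable, and returns the previous setting.
// (Hypothetical wrapper name, used only for this illustration.)
func setGCPercent(percent int) int {
	return debug.SetGCPercent(percent)
}

func main() {
	// Call this early in main, before the allocation-heavy work starts.
	prev := setGCPercent(20)
	fmt.Printf("GC percent changed from %d to 20\n", prev)
}
```

A lower percentage trades more frequent collections for a smaller heap, which is the trade-off that matters on memory-constrained mobile browsers.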

olso commented 5 years ago

It works! Thank you so much @twifkak.

I'm writing an article/opinion/tutorial where my goal is to create a simple game that can be run on mobile because I need touch interaction.

You saved me so much time! ❤️

I'll share the publication once it's done!

twifkak commented 5 years ago

Hi @olso! IME that error happens when you use the wasm_exec.js from Go 1.11. Grab a copy from the Go 1.12 release. (Tip will probably work too.) Thanks for sharing! I look forward to the article.

twifkak commented 5 years ago

Hi all, I would love to clean up my fork so it could be merged into golang proper and this bug fixed. The one thing preventing me from doing so is that it depends on gcstoptheworld=1. There appears to be a race condition and/or missing write barrier, but I have no idea where.

Can anybody help?

@neelance Any ideas what I'm doing wrong? Or who to contact for help?

cherrymui commented 5 years ago

I think there is no "actual" race in the current Wasm implementation -- it is single threaded, has no preemption, and atomic operations are just plain loads/stores. I don't think the race detector could help with anything.

gccheckpoint

I guess you mean gccheckmark? If not already, try gccheckmark.

twifkak commented 5 years ago

@cherrymui Thanks for the response! Yes, I meant to say gccheckmark. I added that and it didn't change the stdout/stderr at all; what should I expect to see?

Please take a look at the fork for races. It definitely has non-atomic behavior. I was assuming that there will never be two instances of sysFree or sysReserve running at the same time. Is that an unsafe assumption?

twifkak commented 5 years ago

Oops, I realize I didn't post the errors I'm seeing. First:

runtime: nelems=1024 nalloc=174 previous allocCount=12 nfreed=65374
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x89a59, 0x20)
    /home/twifkak/devel/go/src/runtime/panic.go:617 +0x6
runtime.(*mspan).sweep(0x3c1470, 0x3c1400, 0x15866)
    /home/twifkak/devel/go/src/runtime/mgcsweep.go:326 +0x98
runtime.(*mcentral).uncacheSpan(0x39b500, 0x3c1470)
    /home/twifkak/devel/go/src/runtime/mcentral.go:197 +0xc
runtime.(*mcache).releaseAll(0x3b72c0)
    /home/twifkak/devel/go/src/runtime/mcache.go:155 +0x7
runtime.(*mcache).prepareForSweep(0x3b72c0)
    /home/twifkak/devel/go/src/runtime/mcache.go:182 +0x5
runtime.procresize(0x1, 0xcf00000000)
    /home/twifkak/devel/go/src/runtime/proc.go:4039 +0xae
runtime.startTheWorldWithSema(0x1, 0x427270)
    /home/twifkak/devel/go/src/runtime/proc.go:1097 +0xa
runtime.gcMarkTermination.func3()
    /home/twifkak/devel/go/src/runtime/mgc.go:1668 +0x2
runtime.systemstack(0x3b72d0)
    /home/twifkak/devel/go/src/runtime/asm_wasm.s:171 +0x2
runtime.mstart()
    /home/twifkak/devel/go/src/runtime/proc.go:1153

goroutine 6 [running]:
runtime.systemstack_switch()
    /home/twifkak/devel/go/src/runtime/asm_wasm.s:182 fp=0x438540 sp=0x438538 pc=0x13490000
runtime.gcMarkTermination(0x3fce33e3fb1df968)
    /home/twifkak/devel/go/src/runtime/mgc.go:1668 +0x2e fp=0x438710 sp=0x438540 pc=0x1129002e
runtime.gcMarkDone()
    /home/twifkak/devel/go/src/runtime/mgc.go:1550 +0x29 fp=0x438760 sp=0x438710 pc=0x11280029
runtime.gcBgMarkWorker(0x426000)
    /home/twifkak/devel/go/src/runtime/mgc.go:1933 +0x31 fp=0x4387d8 sp=0x438760 pc=0x112b0031
runtime.goexit()
    /home/twifkak/devel/go/src/runtime/asm_wasm.s:422 +0x1 fp=0x4387e0 sp=0x4387d8 pc=0x136e0001
created by runtime.gcBgMarkStartWorkers
    /home/twifkak/devel/go/src/runtime/mgc.go:1754 +0xc

Second:

runtime: s.allocCount= 7 s.nelems= 16
fatal error: s.allocCount != s.nelems && freeIndex == s.nelems

goroutine 2 [running]:
runtime.throw(0x8d44f, 0x31)
    /home/twifkak/devel/go/src/runtime/panic.go:617 +0x6 fp=0x1ef5b7e8 sp=0x1ef5b7c0 pc=0x11d50006
runtime.(*mcache).nextFree(0x3b72d0, 0x33, 0x1c7fe000, 0x1c7fe000, 0x1d570086)
    /home/twifkak/devel/go/src/runtime/malloc.go:789 +0x24 fp=0x1ef5b828 sp=0x1ef5b7e8 pc=0x10ae0024
runtime.mallocgc(0x200, 0x0, 0x5c0e200, 0x1eca0d00)
    /home/twifkak/devel/go/src/runtime/malloc.go:944 +0x8f fp=0x1ef5b8d0 sp=0x1ef5b828 pc=0x10af008f
runtime.growslice(0x32700, 0x1eca0d00, 0xfa, 0x100, 0x172, 0x0, 0x0, 0x100)
    /home/twifkak/devel/go/src/runtime/slice.go:175 +0x19 fp=0x1ef5b928 sp=0x1ef5b8d0 pc=0x12810019
strings.(*Builder).WriteString(...)
    /home/twifkak/devel/go/src/strings/builder.go:122
github.com/ampproject/amppackager/transformer/transformers.replaceURLs(0x1eca0a00, 0xb9, 0x452900, 0x3, 0x4, 0x5c0d200, 0x19f4b80, 0xd, 0x43cc90, 0x453b00, ...)
    /home/twifkak/.go/src/github.com/ampproject/amppackager/transformer/transformers/urlrewrite.go:202 +0x4c fp=0x1ef5ba58 sp=0x1ef5b928 pc=0x2211004c
github.com/ampproject/amppackager/transformer/transformers.(*elementNodeContext).rewrite(0x43e4c0, 0x5c0d200, 0x19f4b80, 0xd, 0x43cc90)
    /home/twifkak/.go/src/github.com/ampproject/amppackager/transformer/transformers/urlrewrite.go:174 +0xd fp=0x1ef5bab8 sp=0x1ef5ba58 pc=0x220f000d
github.com/ampproject/amppackager/transformer/transformers.convertToAMPCacheURLs(0x5c0c000, 0x7, 0x8, 0x5c0d200, 0x817b2, 0xa, 0x0)
    /home/twifkak/.go/src/github.com/ampproject/amppackager/transformer/transformers/urlrewrite.go:155 +0xf fp=0x1ef5bbd0 sp=0x1ef5bab8 pc=0x220e000f
github.com/ampproject/amppackager/transformer/transformers.URLRewrite(0x7ef320, 0x0, 0x0)
    /home/twifkak/.go/src/github.com/ampproject/amppackager/transformer/transformers/urlrewrite.go:140 +0x8a fp=0x1ef5bca8 sp=0x1ef5bbd0 pc=0x220d008a
github.com/ampproject/amppackager/transformer.glob..func1(0x7ef320, 0x43e440, 0x8, 0x8, 0x1, 0x1)
    /home/twifkak/.go/src/github.com/ampproject/amppackager/transformer/transformer.go:102 +0x7 fp=0x1ef5bcf0 sp=0x1ef5bca8 pc=0x222c0007
github.com/ampproject/amppackager/transformer.Process(0x1ef5be68, 0x400004, 0x400004, 0x1ec56000, 0xdc77, 0x0)
    /home/twifkak/.go/src/github.com/ampproject/amppackager/transformer/transformer.go:273 +0x1d fp=0x1ef5bdc0 sp=0x1ef5bcf0 pc=0x2229001d
main.transform(0x0, 0x3b1918, 0x0, 0x0, 0x19f4401, 0x1578ef13769d3d00)
    /home/twifkak/.go/src/github.com/ampproject/amppackager/cmd/transform_wasm/main_go1.12.go:109 +0x1f fp=0x1ef5bf10 sp=0x1ef5bdc0 pc=0x2234001f
syscall/js.handleEvent()
    /home/twifkak/devel/go/src/syscall/js/func.go:90 +0x28 fp=0x1ef5bfa8 sp=0x1ef5bf10 pc=0x171b0028
runtime.handleEvent()
    /home/twifkak/devel/go/src/runtime/lock_js.go:179 +0x6 fp=0x1ef5bfd8 sp=0x1ef5bfa8 pc=0x10a80006
runtime.goexit()
    /home/twifkak/devel/go/src/runtime/asm_wasm.s:422 +0x1 fp=0x1ef5bfe0 sp=0x1ef5bfd8 pc=0x136e0001
created by runtime.init.0
    /home/twifkak/devel/go/src/runtime/lock_js.go:142 +0x2

cherrymui commented 5 years ago

gccheckmark doesn't normally print anything. But if a program fails with "sweep increased allocation count", with gccheckmark it will likely fail with something like "checkmark found unmarked object", with more information about where the object is. And the failure rate may be higher (easier to reproduce).

two instances of sysFree or sysReserve running at the same time.

As the program is single threaded, there wouldn't be two things running at the same time. Or do you mean that sysFree or sysReserve somehow (indirectly) recurses into itself? As long as they only call nosplit functions and don't allocate, I don't think they will. Anyway, you can just set a global variable, like inSysFree, upon entering those functions, so you know if the function is re-entered with the variable already set.
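The global-variable check suggested above can be sketched in ordinary Go as follows. This is a stand-in, not runtime code: sysFreeGuarded and its callback are invented for the example, and inside the runtime itself one would more likely call throw than return an error or use defer.

```go
package main

import (
	"errors"
	"fmt"
)

// inSysFree is set while the guarded function is active.
var inSysFree bool

// sysFreeGuarded stands in for a function that must never be re-entered.
// It runs work and reports an error if it is called while already active.
// (Hypothetical function for illustration only.)
func sysFreeGuarded(work func()) error {
	if inSysFree {
		return errors.New("sysFree re-entered")
	}
	inSysFree = true
	defer func() { inSysFree = false }()
	work()
	return nil
}

func main() {
	// A plain call passes the guard.
	fmt.Println("plain call:", sysFreeGuarded(func() {}))

	// A call made from inside the guarded body trips the guard.
	var inner error
	outer := sysFreeGuarded(func() {
		inner = sysFreeGuarded(func() {})
	})
	fmt.Println("outer call:", outer)
	fmt.Println("nested call:", inner)
}
```

If the nested case never fires in the real functions, re-entrancy can be ruled out and the search can move on to other causes.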

olso commented 5 years ago

@twifkak

Here is the article - https://medium.com/@martinolsansky/webassembly-with-golang-is-fun-b243c0e34f02

And my /src https://github.com/olso/go-wasm-cat-game-on-canvas-with-docker

Thank you again! 😻

gabbifish commented 5 years ago

Hi all! I looked into the bug @twifkak had encountered, and it looked like a race condition between calls to sysReserve and sysFree (this occurs when gc and malloc race, I believe). I implemented mutexes (similar to mem_plan9.go) to prevent this race condition, and it seems to work correctly. I'd love for other folks to give this branch a try, though, and let @twifkak or me know if they've run into problems: https://github.com/golang/go/compare/master...gabbifish:gabbi-small?expand=1#diff-7e39049a2de75c1412aac58accc6a92f

free1139 commented 5 years ago

The branch code of @twifkak works well. It's just that the initial memory value set by writeUleb128(ctxt.Out, 1024*1) is still high for my phone, which leads to slow loading. But if this page setting is too low, memory can easily overflow when running in my browsers (Firefox on HTC 10, Safari on iPhone 6 Plus). It needs to be set according to the actual memory needs of the project to achieve better results.

I actually used @termonio's wams tool to set the wasm runtime memory. The combination of the two works very well. Here's the shell script I use to build my project.

#!/bin/sh

GOROOT=/usr/local/go-wasm # modify on go1.12rc1

PATH=$GOROOT/bin:/usr/local/bin:/sbin:/bin

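# Note: GODEBUG set here applies to the go tool during the build; the wasm program reads GODEBUG from its own environment at run time.
GO111MODULE=on GODEBUG=gcstoptheworld=1 GOARCH=wasm GOOS=js go build -ldflags "-s -w" -o ../../../disc/yujian/main.wasm || exit 1
wams -pages 64 -write ../../../disc/yujian/main.wasm || exit 1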

And with @twifkak's modifications, go-wasm really works well on my Android phone and iPhone. My other problem now is that the binary compiled by go-wasm is rather large. To keep the binary at about 4M, I can only use syscall/js to call JS APIs instead of using Go's standard library. When I imported Go's standard libraries (strings, json, base64, http), the binary grew to almost 7~8M. At that size, after reloading the page twice on my iPhone 6 Plus, it reported a wasm compiler memory overflow error. Maybe the wasm implementations in mobile browsers are too weak.

ghost commented 5 years ago

I'm seeing OOM on win7 latest firefox, and also android7 latest firefox, so actually it only works on my macos 10.12.4 firefox63 currently :(
Has anyone tried the binaryen wasm-opt tool? I also saw mention of a snip tool used for shrinking Rust generated wasm. I cannot build either here, but I am interested if anyone has had success?

ghost commented 5 years ago

Using some of the optimization flags mentioned here, I got my .wasm file down to ~2.6MB but still seeing OOM errors on everything but my mac :( I haven't tried the fork mentioned above (I'm very new to Go) but I did try using 'bytecoder' with some Java for comparison...

That generates a 15KB file (also interacting with Javascript / canvas 2D) and runs great on Android g-tab2, and even an old Nexus 5 Android 6 device! and obviously the load times are vastly improved. Although still no success on my iOS devices, but that might be because I keep them on older iOS versions for development testing.

I know that wasm support is still experimental, but could this issue possibly be upgraded from 'Performance' ? Since currently this makes Go WASM unusable in many situations, and this probably needs fixing before WASI hits mainstream. Thanks.

gopherbot commented 5 years ago

Change https://golang.org/cl/170950 mentions this issue: runtime, cmd/link: optimize memory allocation on wasm

maxence-charriere commented 5 years ago

Seems like the situation is better at first loading but out of memory still occurs when reloading a page that uses go wasm on mobile.

Using safari on iPhone.

Edit: I tried on a recent android phone, it still does not get through the first load.

free1139 commented 5 years ago

Seems like the situation is better at first loading but out of memory still occurs when reloading a page that uses go wasm on mobile.

Using safari on iPhone.

Yes, it hangs after a few more refreshes in Safari on the iPhone, and at that point you need to reopen the page. I debugged on a PC, and other browsers also did not release memory promptly after a refresh; the same happened with WASM compiled from Rust and C. I think this has nothing to do with the language being compiled, but with the WASM performance of the browser implementation.