finnbear closed this issue 6 months ago
/cc @neelance @cherrymui @hajimehoshi
@cherrymui Would this be a good case for `sync.Pool`? Would it be okay to restrict the maximum number of arguments so we can have a pool of slices with a fixed size?
I've formulated 3 fully complete solutions to the problem:
1) Using `sync.Pool` in `makeArgs`, as suggested by @neelance: https://github.com/golang/go/compare/master...finnbear:makeargs-syncpool
`sync.Pool` is standard, safe, and future-proof (multithreaded WASM).
My suggestion is for you to:
1) Investigate and implement solution 3
2) Investigate and implement solution 2
3) Try to get `//go:noescape` to work (I could not)
4) Investigate and implement solution 1
5) Do nothing
For what it's worth, I have validated all of the above solutions in my WebAssembly game to get an idea of their performance, and observed no crashing/panicking/issues.
@neelance @cherrymui @hajimehoshi
I've been looking for a solution to this ever since I filed the issue, and ended up with a patch that solves the problem (both the slice problem and the interface-escaping-to-heap problem) for me. Specifically, heap allocations per frame went from ~1500-40,000 (almost all related to `syscall/js`) to ~60-300 (which are unrelated to `syscall/js`). Note that this makes `ValueOf` unsafe to call except during `Call`, `Invoke`, and `New`. If this were being integrated into the standard library, one solution would be to leave `ValueOf` as-is and make a second, unexported version with the `noescape` change.
patch -u $GOROOT/src/syscall/js/js.go -i see_below.patch
--- js.go.dist 2021-01-23 15:50:00.931644132 -0800
+++ js.go 2021-01-23 17:31:54.938784167 -0800
@@ -145,6 +145,11 @@
return valueGlobal
}
+func noescape(foo interface{}) interface{} {
+ bar := uintptr(unsafe.Pointer(&foo))
+ return *((*interface{})(unsafe.Pointer(bar ^ 0)))
+}
+
// ValueOf returns x as a JavaScript value:
//
// | Go | JavaScript |
@@ -159,7 +164,8 @@
// | map[string]interface{} | new object |
//
// Panics if x is not one of the expected types.
-func ValueOf(x interface{}) Value {
+func ValueOf(a interface{}) Value {
+ x := noescape(a)
switch x := x.(type) {
case Value: // should precede Wrapper to avoid a loop
return x
@@ -215,11 +221,12 @@
o.Set(k, v)
}
return o
- default:
- panic("ValueOf: invalid value")
}
+ runtime.KeepAlive(a)
+ panic("ValueOf: invalid value")
}
+//go:noescape
func stringVal(x string) ref
// Type represents the JavaScript type of a Value.
@@ -303,6 +310,7 @@
return r
}
+//go:noescape
func valueGet(v ref, p string) ref
// Set sets the JavaScript property p of value v to ValueOf(x).
@@ -317,6 +325,7 @@
runtime.KeepAlive(xv)
}
+//go:noescape
func valueSet(v ref, p string, x ref)
// Delete deletes the JavaScript property p of value v.
@@ -329,6 +338,7 @@
runtime.KeepAlive(v)
}
+//go:noescape
func valueDelete(v ref, p string)
// Index returns JavaScript index i of value v.
@@ -342,6 +352,7 @@
return r
}
+//go:noescape
func valueIndex(v ref, i int) ref
// SetIndex sets the JavaScript index i of value v to ValueOf(x).
@@ -356,11 +367,24 @@
runtime.KeepAlive(xv)
}
+//go:noescape
func valueSetIndex(v ref, i int, x ref)
-func makeArgs(args []interface{}) ([]Value, []ref) {
- argVals := make([]Value, len(args))
- argRefs := make([]ref, len(args))
+var (
+ argValsSlice []Value
+ argRefsSlice []ref
+)
+
+func makeArgs(args []interface{}) (argVals []Value, argRefs []ref) {
+ for i, _ := range argValsSlice {
+ argValsSlice[i] = Value{}
+ }
+ if len(args) > cap(argValsSlice) {
+ argValsSlice = make([]Value, 0, len(args))
+ argRefsSlice = make([]ref, 0, len(args))
+ }
+ argVals = argValsSlice[:len(args)]
+ argRefs = argRefsSlice[:len(args)]
for i, arg := range args {
v := ValueOf(arg)
argVals[i] = v
@@ -380,6 +404,7 @@
return r
}
+//go:noescape
func valueLength(v ref) int
// Call does a JavaScript call to the method m of value v with the given arguments.
@@ -402,6 +427,7 @@
return makeValue(res)
}
+//go:noescape
func valueCall(v ref, m string, args []ref) (ref, bool)
// Invoke does a JavaScript call of the value v with the given arguments.
@@ -421,6 +447,7 @@
return makeValue(res)
}
+//go:noescape
func valueInvoke(v ref, args []ref) (ref, bool)
// New uses JavaScript's "new" operator with value v as constructor and the given arguments.
@@ -440,6 +467,7 @@
return makeValue(res)
}
+//go:noescape
func valueNew(v ref, args []ref) (ref, bool)
func (v Value) isNumber() bool {
@@ -539,8 +567,10 @@
return string(b)
}
+//go:noescape
func valuePrepareString(v ref) (ref, int)
+//go:noescape
func valueLoadString(v ref, b []byte)
// InstanceOf reports whether v is an instance of type t according to JavaScript's instanceof operator.
@@ -551,6 +581,7 @@
return r
}
+//go:noescape
func valueInstanceOf(v ref, t ref) bool
// A ValueError occurs when a Value method is invoked on
@@ -577,6 +608,7 @@
return n
}
+//go:noescape
func copyBytesToGo(dst []byte, src ref) (int, bool)
// CopyBytesToJS copies bytes from src to dst.
@@ -591,4 +623,5 @@
return n
}
+//go:noescape
func copyBytesToJS(dst ref, src []byte) (int, bool)
I'm excited about this suggestion and change, @finnbear!
Issues: I logically came to the conclusion that this is 100% safe but it may not seem that way (in the case of re-entrant recursive calls). It is also not future-proof if multiple threads are to be supported.
I believe so too: the argument slice is used only when a callback is invoked from the Go side, and this invoking should never happen recursively. @neelance what do you think?
I don't have any insights about `go:noescape`, but do we really need them? The documentation of `go:noescape` says:
The //go:noescape directive must be followed by a function declaration without a body (meaning that the function has an implementation not written in Go). It specifies that the function does not allow any of the pointers passed as arguments to escape into the heap or into the values returned from the function. This information can be used during the compiler's escape analysis of Go code calling the function.
If I understand this correctly, then `//go:noescape` makes no difference for `func valueIndex(v ref, i int) ref` since neither `ref` nor `int` are pointers.
For a case like `func copyBytesToGo(dst []byte, src ref) (int, bool)` it should be fine to apply `//go:noescape` because the pointer of `[]byte` is only accessed inside of `copyBytesToGo` and not stored anywhere else. It might also apply to `string` parameters.
I don't understand what's going on with `x := noescape(a)` but my gut feeling is bad. 😄 Please investigate which lines exactly are responsible for moving the value to the heap.
I was still wondering if we really need `//go:noescape` if 3. (using a single slice) could work. Or, am I missing something?
I'm excited about this suggestion and change, @finnbear!
Me too!
I don't have any insights about go:noescape, but do we really need them?
Yes, at the minimum for `valueNew`, `valueCall`, `valueInvoke`, and `copyBytesToJS`, because those are called quite often in my use case and take slices. Without this directive, those functions maintain the pointers of their arguments for all GC knows, meaning that those arguments (slices and the contents of those slices) must be stored on the heap.
If I understand this correctly, then //go:noescape makes no difference for func valueIndex(v ref, i int) ref since neither ref nor int are pointers.
You're 100% right. I just added it to all functions on the basis that none should cause pointers to escape, but it would have no effect when there are no pointers involved. It should be applied in the minimal number of cases though, so that one should be removed.
I believe so too: the argument slice is used only when a callback is invoked from the Go side, and this invoking should never happen recursively.
To elaborate on this, I think it is safe even if the call is re-entrant. The chronology of events would look like this:
1. `js.Global().Call("foo", 1)`
2. `makeArgs` sets the global slice to `[1]`
3. `valueCall`
4. `loadSliceOfValues` transfers `[1]` to a JS copy
5. (re-entrant) `js.Global().Call("bar", 2)`
6. `makeArgs` sets the global slice to `[2]`
7. `valueCall`
8. `loadSliceOfValues` transfers `[2]` to a JS copy

I might be missing something but I don't think there is an issue with the global slice approach (as long as `wasm/js` is single threaded). Also, maybe `//go:nosplit` could be used to guarantee no weird edge case happens?
For a case like func copyBytesToGo(dst []byte, src ref) (int, bool) it should be fine to apply //go:noescape because the pointer of []byte is only accessed inside of copyBytesToGo and not stored anywhere else.
It might also apply to string parameters.
Yes, and yes.
I don't understand what's going on with x := noescape(a) but my gut feeling is bad. 😄 Please investigate which lines exactly are responsible for moving the value to the heap.
Yeah, I can see why that looks bad.
What it does: The `noescape` function disassociates the parameter `a` from the variable `x` from an escape analysis perspective. Despite the fact that the parameter can escape to the heap (such as in the `case Value`), it can only do so for the duration that it is on the caller's stack (if called from `New`, `Call`, or `Invoke`).
How it works: XOR-ing a pointer type with 0 doesn't change the pointer, but Go can't track that the pointer is the same.
Why it's needed: First, we need a command that shows us when and why things escape in `src/syscall/js`.
1) Basic "does it escape" check: `GOOS=js GOARCH=wasm go build -gcflags="-m"`. Running this, it can be determined that too many variables called `args` and the parameter to `ValueOf` escape to the heap.
2) Advanced "why does it escape" check: `GOOS=js GOARCH=wasm go build -gcflags="-m -m"`. Running this, it can be determined that there are two ways things escape: via JS-implemented functions lacking `//go:noescape`, and via the parameter to `ValueOf`. It seems like the three cases responsible for the latter are `Value` (direct escape), `Wrapper` (an interface/virtual function call that Go can't know whether the pointer escapes via), and `string` (the JS method `stringVal`, which may or may not be fixable with `//go:noescape` there).

How it could be made better/more "safe":
1) `ValueOf` is unsafe to call if its return value is stored somewhere, so we can't let users get their hands on it. This is something I ignored for my project, since I am the only user and I know what not to do. One solution would be to create a copy of `ValueOf` that lets its parameter escape, and an unexported version that doesn't.
2) The `noescape()` function could be moved into `New`, `Call`, and `Invoke`. This would fix the above issue of needing to duplicate `ValueOf`, but I don't think it would work unless `ValueOf` was inlined into `New`, `Call`, and `Invoke`.
3) Move `runtime.KeepAlive()` from the end of the internal version of `ValueOf` to the end of `Call`, `Invoke`, and `New`.

I was still wondering if we really need //go:noescape if 3. (using a single slice) could work. Or, am I missing something?
There are two types of heap allocations, and slice(s) are only one of them. In my original proposal, I didn't propose a way to eliminate the second type of allocation: interfaces. As long as the slices contain pointers related to interfaces, and the `ValueOf` function goes unchanged, any parameters passed to `Call`, `Invoke`, and `New` will be heap allocated. Even word-sized values (`int64`, `float64`, etc.) and sub-word-sized values (`byte`, etc.) escape, due to a lack of a compiler optimization (these types are common in a WebGL app). Getting rid of slice allocations is a good and easy first step, but both are issues. (Note: another possibility for reducing interface allocations is to support pointer parameters, i.e. both `*float64` and `float64`. This would allow the caller to maintain a buffer of interfaces of those pointers, and reassign them before passing them to `Call`, `Invoke`, `New`, and `ValueOf`. This is 'a' solution but not something I would prefer as the caller.)
I'd be happy to investigate more (how the patch could be made better, safer, and more broadly applicable) and answer other questions, as I want to get something like this into the standard library so that something as basic as a function call doesn't allocate. Also, I've deployed the patch to my game and will continue to monitor it (so far no issues, bearing in mind I know not to call the new `ValueOf` function anywhere else). I already mentioned how much allocations improved, according to `runtime.ReadMemStats`.
Thanks @hajimehoshi and @neelance for developing `wasm/js` and considering this proposal as a way to improve it!
Okay, I think I slowly manage to wrap my head around this...
The `x := noescape(a)` is a hack. I think it is very unlikely for this to go upstream as it can easily have bugs. We should rather aim for a variant of `ValueOf(x interface{}) Value` with which Go can properly determine that `x` does not escape. This variant should probably return `ref` instead of `Value`.
You have mentioned the three problematic cases:
- `case Value`: Returning `x.ref` should easily solve this.
- `case string`: This actually needs to allocate a new `ref` which needs garbage collection. However, we can explicitly garbage collect after `valueCall` so we don't need the implicit garbage collection of `makeValue` and the `gcPtr` field.
- `case Wrapper`: I am not sure here. An interface value is not mutable, so maybe we can just copy it somehow? Maybe it would suffice to do `y := x` and then `y.JSValue()`?

@neelance
The x := noescape(a) is a hack

You're not wrong 😉
I think it is very unlikely for this to go upstream as it can easily have bugs
I see what you mean. However, I have read `runtime` source code and seen similar hacks in performance-critical segments like facilitating function calls and garbage collection. If your other idea didn't work, it might be worth consulting with other Go developers to see if the correctness of a solution can be proven despite it using a hack.
This variant should probably return ref instead of Value.
I did not consider that. Sounds good, since `ref` isn't a pointer.
we can explicitly garbage collect after valueCall

I think I understand what you mean: you can explicitly collect the temporary refs created for a single function call (by neglecting to do this, I was leaking `string`s 😢). I hope you don't mean calling `runtime.GC()` on every single `valueCall`; that would "solve" the allocation while killing the performance.
An interface value is not mutable, so maybe we can just copy it somehow?
You're right that the interface's pointer is immutable. However, the relevant issue is where that pointer points to (heap vs. off-heap). GC doesn't really know whether the variant of `JSValue` used by an arbitrary `Wrapper` type is like this:
type BadWrapper struct {
	Value js.Value
}

var escapeRoute *BadWrapper

// Implements js.Wrapper, but escapes the receiver
func (this *BadWrapper) JSValue() js.Value {
	escapeRoute = this
	return this.Value
}
so it assumes the worst. This is why a solution specific to `Call`, `Invoke`, and `New` may be warranted, where the argument is known to be on the caller's stack (unless `Wrapper` was changed to return a `ref` and not a `Value` interface).
Side note: As far as I know, the only type I use that implements `js.Wrapper` is `js.Func`. If `Func` were used instead of `Wrapper` in the switch statement, maybe Go could figure out that the receiver is not maliciously escaping the pointer.
Maybe it would suffice to do y := x and then y.JSValue()?

I doubt this will work (I think GC will see right through it) but I can test it.
I doubt this will work (I think GC will see right through it) but I can test it.
Right, it won't work. I thought the problematic pointer was the interface itself, but it is the value that the interface contains. The only option I see is to get rid of `Wrapper`...
I would be okay with the tradeoff. `syscall/js` is a low-level package, so performance is more important than ease of use.
@dennwc What's your take on this? You contributed http://golang.org/cl/141644 which we would need to revert.
I'm all for performance here, so feel free to revert.
That's unfortunate though, since it was quite useful to allow custom objects to be passed to native `syscall/js` functions. Without it, high-level libraries had to wrap every method of `syscall/js`. This might be fine if performance is at stake here.
Is there any way we can solve this? E.g. make a private interface method on `Wrapper`, so only `syscall/js.Value` can implement it, but still allow it to be embedded into a custom type. Would that solve the issue?
Is there any way we can solve this? E.g. make a private interface method on Wrapper, so only the syscall/js.Value can implement it, but still allow to embed it into custom type. Would it solve an issue?
Let's test it!
As a baseline, with no `syscall/js` optimizations, my game does 708 allocations per frame. (I am benchmarking in a very constant in-game state, so the numbers will be comparable later. Since I'm testing on a development site, I don't have access to the game state that generated 40,000 allocations per frame, but any improvement that follows should be more than proportional.)
I'll start by applying `//go:noescape` to `stringVal`, because without that, we won't get anywhere with `ValueOf`'s parameter `x` not escaping to the heap.
Here's the optimization output before your idea:
./js.go:162:14: parameter x leaks to {heap} with derefs=0:
./js.go:162:14: flow: {temp} = x:
./js.go:162:14: flow: x = {temp}:
./js.go:162:14: from case Wrapper: return x.JSValue() (switch case) at ./js.go:166:2
./js.go:162:14: flow: {heap} = x:
./js.go:162:14: from x.JSValue() (call parameter) at ./js.go:167:19
And here it is after simply un-exporting the `Wrapper` JS value method:
./js.go:162:14: parameter x leaks to ~r1 with derefs=1:
./js.go:162:14: flow: {temp} = x:
./js.go:162:14: flow: x = *{temp}:
./js.go:162:14: from case Value: return x (switch case) at ./js.go:164:2
./js.go:162:14: flow: ~r1 = x:
./js.go:162:14: from return x (return) at ./js.go:165:3 // case Value: return x
Under my understanding, `~r1` is the return value, which is far better than `{heap}`. Now the question is: is this still sufficient to get the same level of allocation reduction as my earlier hack approach?
I'll continue testing by applying my `makeArgs` single-slice patch, to get rid of slice allocations. My game now runs at 336 allocations per frame.
Now I'll `//go:noescape` all JS-implemented functions that take pointers. Still 336 allocations per frame. Here's the optimization output of `Value.New`:
./js.go:447:20: parameter args leaks to {heap} with derefs=1:
./js.go:447:20: flow: args = args:
./js.go:447:20: from makeArgs(args) (call parameter) at ./js.go:448:30
./js.go:447:20: flow: {temp} = args:
./js.go:447:20: flow: arg = *{temp}:
./js.go:447:20: from for loop (range-deref) at ./js.go:380:16
./js.go:447:20: flow: x = arg:
./js.go:447:20: from ValueOf(arg) (call parameter) at ./js.go:381:15
./js.go:447:20: flow: {temp} = x:
./js.go:447:20: flow: x = {temp}:
./js.go:447:20: from case Wrapper: return x.jsValue() (switch case) at ./js.go:166:2
./js.go:447:20: flow: {heap} = x:
./js.go:447:20: from x.jsValue() (call parameter) at ./js.go:167:19
So maybe `Wrapper` is a problem after all... let's try removing it entirely and putting the concrete type `Func` in the switch case. Wow... only 59 allocations per frame. That's about how many allocations go on outside of `syscall/js`. And we didn't even have to do a hack.
Unfortunately, unless there is something I'm missing or you are willing to use the hack method, getting rid of `Wrapper` is what is required.
Here are the patches that ended up eliminating the `syscall/js` allocations:
js.patch.txt
func.patch.txt
Edit: There is some weirdness going on. Although allocations are definitively at 59 per frame, the optimization output from `Value.New` still says `args` escapes to `{heap}`:
./js.go:436:20: parameter args leaks to {heap} with derefs=2:
./js.go:436:20: flow: args = args:
./js.go:436:20: from makeArgs(args) (call parameter) at ./js.go:437:30
./js.go:436:20: flow: {temp} = args:
./js.go:436:20: flow: arg = *{temp}:
./js.go:436:20: from for loop (range-deref) at ./js.go:369:16
./js.go:436:20: flow: x = arg:
./js.go:436:20: from ValueOf(arg) (call parameter) at ./js.go:370:15
./js.go:436:20: flow: {temp} = x:
./js.go:436:20: flow: x = *{temp}:
./js.go:436:20: from case Func: return x.Value (switch case) at ./js.go:155:2
./js.go:436:20: flow: ~r1 = x:
./js.go:436:20: from x.Value (dot) at ./js.go:156:11
./js.go:436:20: from return x.Value (return) at ./js.go:156:3
./js.go:436:20: flow: v = ~r1:
./js.go:436:20: from v := ValueOf(arg) (assign) at ./js.go:370:5
./js.go:436:20: flow: {heap} = v:
./js.go:436:20: from argVals[i] = v (assign) at ./js.go:371:14
And `Invoke`, which I call a lot, looks like this (escape analysis also claims `args` escapes to `{heap}`):
./js.go:416:23: parameter args leaks to {heap} with derefs=2:
./js.go:416:23: flow: {heap} = **args:
./js.go:416:23: from makeArgs(args) (call parameter) at ./js.go:417:30
Looks like they both "escape" via my single-slice optimization in `makeArgs`.
There are a few possibilities; one is that some parameters "leak" to `{heap}` in the optimization output without actually causing excessive memory allocations.
@neelance @hajimehoshi @dennwc
Unfortunately, unless there is something I'm missing or you are willing to use the hack method, getting rid of Wrapper is what is required.
The "hack" is not a proper solution for upstream, right? Because as far as I can see, the hack is not a workaround for a shortcoming of the compiler. The compiler is actually right that `Wrapper` can escape, so circumventing this check with a hack means that it can then break at runtime.
Removing `Wrapper` seems like the only way forward. I believe that this change needs to go through the proposal process. @finnbear Would you mind creating a small proposal for the removal of `Wrapper` with a summary of our discussion and posting it as a new issue?
The complier is actually right that Wrapper can escape
If we un-exported it, and implemented it in a way that it can't escape, that might not be the case. However, from what I have tried so far, even though it doesn't escape to the heap, the allocations still happen somewhere in the call stack.
The "hack" is not a proper solution for upstream, right?
Not as written, no. It might be able to be adapted into a proper solution though.
Would you mind creating a small proposal for the removal of Wrapper
I would be happy to (after I spend about 30 more minutes trying different combinations of ways to keep Wrapper and eliminate allocations, just to make sure removing it is the only way)!
@neelance proposal submitted: https://github.com/golang/go/issues/44006
Note that, as part of the proposal, I propose yet another way to solve this very issue. It intentionally makes no effort to make the argument/ref slices global in any way. Here's a snippet:
`makeArgs` gets replaced with:
func storeArgs(args []interface{}, argValsDst []Value, argRefsDst []ref) {
	for i, arg := range args {
		v := ValueOf(arg)
		argValsDst[i] = v
		argRefsDst[i] = v.ref
	}
}

// This function must be inlined
func makeArgs(args []interface{}) (argVals []Value, argRefs []ref) {
	const maxStackArgs = 16
	if len(args) <= maxStackArgs {
		// Allocated on the stack
		argVals = make([]Value, len(args), maxStackArgs)
		argRefs = make([]ref, len(args), maxStackArgs)
	} else {
		// Allocated on the heap
		argVals = make([]Value, len(args))
		argRefs = make([]ref, len(args))
	}
	return
}
Calls to `makeArgs` get replaced with:
argVals, argRefs := makeArgs(args)
storeArgs(args, argVals, argRefs)
@finnbear I just looked into your patch some more. The `makeArgs`/`storeArgs` solution is really clever. 🎯
Do the `//go:noescape` on `valuePrepareString` and `valueLoadString` make any difference?
I just looked into your patch some more. The makeArgs/storeArgs solution is really clever.
Thanks! (it's just a workaround for a lack of a way to force the Go compiler to inline an arbitrarily big function)
Do the //go:noescape on valuePrepareString and valueLoadString make any difference?
No, those two don't actually matter. When I originally saw them, I misinterpreted them as Go `string` to JS `String`, but they are for the reverse purpose. Unless `syscall/js` added a new API to load a JS `String` into a user's potentially-stack-allocated `[]byte`, `//go:noescape` is redundant, as `make([]byte, length)` with `length` being dynamic forces a heap allocation. This reasoning is supported by testing, as removing these `//go:noescape`'s had no effect on my micro-benchmark (even after adding a JS `String` to Go `string` operation).
@neelance
@finnbear I also have a hard time producing different results with `//go:noescape` on `stringVal`, `valueGet`, `valueSet`, and `valueDelete`.
Is there any scenario in which a string pointer could point to the stack? Strings are immutable. If the value is a known constant at compile time, then it will be neither stack nor heap, but constant program data. If it is not known, then the length can not be known at compile time and thus it can not go to the stack and must end up on the heap. Am I missing something?
If my reasoning is true, then `//go:noescape` would not make any difference for `string` parameters. Maybe I should ask this on golang-dev...
Is there any scenario in which a string pointer could point to the stack?
Yes, although I admit it may not be common. `//go:noescape` on `valueGet`, `valueSet`, and `valueDelete` is the difference between the 2nd and 3rd lines (optimization output) when the first line is compiled and passed as a field name:
// Always one byte, but not a statically known byte (therefore, string is on stack)
randStr := string([]byte{byte(60 + rand.Intn(10))})
js.Global().Get(randStr)
// With noescape
./main.go:109:22: string([]byte{...}) escapes to heap
// Without noescape
./main.go:109:22: string([]byte{...}) does not escape
`//go:noescape` on `stringVal` also has an effect on optimizer output:
./js.go:341:32: parameter x leaks to {heap} with derefs=1:
./js.go:341:32: flow: x = x:
./js.go:341:32: from ValueOf(x) (call parameter) at ./js.go:345:15
./js.go:341:32: flow: {temp} = x:
./js.go:341:32: flow: x = *{temp}:
./js.go:341:32: from case string: return makeValue(stringVal(x)) (switch case) at ./js.go:193:2
All that being said, I could not readily reproduce a difference in reported allocations with respect to enabling `//go:noescape` in any or all of these cases (although I am using my test script, not my full application). As a result, I'm updating my theory/understanding about what causes escaping to the heap in a normal program. Taking a look at the `Wrapper` case (removed by my proposal), we see:
./js.go:348:32: parameter x leaks to {heap} with derefs=0:
./js.go:348:32: flow: x = x:
./js.go:348:32: from ValueOf(x) (call parameter) at ./js.go:352:15
./js.go:348:32: flow: {temp} = x:
./js.go:348:32: flow: x = {temp}:
./js.go:348:32: from case Wrapper: return x.JSValue() (switch case) at ./js.go:161:2
./js.go:348:32: flow: {heap} = x:
./js.go:348:32: from x.JSValue() (call parameter) at ./js.go:162:19
The thing that I now believe is most relevant is `derefs=?`. It seems to me (although this is just a guess) that the `derefs` value reflects the net amount of de-referencing performed on an identifier, and that having it be `0` makes a huge difference for `interface{}`-based parameters. Specifically, if `0`, all values passed into the parameter (with the exception of `byte`-sized ints) will escape:
// ./js.go:348:32: parameter x leaks to {heap} with derefs=0:
./main.go:76:46: rand.Intn(size) escapes to heap:
./main.go:76:46: flow: {storage for ... argument} = &{storage for rand.Intn(size)}:
./main.go:76:46: from rand.Intn(size) (spill) at ./main.go:76:46
./main.go:76:46: from ... argument (slice-literal-element) at ./main.go:76:19
./main.go:76:46: flow: {heap} = {storage for ... argument}:
./main.go:76:46: from ... argument (spill) at ./main.go:76:19
./main.go:76:46: from drawRect.Invoke(... argument...) (call parameter) at ./main.go:76:19
But if there is some non-zero amount of de-referencing, then static analysis can (at least sometimes) avoid heap allocations for some types of parameters. I think this is because the `interface{}`'s data pointer can still point to a structure on the stack, which could be an `int` that doesn't escape in any way or a `struct` that contains a potentially-escaping pointer.
// ./js.go:348:32: parameter x leaks to {heap} with derefs=1:
./main.go:76:78: inlining call to rand.Intn func(int) int { return rand.globalRand.Intn(rand.n) }
So, in summary:
1) In the `Wrapper` case, `x` leaked with `derefs=0`, and thus an `int` passed to the method will leak. However, in the `stringVal` example, `x` leaks with `derefs=1`, and thus an `int` passed to the method will not leak.
2) Removing `Wrapper` was required, but removing all `{heap}` escaping of `x` may not be.

Finally, here's a suggestion. Either:
1) Apply `//go:noescape` wherever it is both correct and theoretically consequential or has any effect on optimizer output (run `GOOS=js GOARCH=wasm go build -gcflags "-m -m"` in `src/syscall/js`). This will give the Go compiler (current and future versions of it) maximum flexibility to optimize. If needed, document the fact that some of the `//go:noescape`'s matter more than others.
2) Determine the exact meaning of `derefs=?` in the context of escape analysis, and go from there.

I'm not sure this is related, but can `handleEvent` also reuse slices to avoid slice allocations?
args := make([]Value, argsObj.Length())
for i := range args {
	args[i] = argsObj.Index(i)
}
result := f(this, args)
EDIT: The patch will be like this:
@@ -68,6 +68,8 @@
setEventHandler(handleEvent)
}
+var eventArgVals = make([]Value, 0, maxArgs)
+
func handleEvent() {
cb := jsGo.Get("_pendingEvent")
if cb.IsNull() {
@@ -89,10 +91,14 @@
this := cb.Get("this")
argsObj := cb.Get("args")
- args := make([]Value, argsObj.Length())
- for i := range args {
- args[i] = argsObj.Index(i)
+ eventArgVals = eventArgVals[:argsObj.Length()]
+ for i := range eventArgVals {
+ eventArgVals[i] = argsObj.Index(i)
}
- result := f(this, args)
+ result := f(this, eventArgVals)
cb.Set("result", result)
+
+ for i := range eventArgVals {
+ eventArgVals[i] = Value{}
+ }
}
can handleEvent also reuse slices to avoid slice allocations?

This can be dangerous if `handleEvent` is re-entrant (it theoretically can be, if the Go code causes an immediate JavaScript event: for example, if the Go code calls `.click()` on a `<button>` that has an event handler that runs Go again. Some browsers will synchronously invoke that handler, instead of putting it on the event queue. If `handleEvent` handles all JS->Go callbacks, that's an even bigger issue).
A safer solution would be more in line with `makeArgs`/`storeArgs`, where each call gets its own slice, but slices below a certain length go on the stack (without allocating).
Ah OK, thanks. My understanding of your latest proposal is that if the capacity is a small constant value (16 in this case), the slice allocation will be on the stack. Is this correct? Does 16 depend on the current implementation?
My understanding of your latest proposal is that if the capacity is a small constant value (16 in this case), the slice allocation will be on the stack. Is this correct?
Yes, as written, slices with `len <= 16` will be allocated on the stack with `cap = 16`. Otherwise, the slice will be on the heap with an arbitrary `len` and `cap >= len`.
Note: Despite being reliably implemented, in my understanding, this optimization is not guaranteed by the Go specifications.
Does 16 depend on the current implementation?
The reason I chose 16 is that it 1) is a power of two and 2) is a realistic upper bound for the number of arguments. If it were increased to 32, that might make some very rare functions allocate less, but would penalize much more common functions. If it were decreased to 4, many functions would need to allocate. I don't think 8 would be too bad, but it would affect https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/texImage2D.
Note: There may be a number greater than 16 that would cause the Go compiler to not put the slice on the stack. I didn't attempt to find it.
Another question: how can we make sure `makeArgs` is inlined?
This can be dangerous if handleEvent is re-entrant (it theoretically can be, if the Go code causes a JavaScript immediate event:
What about making a global stack?
var eventArgValsStack []Value

func getEventArgs(argsObj Value) (argVals []Value) {
	l := argsObj.Length()
	if cap(eventArgValsStack)-len(eventArgValsStack) < l {
		eventArgValsStack = append(eventArgValsStack, make([]Value, l)...)
	} else {
		eventArgValsStack = eventArgValsStack[:len(eventArgValsStack)+l]
	}
	argVals = eventArgValsStack[len(eventArgValsStack)-l:]
	for i := range argVals {
		argVals[i] = argsObj.Index(i)
	}
	return
}

func putEventArgs(n int) {
	for i := 0; i < n; i++ {
		eventArgValsStack[len(eventArgValsStack)-i-1] = Value{}
	}
	eventArgValsStack = eventArgValsStack[:len(eventArgValsStack)-n]
}
func handleEvent() {
cb := jsGo.Get("_pendingEvent")
if cb.IsNull() {
return
}
jsGo.Set("_pendingEvent", Null())
id := uint32(cb.Get("id").Int())
if id == 0 { // zero indicates deadlock
select {}
}
funcsMu.Lock()
f, ok := funcs[id]
funcsMu.Unlock()
if !ok {
Global().Get("console").Call("error", "call to released function")
return
}
this := cb.Get("this")
argsObj := cb.Get("args")
const maxStackArgs = 16
args := getEventArgs(argsObj)
defer putEventArgs(argsObj.Length())
result := f(this, args)
cb.Set("result", result)
}
I tried your latest suggestion (using a constant capacity whenever possible) but apparently the slice was allocated on the heap... (I've confirmed this by watching runtime.mallocgc
and calling throw
at the relevant point)
Another question: how can we make sure
makeArgs
is inlined?
In my understanding, there is no way to force Go to inline something other than inlining it manually. If you compile with -gcflags "-m -m"
(or something similar), it will tell you the reason for its decision (usually related to the "cost" of the function). I split up makeArgs
/storeArgs
so that makeArgs
would be under the "cost" threshold for inlining at the time of writing.
I tried your latest suggestion (using a constant capacity whenever possible) but apparently the slice was allocated on heap...
I should have realized this, but the args
slice (in the context of handleEvent
) will always escape to the heap because f
is dynamically dispatched. Therefore, I don't think the Go compiler will ever allow it to reside on the stack.
What about making a global stack?
Your "global stack" solution is really clever (if not obviously correct, so I'm assuming it works when you test it). I think it would solve the allocation and re-entrance problem if the client code can be trusted to not mutate the args slice, which could corrupt the arguments of other calls.
Thanks!
if the client code can be trusted to not mutate the args slice, which could corrupt the arguments of other calls.
How does a function corrupt the args slice for other functions (its superiors and/or its subsidies)?
How does a function corrupt the args slice for other functions (its superiors and/or its subsidies)?
Suppose Go func1
with args slice [ptr=1, 2, 3] induces a JS event that calls Go func2
with args [1, 2, 3, ptr=4, 5, 6]. If func1
was to call append
before func2
finished executing, that could overwrite 4, 5, and 6, causing a problem. If func2
is synchronous, this should be impossible, though.
Actually, the root problem is that no Go func
would be able to keep (in a new goroutine)/store (globally/on the heap) its args slice after it returns (which would work fine without this optimization).
I'm not too worried about this, but it should be documented to warn users.
If func1 was to call append before func2 finished executing, that could overwrite 4, 5, and 6, causing a problem. If func2 is synchronous, this should be impossible, though.
Ah, with goroutines? Yeah, that might be in theory possible... Hmm...
Actually, the root problem is that no Go func would be able to keep (in a new goroutine)/store (globally/on the heap) its args slice after it returns (which would work fine without this optimization).
Yeah that's a good point.
I'm not too worried about this, but it should be documented to warn users.
I think it should be fine just to warn users not to corrupt the argument slices.
@neelance What do you think about my 'global stack' proposal? This idea should work for both makeArgs
and handleEvent
. Fortunately, syscall/js
is out of the scope of the Go 1 compatibility policy.
I think it should be fine just to warn users not to corrupt the argument slices.
I like the warning idea, but that particular warning is not sufficient to avoid issues. Not only can you not mutate the slices beyond their len
, but you also cannot store them (globally/on the heap) or use them in a new goroutine. The goroutine restriction, which I believe will be the most common pitfall, is reminiscent of one of the constraints on using an iterator variable like:
items := []string{"first", "second", "third"}
for _, item := range items {
	go func() {
		println(item)
	}()
}
// Output:
// third
// third
// third
Hm, that's unfortunate... Thank you for pointing this out.
BTW, I think my idea can be applied for makeArgs
safely as the slice is not used in Go side. Is that correct?
BTW, I think my idea can be applied for makeArgs safely as the slice is not used in Go side. Is that correct?
I think that would work, but consider that...
1) I don't think that case suffers from re-entrance issues (even if it was re-entrant, the args would have already been copied out of the memory on the JS side), so you might even be able to get away with a single shared slice like in my original comment in this issue. (see https://github.com/golang/go/issues/39740#issuecomment-766938606)
2) The slice doesn't escape the stack, so the makeArgs
/storeArgs
does work (at the cost of a tiny bit of stack memory)
3) makeArgs
/storeArgs
should easily support multithreaded WASM. Global slice/global stack will need a Mutex.
I don't think that case suffers from re-entrance issues (even if it was re-entrant, the args would have already been copied out of the memory on the JS side), so you might even be able to get away with a single shared slice like in my original comment in this issue.
If this is re-entrant, the Value slice must be kept on the Go side so that it is not GCed until the call finishes, right? Anyway, as long as this is not re-entrant, yes, your 'global slice' idea works.
The slice doesn't escape the stack, so the makeArgs/storeArgs does work (at the cost of a tiny bit of stack memory)
Hm? We have found that the slice was not on the stack due to dynamic dispatching, right?
Hm? We have found that the slice was not on the stack due to dynamic dispatching, right?
Oh sorry, my entire previous comment was in the context of Call
, New
, Invoke
, and makeArgs
(no dynamic dispatch) as I thought that was what you were referring to. In the context of handleEvent
and (dynamically dispatched) f
, none of what I just said is true.
Oh sorry, my entire previous comment was in the context of Call, New, Invoke, and makeArgs (no dynamic dispatch) as I thought that was what you were referring to. In the context of handleEvent and f, none of what I just said is true.
Sorry for my confusing comment at https://github.com/golang/go/issues/39740#issuecomment-955193617, but I meant makeArgs
/storeArgs
didn't work for both Call
/New
/Invoke
and handleEvent
. My understanding is that the dynamic-dispatching is
if len(args) <= maxStackArgs {
	// Allocated on the stack
	argVals = make([]Value, len(args), maxStackArgs)
	argRefs = make([]ref, len(args), maxStackArgs)
} else {
	// Allocated on the heap
	argVals = make([]Value, len(args))
	argRefs = make([]ref, len(args))
}
and unfortunately the slices are not on the stack for both cases. I'll double-check this later.
the dynamic-dispatching is [if/else]
Which branch is taken is dynamic, but there is no dispatch (calling a function) here. All the code there is known statically and, crucially, inlined into the caller. There is no rule that says a given variable (e.g. argVals
) must always point to the stack or always point to the heap. If the Go compiler determines that it is safe to put a certain structure on the stack (e.g. the data for make([]Value, len(args), maxStackArgs)
), it is perfectly fine to assign a heap allocated structure (e.g. the data for make([]Value, len(args))
) to the same variable.
Side note: The reason it is safe is that Go can trace the flow of the argVals
/argRefs
slices through their entire lifetime, and they are only ever passed down (as long as makeArgs
is inlined) the callstack (eventually to JS). They never have an opportunity to interact with arbitrary user code.
unfortunately the slices are not on the stack for both cases
That's not what I'm seeing. It's possible that the Go compiler version matters, though, so I include my version info.
func makeArgs(args []interface{}) (argVals []Value, argRefs []ref) {
	const maxStackArgs = 16
	if len(args) <= maxStackArgs {
		// Allocated on the stack
		argVals = make([]Value, len(args), maxStackArgs)
		argRefs = make([]ref, len(args), maxStackArgs)
	} else {
		// Allocated on the heap
		argVals = make([]Value, len(args))
		argRefs = make([]ref, len(args))
	}
	return
}

func (v Value) Call(m string, args ...interface{}) Value {
	argVals, argRefs := makeArgs(args) // 394
	storeArgs(args, argVals, argRefs)
	res, ok := valueCall(v.ref, m, argRefs)
	...
}
$ go version
go version go1.16.2 linux/amd64
$ cd go1.16beta1/src/syscall/js # this just happens to be where I developed the patch
$ GOOS=js GOARCH=wasm go build -gcflags "-m -m"
...
./js.go:394:30: inlining call to makeArgs <---- IMPORTANT
...
./js.go:394:30: make([]Value, len(args), maxStackArgs) does not escape <------ HERE
./js.go:394:30: make([]ref, len(args), maxStackArgs) does not escape
./js.go:394:30: make([]Value, len(args)) escapes to heap
./js.go:394:30: make([]ref, len(args)) escapes to heap
...
and handleEvent
Here, we actually see dynamic dispatch in action (Normally, the term dynamic dispatch is used in the context of polymorphism, but I just mean calling a function known only at runtime).
f, ok := funcs[id]
...
result := f(this, args)
As far as the Go compiler is concerned, f
could take args
and store it on the heap and/or in global variables that outlive the current function's stack frame. Therefore, the data backing args
must not be on the stack.
I just want to leave a comment that my 'global stack' suggestion might not work with multiple goroutines (w/o mutex).
That's not what I'm seeing. It's possible that the Go compiler version matters, though, so I include my version info.
OK I'll double-check this today... Thanks! (Sorry for my terribly late comment)
@finnbear
.../js.patched.go:412:17: make([]Value, len(args), maxStackArgs) escapes to heap
.../js.patched.go:413:17: make([]ref, len(args), maxStackArgs) escapes to heap
Unfortunately, with Go 1.17.2 (darwin), these values seem to escape.
EDIT: makeArgs
is inlined like inlining call to makeArgs
With -m -m
:
.../js.patched.go:417:17: make([]Value, len(args)) escapes to heap:
.../js.patched.go:417:17: flow: {heap} = &{storage for make([]Value, len(args))}:
.../js.patched.go:417:17: from make([]Value, len(args)) (non-constant size) at .../js.patched.go:417:17
.../js.patched.go:418:17: make([]ref, len(args)) escapes to heap:
.../js.patched.go:418:17: flow: {heap} = &{storage for make([]ref, len(args))}:
.../js.patched.go:418:17: from make([]ref, len(args)) (non-constant size) at .../js.patched.go:418:17
.../js.patched.go:414:17: make([]ref, len(args), maxStackArgs) escapes to heap:
.../js.patched.go:414:17: flow: argRefs = &{storage for make([]ref, len(args), maxStackArgs)}:
.../js.patched.go:414:17: from make([]ref, len(args), maxStackArgs) (spill) at .../js.patched.go:414:17
.../js.patched.go:414:17: from argRefs = make([]ref, len(args), maxStackArgs) (assign) at .../js.patched.go:414:11
.../js.patched.go:413:17: make([]Value, len(args), maxStackArgs) escapes to heap:
.../js.patched.go:413:17: flow: argVals = &{storage for make([]Value, len(args), maxStackArgs)}:
.../js.patched.go:413:17: from make([]Value, len(args), maxStackArgs) (spill) at .../js.patched.go:413:17
.../js.patched.go:413:17: from argVals = make([]Value, len(args), maxStackArgs) (assign) at .../js.patched.go:413:11
@finnbear
.../js.patched.go:412:17: make([]Value, len(args), maxStackArgs) escapes to heap .../js.patched.go:413:17: make([]ref, len(args), maxStackArgs) escapes to heap
Unfortunately, with Go 1.17.2 (darwin), these values seem to escape.
EDIT:
makeArgs
is inlined likeinlining call to makeArgs
I'd like to investigate this, but just so we are on the same page, can you ZIP and attach your entire src/syscall/js
directory? That way, I can figure out if the issue is a difference in code or a difference in compiler version. It also makes it possible to compare line numbers with your compiler output (I see the escape reason "spill", which shouldn't be happening).
I'm using -overlay
so I replaced only js.go
. I'll send it to you soon.
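For reference, -overlay (available since Go 1.16) takes a JSON file mapping original file paths to replacements; the paths below are placeholders, not the ones actually used here:

```shell
# Sketch: substitute a patched js.go without editing GOROOT.
# Both paths are placeholders; adjust to your GOROOT and patched file.
cat > overlay.json <<'EOF'
{
  "Replace": {
    "/usr/local/go/src/syscall/js/js.go": "./js.patched.go"
  }
}
EOF
# Then build with the overlay applied, e.g.:
#   GOOS=js GOARCH=wasm go build -overlay overlay.json -gcflags "-m -m"
```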
@finnbear Here you are js.go.zip
By inlining makeArgs
by hand, both argVals
and argRefs
are on the stack (of course this also requires //go:noescape
to valueCall
and so on)
I downloaded go1.17.2
and inserted your js.go
, and I think I can see the misunderstanding.
1) The "escapes to heap" lines you saw refer to the standalone makeArgs
(which is fine, since makeArgs
is never called on its own)
2) When makeArgs
is (automatically) inlined into Call
, New
, or Invoke
, the cap=16 slices do not escape from Call
's, New
's, or Invoke
's stack frames.
As you can see below, the first two lines of output are what you observed, but the remaining lines tell the full story.
~/sdk/go1.17.2/src/syscall/js$ GOOS=js GOARCH=wasm ~/go/bin/go1.17.2 build -gcflags "-m -m"
...
# line 412: argVals = make([]Value, len(args), maxStackArgs) in makeArgs
./js.go:412:17: make([]Value, len(args), maxStackArgs) escapes to heap
# line 413: argRefs = make([]ref, len(args), maxStackArgs) in makeArgs
./js.go:413:17: make([]ref, len(args), maxStackArgs) escapes to heap
# line 416 argVals = make([]Value, len(args)) in makeArgs
./js.go:416:17: make([]Value, len(args)) escapes to heap
# line 417 argRefs = make([]ref, len(args)) in makeArgs
./js.go:417:17: make([]ref, len(args)) escapes to heap
...
# line 434: argVals, argRefs := makeArgs(args) in Call
./js.go:434:30: make([]Value, len(args), maxStackArgs) does not escape
./js.go:434:30: make([]ref, len(args), maxStackArgs) does not escape
./js.go:434:30: make([]Value, len(args)) escapes to heap
./js.go:434:30: make([]ref, len(args)) escapes to heap
...
# line 459: argVals, argRefs := makeArgs(args) in Invoke
./js.go:459:30: make([]Value, len(args), maxStackArgs) does not escape
./js.go:459:30: make([]ref, len(args), maxStackArgs) does not escape
./js.go:459:30: make([]Value, len(args)) escapes to heap
./js.go:459:30: make([]ref, len(args)) escapes to heap
...
# line 481: argVals, argRefs := makeArgs(args) in New
./js.go:481:30: make([]Value, len(args), maxStackArgs) does not escape
./js.go:481:30: make([]ref, len(args), maxStackArgs) does not escape
./js.go:481:30: make([]Value, len(args)) escapes to heap
./js.go:481:30: make([]ref, len(args)) escapes to heap
By inlining makeArgs by hand, both argVals and argRefs are on the stack (of course this also requires //go:noescape to valueCall and so on)
Manually inlining will disambiguate the optimizer output, but is not necessary, as the automatically-inlined versions (in Call
, New
, and Invoke
) are the only ones that are ever called.
@hajimehoshi
./js.go:434:30: make([]Value, len(args), maxStackArgs) does not escape
./js.go:434:30: make([]ref, len(args), maxStackArgs) does not escape
./js.go:434:30: make([]Value, len(args)) escapes to heap
./js.go:434:30: make([]ref, len(args)) escapes to heap
I've confirmed this and so on. Thanks!
I think it'd be very nice to have this change in the official syscall/js
. Do you plan to send a change?
@finnbear
What version of Go are you using (
go version
)?
Does this issue reproduce with the latest release?
Yes, in go1.15beta1
What did you do?
I used
js.Call
,js.New
, andjs.Invoke
with multiple arguments, hundreds of times per frame, 60 frames per second in order to implement my WebGL game.
What did you expect to see?
The above functions are absolutely essential for getting work done in js/wasm, as they are analogous to a simple function call. They should be optimized as such. No excess go heap memory should be allocated, no extra go garbage collection. After implementing a hack to fix the issue I will go on to describe (see below), here is a CPU profile of memory allocation:
Importantly, memory allocation is 8% of the total time and does not include
makeArgs
allocating any slices that end up on the heap.
What did you see instead?
2+ new slices allocated on the heap per call to one of the above three functions, as the first few lines of the makeArgs function... https://github.com/golang/go/blob/60f78765022a59725121d3b800268adffe78bde3/src/syscall/js/js.go#L361-L363
...possibly are assumed to escape to the heap via... https://github.com/golang/go/blob/60f78765022a59725121d3b800268adffe78bde3/src/syscall/js/js.go#L405-L406 ...or
valueNew
/valueInvoke
, or if Go's escape analysis optimization doesn't work on the caller's stack.
A CPU profile shows the significant penalty associated with allocating too many slices (mallocgc now accounts for over 20%), and that doesn't even include the extra garbage collector load that claims a few FPS.
I thought of adding
//go:noescape
before each of the above and potentially other functions implemented in JavaScript, but I didn't get any improvement right away. Only my hack to makeArgs
worked consistently.
Another potential solution, building on the above hack, would be to make
makeArgs
take slices, to put the values/refs in, as arguments (instead of returning new slices). The caller could deal with efficiently handling the memory. Maybe this could help Go's escape analysis, in conjunction with //go:noescape
.
Side note: Ideally, a solution would address the heap allocation of an interface for certain arguments. Hundreds of thousands get allocated before the garbage collector runs, causing the garbage collector to take 12-20ms.