*Closed: IgorMinar closed this issue 1 year ago.*
The issue is very easy to repro when load testing the local server with `hey`. Each successive identical run is slower: requests/sec drops from ~1219 to ~194 to ~57 to ~30.

```
$ hey -n 1000 http://127.0.0.1:8787/

Summary:
  Total:        0.8201 secs
  Slowest:      0.7063 secs
  Fastest:      0.0024 secs
  Average:      0.0334 secs
  Requests/sec: 1219.3387

$ hey -n 1000 http://127.0.0.1:8787/

Summary:
  Total:        5.1450 secs
  Slowest:      4.2399 secs
  Fastest:      0.0153 secs
  Average:      0.2053 secs
  Requests/sec: 194.3653

$ hey -n 1000 http://127.0.0.1:8787/

Summary:
  Total:        17.4745 secs
  Slowest:      16.9587 secs
  Fastest:      0.0227 secs
  Average:      0.8165 secs
  Requests/sec: 57.2262

$ hey -n 1000 http://127.0.0.1:8787/

Summary:
  Total:        32.9817 secs
  Slowest:      2.6376 secs
  Fastest:      0.0976 secs
  Average:      1.6490 secs
  Requests/sec: 30.3199
```
Interestingly, I occasionally see workerd crash with the following:

```
workerd/server/server.c++:967: info: Inspector client attaching [core:user:workers-hot]
workerd/io/worker.c++:999: info: console warning; message = There was an error while reporting an unhandled promise rejection: RangeError: Maximum call stack size exceeded
*** Received signal #10: Bus error: 10
stack:
```
Could you post the script of your worker, and your `wrangler.toml`, if possible?
Hey Samuel, apologies for the late response. This was a simple "hello world" worker with no bindings or special config.
Hey @IgorMinar! I've just tried to reproduce this with the latest `wrangler@beta`, and I'm not seeing the response slowdown over time anymore, even with large `-n`. Would you be able to try out `npx wrangler@beta dev --experimental-local`? Memory usage does creep up (not as high as you were seeing), but this might just be slow GC, as it usually comes down after a while.

That being said, I do still get `*** Received signal #10: Bus error: 10 stack:` (indicating a memory access outside the process address space) if I try `-n 10000` a few times, so there's still something strange happening here. Notably, this also happens when you run `workerd serve samples/helloworld/config.capnp` in the `workerd` repo, so I don't think this is a Wrangler/Miniflare problem.
I'll retest it, but it will likely take a few days before I can get around to this. Thanks for the heads-up though; it looks like good progress was made on this upstream already.
@IgorMinar were you able to confirm whether this is still a problem?
Oops, I dropped the ball on this one. I'll retest today. My apologies.
hey @IgorMinar! :) just bumping this -- did you have a chance to retest?
I was not able to verify/test because I got blocked on https://github.com/cloudflare/workers-sdk/issues/3007, which seems to suggest that `--experimental-local` is currently broken in the 2.14.0 release for some, if not all, users.
Getting a similar issue on the latest wrangler. I don't know if it "slows down", but memory usage quickly climbs 10 GB → 20 GB → 30 GB and doesn't come back down unless the process is killed.
Can report this is still the case in `wrangler` 3.0.1 on macOS.

Note that this does not require the wrangler server to be used at all. It will leak just by starting it.

Command:

```
wrangler dev --port 2222
```
I also get this error after some time; I'm not sure if it's related to the leak.

```
Failed to get worker definitions TypeError: fetch failed
    at fetch (/Users/a/webaverse/node_modules/wrangler/wrangler-dist/cli.js:16584:17)
    at async getRegisteredWorkers (/Users/a/webaverse/node_modules/wrangler/wrangler-dist/cli.js:124280:22)
    at async getBoundRegisteredWorkers (/Users/a/webaverse/node_modules/wrangler/wrangler-dist/cli.js:124299:29) {
  cause: HeadersTimeoutError: Headers Timeout Error
      at Timeout.onParserTimeout [as callback] (/Users/a/webaverse/node_modules/wrangler/wrangler-dist/cli.js:8431:33)
      at Timeout.onTimeout [as _onTimeout] (/Users/a/webaverse/node_modules/wrangler/wrangler-dist/cli.js:681:17)
      at listOnTimeout (node:internal/timers:573:17)
      at process.processTimers (node:internal/timers:514:7) {
    code: 'UND_ERR_HEADERS_TIMEOUT'
  }
}
```
I'd like to add weight to this issue, as it's currently blocking our effort to use `wrangler dev` in a local development / integration-testing environment.
Wrangler 3.1.0, macOS.

For measurement I used:

```
hey -c 1 -n 10000 http://127.0.0.1:8787/
```

Bare-minimum worker, no deps, no bindings:

```ts
export default {
  async fetch(request: Request, env: any, executionCtx: ExecutionContext) {
    return new Response("Hello", { status: 200 });
  },
};
```
Observed: a very progressive slowdown, and in the end a crash:

```
[mf:inf] GET / 200 OK (18ms)
[mf:inf] GET / 200 OK (20ms)
[mf:inf] GET / 200 OK (20ms)
[mf:inf] GET / 200 OK (20ms)
*** Received signal #10: Bus error: 10
stack:
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [b] open a browser, [d] open Devtools, [l] turn off local mode, [c] clear console, [x] to exit │
╰────────────────────────────────────────────────────────────────────────────────────────────────╯
```
Did a control measurement with the Service Worker format vs. the Module Worker format:

```js
addEventListener('fetch', (event) => {
  event.respondWith(new Response("Hello", { status: 200 }));
});
```

_The worker performs a lot better with `wrangler dev` when in the Service Worker format, but crashes all the same with the same error, just later (after far more requests)._
I'll perform an experiment with emulated amd64 arch a bit later
Hopefully this will help in some small way, but I've had issues with the slowdown and the Bus error: 10 as well. I started on 3.0.1 and am now on 3.1.0, on macOS. Both gave me a `Bus error: 10` message along with massive slowdowns right out of the gate.
I used `softwareupdate --install "Command Line Tools for Xcode-14.3"`, and that seemed to get rid of the Bus error: 10. The speed was still abysmal, taking a couple of minutes to load versus the fraction of a second it takes in prod.

Command used:

```
wrangler dev --log-level debug --ip 0.0.0.0 --port 3000 --env development
```

I did, however, get a different error than the Bus error: 10 after the Xcode CLI tools update:

```
workerd/jsg/util.c++:278: info: exception = kj/async-io-unix.c++:778: disconnected: ::writev(fd, iov.begin(), iov.size()): Broken pipe; iovTotal = 1108; iov.size() = 1
stack: 10210f62f 1021100d3 10124d3cc 1020e7df8 1020ec88c 1020e37a8 1020f2c54 1009f95e8 1020f16d0 100df1130
workerd/io/worker.c++:1703: info: uncaught exception; source = Uncaught (in promise); exception = Error: Network connection lost.
workerd/io/worker.c++:1703: info: uncaught exception; source = Uncaught (in response); exception = Error: Network connection lost.
workerd/jsg/util.c++:278: info: exception = kj/async-io-unix.c++:778: disconnected: ::writev(fd, iov.begin(), iov.size()): Broken pipe; iovTotal = 1108; iov.size() = 1
stack: 10210f62f 1021100d3 10124d3cc 1020e7df8 1020e802f 1020d349c 1020e7df8 1020e802f 1020d349c 1020e7df8 1020e802f 1020d349c 1020f9314 1020f16d0 100f77494 1009f95e8 100e9e7c0 100dd8868 100eaf900 100d40c08 100df01c8 100b074b4
workerd/io/worker.c++:1703: info: uncaught exception; source = Uncaught (in promise); exception = Error: Network connection lost.
workerd/io/worker.c++:1703: info: uncaught exception; source = Uncaught (async); exception = Error: Network connection lost.
✘ [ERROR] Uncaught (in promise) Error: Network connection lost.
✘ [ERROR] Uncaught (async) Error: Network connection lost.
```
For additional context, the application runs as expected on the Miniflare 2.x versions.

Command used with miniflare:

```
miniflare --live-reload --debug --port 3000 --env .env --modules <script path>
```

I hope this isn't just adding static.
I can confirm what @Cragady reported. I also tried the `helloworld_esm` worker from the `workerd` GitHub repo samples. However, I used the `workerd` binary from the `node_modules` that `wrangler dev` was running (and even tried to match the command line as much as I could, enabling the inspector and such). This way I could not reproduce any slowdown (the Bus error: 10 occurred on OSX, but that was indeed just noise, and I have not seen it when switching to Docker). IMO the main issue here is the slowdown and the outrageous memory pressure accompanying it.
We were looking into the `signal #10: Bus error` in workerd yesterday.

This appears to be limited to workerd on OS X and only for optimized builds (`bazel build -c opt`). It hasn't failed this way with fastbuild or dbg builds. `xcodebuild -version` reports `Xcode 14.3.1 Build version 14E300c`.

The good news is that we have a stack trace and can pinpoint where the failure occurs; the bad news is that it's an optimized build, so there are only partial symbols along with registers and memory. We'll figure something out :-)
For the slowdown and memory pressure issue, you can capture heap snapshots with devtools by enabling the inspector in workerd, e.g. `workerd serve -i0.0.0.0`, then connecting devtools, opening the `Memory` tab, and taking a snapshot. You can keep capturing heap snapshots between runs of `hey` to see what is growing.
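Alongside heap snapshots, it also helps to log the process-level resident set size between `hey` batches. A minimal sketch, assuming a POSIX `ps` is available and that `workerd_pid` is the PID of the `workerd` process you are watching:

```python
import subprocess


def rss_kib(pid: int) -> int:
    """Resident set size of `pid` in KiB, as reported by POSIX `ps`."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


# Hypothetical usage: bracket each load-test batch with a measurement, e.g.
#   before = rss_kib(workerd_pid)
#   ... run `hey -n 10000 http://127.0.0.1:8787/` ...
#   print(rss_kib(workerd_pid) - before, "KiB growth this batch")
# A delta that keeps growing batch after batch points at a leak, rather than
# memory that slow GC would eventually return.
```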
The heap looks stable across batches of 10000 requests from `hey` using the workerd `helloworld_esm` sample on OS X M1 and Linux x64 (`opt` and `dbg`), with both the last workerd release and tip-of-tree. The overall process size does not grow without bound. This check is probably too brief, but the issue might be exposed with a config and scripts that exercise more functionality.
Having the same issue today.
Hi @lrapoport-cf, the fix to workerd is in and will be available in the next workers-sdk release. We can close this ticket when that happens. (I unassigned the ticket while trying to re-assign it, but I don't appear to have permission to do that.)
thanks @ohodson ! i've linked your fix to the ticket and reassigned :)
Looking at a merged PR in the comments above, it seems like there is a fix. Do we have an expected date for this to be released?
Hey everyone! We've just released a new `beta` version of Wrangler that includes a fix in `workerd` for this issue. Try it out with `npx wrangler@beta dev`. Please let us know if you encounter any issues.
@mrbbot `wrangler@beta` seems to fix the `signal #10: Bus error`, but we still get 500s due to the maximum call stack size being exceeded after about ~2.5 minutes of constant requests in our reproduction repo. The response times also climb from ~50ms at the start of the stress test to about ~350ms by the time the 500s appear.
@BillBrower-Shopify thanks for letting us know. Will see if we can translate this to a `workerd`-only reproduction.
@BillBrower-Shopify I've been able to reproduce this using just `workerd` and have handed it off to the runtime team. I'm OOO all of next week, so I won't be able to respond for a bit.
Thanks @mrbbot! Is there a separate GitHub issue we can use to track progress on that work?
Can confirm `wrangler@beta` seems to fix it. Before `beta` I was using `^3.1.1` and kept getting the same error.
@BillBrower-Shopify I've filed https://github.com/cloudflare/workerd/issues/831 against `workerd` to investigate the `500` error that you are seeing. Thanks.
@BillBrower-Shopify the maximum call stack exceeded error and slowdowns should have been resolved by #3585, released in `wrangler@3.2.0`. That seems to have fixed the issue with your reproduction, but let us know if you have any other problems.
**What version of Wrangler are you using?**
2.6.2

**What operating system are you using?**
Mac

**Describe the Bug**

If I test my simple worker with `wrangler dev --experimental-local` and monitor the browser devtools network tab, I see that after the initial (cold) requests, subsequent requests return in 2ms. If I keep refreshing the browser, that time slowly starts to creep up: 4ms, 6ms, 8ms, and if I keep refreshing the tab a lot, it slows down to 50-70ms and keeps slowing further with additional HTTP requests.

At the same time, I observe that memory usage increases from 17MB to 34MB, 50MB, 75MB, all the way to over 1GB.
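The latency creep described above can be quantified without a browser. A rough, self-contained stand-in for `hey` using only the Python standard library (the URL and batch sizes are placeholders, not part of any repro in this thread):

```python
import time
import urllib.request


def probe(url: str, batches: int = 5, requests_per_batch: int = 100) -> list[float]:
    """Issue sequential batches of GET requests; return the mean latency (ms) per batch."""
    means = []
    for batch in range(batches):
        total = 0.0
        for _ in range(requests_per_batch):
            start = time.perf_counter()
            with urllib.request.urlopen(url) as resp:
                resp.read()  # drain the body so the request fully completes
            total += time.perf_counter() - start
        mean_ms = total / requests_per_batch * 1000
        means.append(mean_ms)
        print(f"batch {batch}: mean latency {mean_ms:.2f} ms")
    return means


# Hypothetical usage against a running `wrangler dev` instance:
#   probe("http://127.0.0.1:8787/", batches=10, requests_per_batch=200)
# Steadily climbing per-batch means would reproduce the slowdown described above.
```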