Closed ostenbom closed 3 months ago
Hi @ostenbom thanks for the issue š and thanks a lot for the minimal reproduction! that's always extremely helpful and appreciated š
Unfortunately I am not really able to reproduce the issue š I tried reproducing it following your repro instructions and that didn't work for me, I've also tweaked your playwright test in the following way:
import { test, expect, Page } from "@playwright/test";
const numOfPages = 50;
test(`open ${numOfPages} tabs and navigate to pages in parallel`, async ({
browser,
}) => {
// Create a new browser context
const context = await browser.newContext();
// Open three new pages (tabs)
const pages: Page[] = await Promise.all(
new Array(numOfPages).fill(null).map(() => context.newPage())
);
// URLs to navigate to
const url1 = "http://localhost:8788/thing1";
const url2 = "http://localhost:8788/thing2";
// // Navigate to the pages in parallel
await Promise.all(
pages.map((page, i) => page.goto(i % 2 === 0 ? url2 : url1))
);
// Optional: Add assertions to verify the pages have loaded correctly
for (const page of pages) {
await expect(page.getByText("Hello, World!")).toBeVisible();
}
// Close the context
await context.close();
});
(so opening 50 tabs in parallel instead of 3) and I still could not reproduce the issue š
I am guessing that this could be machine/OS dependent.... but I figured I'd check with you, what do you think?
Thanks for taking a look @dario-piotrowicz!
We deployed the app to https://my-parallel-app.pages.dev
and pushed another commit to make it easier to test against that:
+ const host = "https://my-parallel-app.pages.dev";
+ // const host = "http://localhost:8788";
// URLs to navigate to
- const url1 = "http://localhost:8788/thing1";
- const url2 = "http://localhost:8788/thing2";
- const url3 = "http://localhost:8788/thing1";
+ const url1 = `${host}/thing1`;
+ const url2 = `${host}/thing2`;
+ const url3 = `${host}/thing1`;
Strange that it didn't show up in your preview - we're running the previews on mac machines if that helps.
Can you get the same behaviour from the production deployment? We're getting the behaviour consistently across machines and seeing it in our production logs. It's got to be a parallelism issue and then the timings can differ - so starting more browsers could be making the requests further away from each other?
so starting more browsers could be making the requests further away from each other?
I mean, I added more tabs just because I wasn't hitting the issue with only 3 š
I did try again with 3 tabs and also your production deployment, and still nothing, if I increase significantly the number of tabs I do get some failures (sometimes):
I imagine these are the ones you were expecting me to see? But I guess I won't see the context error logged since this is happening in your production application right? š
Strange that it didn't show up in your preview - we're running the previews on mac machines if that helps.
š¤· š I'm on a macbook M1 pro
But I guess I won't see the context error logged since this is happening in your production application right? š
That's exactly the behaviour! If you run with --headed
, what you will see is "Internal Server Error". Which isn't much help without the logs š
This is a straight deployment of the repository using the npm run deploy
command there - if you want to see the logs / errors you should hopefully be able to deploy your own copy?
But I guess I won't see the context error logged since this is happening in your production application right? š
That's exactly the behaviour! If you run with
--headed
, what you will see is "Internal Server Error". Which isn't much help without the logs šThis is a straight deployment of the repository using the
npm run deploy
command there - if you want to see the logs / errors you should hopefully be able to deploy your own copy?
I see š
I wonder why it takes that many parallel tabs on my end š
Yes I can deploy my own copy.... but it's not really going to help much debugging this š I wish I had a way to locally reproduce this instead! š
Anyways thanks a lot for the repro and patience here š , I'll try to see what I can do on my end š
š
@ostenbom I've dug into the issue and I am quite sure that this is a workerd issue, in #819 I've implemented a workaround which should hopefully work quite well, I'll also check with the workerd team to see if things can be improved on their end (making my workaround in the future hopefully unnecessary).
If you want please check out the PR's prerelease and let me know if this works well for you š (it definitely seems to be fixing the issue for me when testing it with your minimal reproduction).
Thank you @dario-piotrowicz! This has been a real head scratcher for us but works with both the reproduction and our own internal tests! ā ā
@ostenbom nice! I'm glad it solved your issue :smiley:
I'll look into merging the fix and release it as soon as I can (I need to get the darn e2es to pass :sweat:) :+1:
I wasn't able to reproduce this issue locally yet ā but I'm curious on this @ostenbom.
function useMyContext() {
+ console.log("React", typeof React, React?.version, typeof React?.useContext);
const context = React.useContext(MyContext);
If you add that log in ā when running it locally, do you see the same React version when it works and when it doesn't? Locally I'm seeing React object 18.3.0-canary-14898b6a9-20240318 function
, which makes me wonder if there's some mismatch happening (e.g. npm resolves to 18.3.1
but next is swapping it to 18.3.0-canary-<hash>-<date>
)
This issue you're seeing has a long history in the Next.js repo, most of the times when there are multiple versions of react (e.g. from dependencies incorrectly anchoring it or declaring it) https://github.com/vercel/next.js/discussions/43577.
I'm not seeing multiple versions of react on the reproducible example but Next.js does swap to a canary release when using app router so it's definitely an awkward territory to debug.
I'm surprised to see the global state pollution as the potential root cause, as I'd assume the same thing happens on AWS lambdas outside of cold starts when deploying to Vercel. Is there anything else that could make the deployment in Cloudflare pin to the wrong React version (or wrong React bundle e.g. the RSC one instead of the others)?
I wonder if doing something similar to server-only / client-only
(https://github.com/vercel/next.js/blob/canary/packages/next/src/compiled/client-only/package.json#L13-L19) but with console.log
could also shed any extra insight on what might be happening when it fails.
Re-opening since there seems to be more going on here than potentially just the cloudflare-workers runtime issue!
+1 We also got this issue. Hopefully we can get your release soon @dario-piotrowicz
@thinhle-agilityio sorry for the delay, yeah we should be good to go, I'll make sure to get the PR reviewed, merged and a release done by early next week šš
@ostenbom PR merged š
Sorry it took so long, I've been hitting different issues with the solution and whatnot š
You can find the fix in the latest next-on-pages beta
release, I will make a stable release early next week if that's ok š
(this time early next week for real! š
)
(If you can double check and let me know if it's all working for you in the beta release that's be much appreciated š )
next-on-pages environment related information
Description
When accessing any next pages that use contexts in parallel we get 500
Internal Server Error
with:This has been a very tricky bug to hunt down, but we experience this when running wrangler locally, in our end-to-end test suites, and we have metrics that show us that this affects our users in production.
The pages must be loaded in quick succession. There must be several pages that are visited, not the same page.
To reproduce this error, in the reproduction repo run:
npm run preview
in one tab.npx playwright test --repeat-each=5
TypeError: Cannot read properties of null (reading 'useContext')
in the logs + 500 internal server errorReproduction
https://github.com/ostenbom/reproduce-next-parallel-context
Pages Deployment Method
Direct Upload (
wrangler pages publish
or the @cloudflare/pages-action GitHub Action)Pages Deployment ID
No response
Additional Information
No response
Would you like to help?