blitz-js / next-superjson-plugin

SuperJSON Plugin for Next.js Pages and Components

Build fails on Vercel sporadically #25

Closed IGassmann closed 1 year ago

IGassmann commented 1 year ago

Describe the bug

Next.js build fails on Vercel sporadically with many errors like this one:

thread '<unnamed>' panicked at 'failed to invoke plugin: failed to invoke plugin on 'Some("/vercel/path0/pages/auth/login.tsx")'

Caused by:
  0: Failed to create plugin instance
  1: missing requires CPU features: "EnumSet(AVX512DQ | AVX512VL | AVX512F)"', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/swc-0.212.1/src/plugin.rs:222:14

Expected behavior

The build should always run successfully.

Reproduction link

No response

Version

0.4.0

Config

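/** @type {import('next').NextConfig} */
module.exports = {
  experimental: {
    esmExternals: "loose",
    swcPlugins: [
      [
        "next-superjson-plugin",
        {
          excluded: [],
        },
      ],
    ],
  },
  webpack: (config, context) => {
    // Keep the fallbacks Next.js already sets and resolve `fs` to an empty module.
    config.resolve.fallback = { ...config.resolve.fallback, fs: false }

    return config
  },
}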

Additional context

Next.js version: 12.2.5

I believe this error happens when the Vercel host machine building the project has a CPU that lacks one or more of these features: AVX-512DQ, AVX-512VL, and AVX-512F. Since Vercel may use a different host machine for each build, this would explain why it fails sporadically.

orionmiz commented 1 year ago

Could you please try building with the new release (v0.4.1)?

IGassmann commented 1 year ago

@orionmiz, unfortunately, the issue is still persisting with the new release.

orionmiz commented 1 year ago

This might be fixed by #29 plus the Next.js canary.

But I can't guarantee it, since I can't reproduce this bug on my local machine.

orionmiz commented 1 year ago

@IGassmann Plugin v0.4.2 has just been released.

Could you give it another try with Next.js v12.3.1?

If the update doesn't fix it, this will be a challenging issue.

IGassmann commented 1 year ago

It's fixed with v0.4.2 and Next.js v12.3.1.

martinbianchi commented 1 year ago

This just happened to me with v0.4.2 and Next.js v12.3.1.

IGassmann commented 1 year ago

@martinbianchi @orionmiz, I confirm that this is actually still happening.

sluukkonen commented 1 year ago

I'm also seeing this on a local CI machine with an Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz.

Using Next 12.3.1 and next-superjson-plugin 0.4.2.

IGassmann commented 1 year ago

@orionmiz I'm confident this happens because of Vercel's build cache. Vercel keeps a build cache between builds but doesn't always use the same CPU for every build. So next-superjson-plugin can be compiled once for a specific CPU, and that compiled output can then be reused on a different CPU in a later build because of the build cache. If I rebuild the deployment on Vercel without the cache, the build always succeeds.

The solution here might be to replicate the @next/swc build and publish process, which seems to have solved this problem: that package appears to pre-compile binaries with NAPI-RS. Check out their GitHub Workflow and the package configuration.

orionmiz commented 1 year ago

@IGassmann You really found a breakthrough. I've passed this issue on to Vercel through an SWC maintainer; they will look into it and work on a solution. Thanks!


UPDATE: I heard from the SWC team that the build cache might not be the problem, so I've decided to look for a solution in the plugin itself.

The problem is that some SIMD instructions (like AVX-512) cannot run on CPUs that don't support them, but the compiler emits them into the plugin without considering the machine that will run it.

Our goal is to find a way to compile the plugin according to the client's hardware (or simply disable SIMD altogether). Then I found a useful target feature, supported by LLVM, that introduces locally non-deterministic SIMD. Fortunately, it was easy to apply it to the plugin with RUSTFLAGS.

Plugin v0.4.3, which includes that feature, has just been released. Would you mind testing the build on Vercel with v0.4.3?

IGassmann commented 1 year ago

Thanks! We've merged the fix. I'll let you know if the issue still happens.

BTW, I would recommend adding a new comment to this issue for any new updates instead of editing your existing comment. Otherwise, people that are subscribed to this issue (including me) don't get notified about it.

orionmiz commented 1 year ago

@IGassmann Oops, I thought you would be notified just by being mentioned in the PR.

Anyway, if this error still occurs after updating to v0.4.3, a minimal reproduction of the project would be an important clue for resolving it. I've tried several times to reproduce the error by deploying this project (it also checks the CPU, so I'd notice any change in the build environment), but couldn't. Perhaps the source code itself is one of the causes.

IGassmann commented 1 year ago

@orionmiz it's difficult to reproduce because it only happens occasionally. Plus, we're on the enterprise plan of Vercel, which might use different machines to build the project than projects on the standard plan.

orionmiz commented 1 year ago

@IGassmann That makes sense. Hopefully the update gets rid of this issue.

IGassmann commented 1 year ago

The issue is still occurring with v0.4.3.

Let's recap what we know:

The CpuFeature error type is thrown when the module was compiled with a CPU feature that is not available on the current host that is running the module.

This makes me think that the GitHub Actions environment has those CPU features available, so the compiled output expects them as well. The solution might be either to find a way to compile the plugin without those CPU features or to compile it in an environment that doesn't have them.
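To check which features named in a given panic are absent on a specific host, the panic text can be parsed and compared with that host's CPU flags. This is a hypothetical helper (`featuresFromPanic` and `unsupported` are illustrative names), assuming the message keeps the `missing requires CPU features: "EnumSet(...)"` form shown in the original report; the lowercase mapping is only known to line up with `/proc/cpuinfo` for the three AVX-512 names here.

```javascript
// Parse the feature list out of a "missing requires CPU features" panic such as
//   missing requires CPU features: "EnumSet(AVX512DQ | AVX512VL | AVX512F)"
// and lowercase each name. For these three AVX-512 features the result matches
// the /proc/cpuinfo spelling; other feature names may not map this simply.
function featuresFromPanic(message) {
  const match = /missing requires CPU features: "EnumSet\(([^)]*)\)"/.exec(message);
  if (match === null) return [];
  return match[1]
    .split("|")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name.length > 0);
}

// Return the features from `required` that are not in the host's flag set.
function unsupported(required, hostFlags) {
  return required.filter((feature) => !hostFlags.has(feature));
}

// Example: the features from this issue's log, checked against a host whose
// flags include avx2 but no AVX-512 extensions.
const log = `1: missing requires CPU features: "EnumSet(AVX512DQ | AVX512VL | AVX512F)"'`;
console.log(unsupported(featuresFromPanic(log), new Set(["sse4_2", "avx2"])));
// prints [ 'avx512dq', 'avx512vl', 'avx512f' ]

module.exports = { featuresFromPanic, unsupported };
```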

Here's a related issue: https://github.com/wasmerio/wasmer/issues/2707

sluukkonen commented 1 year ago

In our case, the bug shows up when we build our Next app on custom GitHub Actions runners that have that older Xeon. We only see it occasionally, since most of our runners have newer processors that do support the relevant AVX-512 instructions. The job also shares node_modules between machines via the GitHub Actions cache.

IGassmann commented 1 year ago

@sluukkonen, would you be able to build on the older Xeon runner without the cache and report back whether it also fails?

orionmiz commented 1 year ago

The SWC update we requested has arrived:

The plugin has already been updated:

Wait for the upcoming Next.js canary release to get the fixed plugin runner.

orionmiz commented 1 year ago

Next.js v12.3.2-canary.30, which includes the SWC update, has just been released!

Please follow these steps so we can make sure the error is gone:

  1. Update Next.js to the canary
  2. Update the plugin to v0.4.8
  3. Build without the Wasm cache