application-research / delta-ui

[Bug] UI will not load sometimes on FDI #87

Closed jcace closed 1 year ago

jcace commented 1 year ago

GET https://delta-web.estuary.tech/_next/static/chunks/main-app-3357acec18824b3c.js net::ERR_ABORTED 404

This error will show up very frequently when trying to load https://delta-web.estuary.tech/ddm/datasets . Unsure why that is, as the app has been built and loads fine sometimes.

I've verified that this main-app-3357... file does not exist in the /usr/local/src/delta-web/.next/server/chunks directory on the delta vms:

Image

I'm not sure why it's requesting that file. Presumably it has something to do with the Next.js build.

This issue seemed to come up after we added the page routing, perhaps it's related to: https://stackoverflow.com/questions/66084031/next-js-error-getting-404-when-fetching-js-resources-after-refresh

Similar errors show up when trying to reproduce this locally, so perhaps we should start there:

npm install
npm run build
npm run start

Image

jcace commented 1 year ago

It seems like the Next.js artifact filenames differ across the delta nodes. That would explain why the file sometimes can't be loaded: the request gets load balanced to a server that doesn't have it. Unsure why this is happening, or whether there's a way to make them all consistent at build time.

$ for i in 01 02 03 04 05 06 07 08; do ssh "prod-ehi-delta${i}.estuary.tech" -t "ls /usr/local/src/delta-web/.next/static/chunks/main-app-* && exit"; done
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta01.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta02.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta03.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-b2917daca6be0c4c.js
Connection to prod-ehi-delta04.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-3357acec18824b3c.js
Connection to prod-ehi-delta05.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta06.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-e89952f8c303cdaa.js
Connection to prod-ehi-delta07.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta08.estuary.tech closed.
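
To make the mismatch easier to spot, the loop output above can be grouped by filename. This is a minimal sketch, not something from the playbook: `group_chunks_by_host` is a hypothetical helper name, and it assumes the loop is run with `2>&1` so that ssh's "Connection to ... closed." lines (normally written to stderr) end up in the same stream as the `ls` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the ssh loop output on stdin and prints one line
# per distinct main-app chunk, followed by the hosts that have it.
group_chunks_by_host() {
  awk '
    { sub(/\r$/, "") }                      # ssh -t can leave CRLF line endings
    /main-app-[0-9a-f]+\.js$/ {             # remember the chunk filename
      n = split($0, parts, "/"); chunk = parts[n]; next
    }
    /^Connection to .* closed\.$/ {         # field 3 is the hostname
      if (chunk != "") { hosts[chunk] = hosts[chunk] " " $3; chunk = "" }
    }
    END { for (c in hosts) print c ":" hosts[c] }
  ' | sort
}
```

Usage would be along the lines of `for i in 01 02 ...; do ssh ...; done 2>&1 | group_chunks_by_host`. More than one output line means the nodes disagree.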

jcace commented 1 year ago

Ran a rebuild on nodes 01 and 04. Node 01 got a file named main-app-ddea8e57fffb4d96.js; node 04 got a file named main-app-3357acec18824b3c.js.

Both repos are at commit e2b4a6ff2f08a98882b60efac7dd0bbb04c46b1f

So it seems like this issue stems from the fact that Next.js build artifact filenames are not deterministic across builds. As a naive fix, I think we should modify the playbook (https://github.com/application-research/delta-dm-playbook/blob/main/roles/arg-delta-web/tasks/main.yml) to build in one place and then simply copy the assets over to each server, overwriting whatever was previously there. Any thoughts on this, @PC-Admin @Zorlin @elijaharita @LucRoy?
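
The build-once-and-copy idea could look roughly like the following. This is a sketch under assumptions, not the playbook's actual tasks: `deploy_build` and the `RUN` switch are hypothetical names, the app path is the one seen on the delta VMs, and it assumes the build host has rsync over ssh access to each node.

```shell
#!/usr/bin/env bash
# Sketch only: mirror one build's .next/ directory to every node.
APP_DIR="${APP_DIR:-/usr/local/src/delta-web}"
RUN="${RUN:-echo}"   # defaults to printing the commands; set RUN= to execute

deploy_build() {
  local host
  for host in "$@"; do
    # -a preserves the tree; --delete removes stale chunks from older builds
    $RUN rsync -a --delete "${APP_DIR}/.next/" "${host}:${APP_DIR}/.next/"
  done
}
```

After a single `npm install && npm run build` in `$APP_DIR` on the build host, something like `RUN= deploy_build prod-ehi-delta01.estuary.tech prod-ehi-delta02.estuary.tech` would push the same artifacts everywhere, so every node serves identical chunk filenames.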

PC-Admin commented 1 year ago

ChatGPT reckons the filenames should come out identical if all the dependencies and environment variables are the same.

Nodes 01, 02, 03, 06, and 08 came out the same, which implies there's something different on the other boxes.

You can investigate this by checking for differences in the environments between the two nodes. Here are a few things to consider:

- Dependency versions: Make sure that the versions of Next.js and other dependencies are exactly the same on both nodes. You might want to use a yarn.lock or package-lock.json file to lock the versions.

- Environment variables: If you use environment variables in your Next.js application that affect the build, make sure they are the same on both nodes.

- Build plugins: If you are using any build plugins, make sure they are identical and configured the same way on both nodes.

That said, building in one place and just mirroring it to every node would likely be more reliable.
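
For anyone who wants to chase the environment differences listed above, a rough way to compare nodes is to print a small fingerprint on each one and diff the results. This is only a sketch: `build_fingerprint` is a hypothetical helper, and the fields it prints (Node and npm versions, lockfile checksum, `NODE_ENV`) are a guess at what matters, not a confirmed list of what affects the chunk hashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a few build-relevant facts about this machine.
# Run it in the app directory on each node and diff the outputs.
build_fingerprint() {
  local dir="${1:-.}"
  printf 'node=%s\n' "$(node --version 2>/dev/null || echo missing)"
  printf 'npm=%s\n' "$(npm --version 2>/dev/null || echo missing)"
  if [ -f "${dir}/package-lock.json" ]; then
    # Same checksum on every node means the same locked dependency versions
    printf 'lock=%s\n' "$(sha256sum "${dir}/package-lock.json" | cut -d' ' -f1)"
  else
    printf 'lock=missing\n'
  fi
  printf 'NODE_ENV=%s\n' "${NODE_ENV:-unset}"
}
```

For example, `build_fingerprint /usr/local/src/delta-web` on node 01 and node 04 should print identical lines if the two environments really match.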

PC-Admin commented 1 year ago

Okie dokey the fix is here: https://github.com/application-research/delta-dm-playbook/compare/main...compile-once

It's been applied to prod and seems to have worked:

delta-dm-playbook$ for i in 01 02 03 04 05 06 07 08; do ssh "prod-ehi-delta${i}.estuary.tech" -t "ls /usr/local/src/delta-web/.next/static/chunks/main-app-* && exit"; done
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta01.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta02.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta03.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta04.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta05.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta06.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta07.estuary.tech closed.
/usr/local/src/delta-web/.next/static/chunks/main-app-ab039ab8917f9d77.js
Connection to prod-ehi-delta08.estuary.tech closed.

A couple of things that aren't super optimal about it:

PC-Admin commented 1 year ago

Although we got all the delta-web filenames lined up, we're still running into this bug: (Screenshot from 2023-07-18 06-28-34)

PC-Admin commented 1 year ago

Turns out this was only because of caching on the server side. After resetting delta-web.service on every box, this bug has completely vanished. :)
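
For reference, the reset step can be sketched as a loop like the one below. It assumes "resetting" means a systemd restart and that the ssh user can run `sudo systemctl` on each box; `restart_delta_web` and the `RUN` switch are hypothetical names, not part of the playbook.

```shell
#!/usr/bin/env bash
# Sketch only: restart delta-web.service on each host given as an argument.
RUN="${RUN:-echo}"   # defaults to printing the commands; set RUN= to execute

restart_delta_web() {
  local host
  for host in "$@"; do
    # Restarting makes the server pick up the freshly copied build artifacts
    $RUN ssh "$host" sudo systemctl restart delta-web.service
  done
}
```

Calling `restart_delta_web prod-ehi-delta01.estuary.tech` with the default `RUN=echo` only prints the ssh command, which is a cheap way to review it before running it for real.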

Will merge this fix today and we can move on. :+1: