Closed — mr-rpl closed this issue 1 year ago
this looks to be identical to https://github.com/aws-amplify/amplify-hosting/issues/2647, which was closed as part of launching support for Next 12/13 -- but the behavior still exists
bump
Related to the high-TTFB issue here: https://github.com/aws-amplify/amplify-hosting/issues/3122, which is tagged as "investigating" (but has had no new update in nearly a month)
I was about to submit the bug as well but came across this one.
We are getting a time_starttransfer of ~13-14 seconds on a latest-Next.js project (measured as described in https://stackoverflow.com/questions/18215389/how-do-i-measure-request-and-response-times-at-once-using-curl), e.g.:
time_namelookup: 0.152133
time_connect: 0.168539
time_appconnect: 0.210377
time_pretransfer: 0.210509
time_redirect: 0.000000
time_starttransfer: 14.235495
----------
time_total: 14.236349
14 seconds is way too much :(
Amplify app id = d3jjevefis395x Region = us-east-1
I also have a Next 13 app, and enabling "performance mode" seemed to do nothing for me: TTFB was still quite bad (~5-10 s). Upon inspecting the headers, I noticed the cache would always report Miss from CloudFront, and I could not see s-maxage set.
Not sure if this will work for you, but setting the header explicitly in next.config.js seems to have worked for me, and now my site loads much, much faster. Would be great if someone from the Amplify team could confirm this is a valid workaround.
const nextConfig = {
  headers: async () => {
    return [
      {
        // apply to every route
        source: '/(.*)',
        headers: [
          {
            // s-maxage lets CloudFront cache the response for a day
            key: 'Cache-Control',
            value: 'public, s-maxage=86400'
          }
        ]
      }
    ];
  }
};

module.exports = nextConfig;
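To confirm whether the header is actually being served (and whether CloudFront starts reporting hits), the response headers can be inspected programmatically; a minimal sketch using Node 18+'s built-in fetch, with the URL left as a placeholder:

```javascript
// Quick header check (Node 18+, built-in fetch). Substitute your own
// Amplify app URL -- the one in the usage comment is a placeholder.
async function checkCacheHeaders(url) {
  const res = await fetch(url);
  return {
    status: res.status,
    // the next.config.js workaround above should make s-maxage appear here
    cacheControl: res.headers.get('cache-control'),
    // "Miss from cloudfront" vs "Hit from cloudfront" once the edge is warm
    xCache: res.headers.get('x-cache'),
  };
}

// Example usage (placeholder URL):
// checkCacheHeaders('https://example.com/').then(console.log);
```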
Hi @mr-rpl @stefanzier @yuyokk 👋🏽 apologies for the delay here. Are you using BugSnag in your package.json? We have seen instances where packages like BugSnag contributed ~12 seconds to the app initialization time, which accounts for the high TTFB.
We are continuing to investigate which other dependencies can also have this effect on TTFB and will update this issue accordingly.
@hloriana we don't use BugSnag for our project.
@hloriana Same issue on our project. We do not use BugSnag. Here are the dependencies we use (I've omitted a few internal config packages):
@hloriana same, no BugSnag. I have a very simple Next.js application. Unfortunately, I was tired of 10+ minute build times and these cache issues, so I just moved to Vercel but kept Amplify as my backend. Now things are great, but I hope to use Amplify Hosting in the future if these issues are addressed 🙏
@hloriana we are not, our dep tree is:
"dependencies": {
"@apollo/client": "^3.7.0",
"@datadog/browser-rum": "^4.21.2",
"@emotion/react": "^11.10.0",
"@emotion/styled": "^11.10.0",
"@mui/icons-material": "^5.10.9",
"@mui/lab": "^5.0.0-alpha.95",
"@mui/material": "^5.10.1",
"@optimizely/react-sdk": "^2.9.1",
"formik": "^2.2.9",
"graphql": "^16.6.0",
"next": "^13.0.7",
"react": "^18.2.0",
"react-dom": "^18.2.0",
"use-is-in-viewport": "^1.0.9",
"uuid": "^9.0.0"
},
fwiw, we are now running on Vercel and have no issues
@hloriana not sure if it helps, but we use mui in our deps as well (a common package with @mr-rpl and @jdpst)
@yuyokk @mr-rpl @hloriana I have set up a CloudWatch synthetic canary to ping the relevant URL every few mins, which seems to be good enough since most requests take <10ms so it's rare that we need >1 hot lambda. Far from perfect, but sufficient as a workaround until this issue can be resolved.
I wanted to pull the MUI thread -- I deployed a fresh shell Next 13 app to Amplify and still experienced the 10-12 second load. Zero additional dependencies.
@jdpst - I seem to get the lag even without waiting from time to time -- and especially after a deploy (first load) -- so for that reason, I personally wouldn't put anything into production
another finding whilst triaging: I found that this only happens when using the pages directory. If using the new Next.js 13 app dir, the 10-12 second hang time goes away.
of course, it brought up a new bug: revalidate seems to happen at exactly 3 minutes no matter what I set the param to 😂
We set up a CloudWatch heartbeat monitor to ping the page every minute. We don't see 14 s time-to-first-byte anymore, but still see ~3-4 s occasionally.
I switched from the 'pages' to the 'app' directory and there is still 3 seconds of TTFB, which is unacceptable for a production application. The same app deployed to Vercel gets under 100 ms TTFB ...
I get between 6 - 14 seconds TTFB. I use MUI. Is there any update on this? I've switched to netlify but would love to use amplify instead.
People, if you're paying for Amplify hosting, why not enable Route 53 health check? That $1 solves the problem.
Did you actually try? I’ve made a lambda that calls my website every 9 minutes and it didn’t seem to solve anything
Sure, working since I've subscribed here, month+ on one site and another does not even require it. Both have service workers though, checking on clean uncached browser.
Hmm, thank you for your suggestion, I will try
I am not exactly sure what you mean, but enabling health checks with a frequency of 30 seconds does not fix the issue for me. There is still 4-5 seconds of TTFB.
How does the same site work outside of Amplify?
It works perfectly fine on Vercel without any changes to the site.
I found a case that I can work around. I'm using Next.js with Amplify Hosting, and the index of my service redirects to another page when accessed:
https://foo.com => https://foo.com/abcd
At this point, TTFB takes more than 3 seconds. But it is fast when hitting the destination directly:
https://foo.com/abcd
Before that, I applied the following settings:
1) Enabled Amplify's performance mode. 2) Set Amplify's custom headers:
customHeaders:
Maybe there's a problem between Amplify and Next.js redirects?
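For reference, the kind of index-to-subpage redirect being described maps onto Next's redirects() config. A sketch of the equivalent next.config.js follows, with the /abcd path as a placeholder mirroring the foo.com example:

```javascript
// next.config.js -- sketch of the index -> subpage redirect described
// above. The /abcd destination is a placeholder from the example.
const nextConfig = {
  async redirects() {
    return [
      {
        source: '/',
        destination: '/abcd',
        permanent: false, // false issues a temporary (307) redirect
      },
    ];
  },
};

module.exports = nextConfig;
```

If the slowdown really sits between Amplify and this redirect, comparing TTFB on / versus /abcd directly (as the comment above does) is a good way to isolate it.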
I've experienced problems with some of the redirects when using Next 13. Downgrading to Next 12 solved the problem for me. And it was not just a 3-13 second delay, but a full browser freeze. I wasn't able to reproduce it on a smaller app.
Getting extremely high TTFB (~15 second response times) for SSR routes in our Next.js 13 Amplify Hosting compute app
This thread details my exact problems. Trying to launch a site with nextJS + Amplify but this cold start issue >10 seconds is a show stopper.
cache:
paths:
- 'node_modules/**/*' # Cache `node_modules` for faster `yarn` or `npm i`
- '.next/cache/**/*'
But this doesn't seem to solve the issue. After a page is loaded for the first time, subsequent loads are instant.
15 second cold start for a small web app is unacceptable.
my dependencies:
{
"dependencies": {
"@emotion/react": "^11.4.1",
"@emotion/styled": "^11.3.0",
"@mui/icons-material": "^5.0.3",
"@mui/material": "^5.0.3",
"@mui/system": "^5.11.1",
"@mui/x-date-pickers": "^5.0.0-alpha.7",
"@sentry/nextjs": "^7.25.0",
"@stripe/react-stripe-js": "^1.4.1",
"@stripe/stripe-js": "^1.15.1",
"@twilio/conversations": "^2.0.0",
"@twilio/video-processors": "^1.0.2",
"@typeform/embed": "^1.6.1",
"axios": "^0.21.1",
"chart.js": "^3.7.0",
"core-js": "^3.9.1",
"date-fns": "^2.29.3",
"emoji-mart-next": "^2.11.2",
"form-data": "^4.0.0",
"jest-canvas-mock": "^2.4.0",
"jsonwebtoken": "^9.0.0",
"lottie-web": "^5.9.2",
"material-ui-phone-number": "^3.0.0",
"md5": "^2.3.0",
"moment": "^2.29.1",
"moment-timezone": "^0.5.34",
"next": "12.3.4",
"next-redux-wrapper": "^6.0.2",
"nylas": "^6.4.2",
"query-string": "^7.0.0",
"rc-time-picker": "^3.7.3",
"react": "^18.2.0",
"react-calendar": "^3.3.1",
"react-chartjs-2": "^4.0.1",
"react-countup": "^6.0.0",
"react-csv-reader": "^3.5.0",
"react-data-table-component": "^7.0.0-alpha-5",
"react-dom": "^18.2.0",
"react-draggable": "^4.4.4",
"react-google-login": "^5.2.2",
"react-image-crop": "^10.0.4",
"react-infinite-scroll-component": "^6.0.0",
"react-linkedin-login-oauth2": "^1.0.9",
"react-material-ui-form-validator": "^3.0.1",
"react-modal": "^3.13.1",
"react-phone-number-input": "^3.1.47",
"react-quill": "^1.3.5",
"react-redux": "^7.2.3",
"react-responsive-carousel": "^3.2.16",
"react-slick": "^0.28.1",
"react-thunk": "^1.0.0",
"react-time-picker": "^4.2.1",
"react-visibility-sensor": "^5.1.1",
"redux": "^4.0.5",
"redux-thunk": "^2.3.0",
"request": "^2.88.2",
"slick-carousel": "^1.8.1",
"stripe": "^8.157.0",
"styled-components": "^5.2.1",
"tslib": "2.5.0",
"twilio": "^3.71.1",
"twilio-video": "^2.18.1",
"universal-emoji-parser": "^0.5.28",
"winston-papertrail-transport": "^1.0.9",
"wow.js": "^1.2.2"
},
"devDependencies": {
"@testing-library/jest-dom": "^5.14.1",
"@testing-library/react": "11.2.5",
"@types/cron": "^2.0.0",
"@types/express": "4.17.11",
"@types/jest": "^27.0.1",
"@types/node": "16.11.7",
"@types/react": "17.0.3",
"@types/react-dom": "17.0.3",
"@types/react-material-ui-form-validator": "^2.1.1",
"@types/react-modal": "^3.12.0",
"@types/react-slick": "^0.23.4",
"@types/supertest": "^2.0.12",
"@typescript-eslint/eslint-plugin": "^5.46.1",
"@typescript-eslint/parser": "^5.46.1",
"babel-jest": "26.6.3",
"cypress": "^6.8.0",
"eslint": "^8.2.0",
"eslint-config-airbnb": "19.0.4",
"eslint-config-airbnb-typescript": "^17.0.0",
"eslint-config-prettier": "^8.5.0",
"eslint-import-resolver-typescript": "^3.5.2",
"eslint-plugin-import": "^2.26.0",
"eslint-plugin-jest": "^27.1.7",
"eslint-plugin-jsx-a11y": "^6.5.1",
"eslint-plugin-prettier": "^4.2.1",
"eslint-plugin-react": "^7.28.0",
"eslint-plugin-react-hooks": "^4.3.0",
"husky": "^8.0.2",
"jest": "26.6.3",
"jest-canvas-mock": "^2.4.0",
"lint-staged": "^13.0.2",
"node-sass": "^8.0.0",
"prettier": "^2.8.1",
"supertest": "^6.2.3",
"ts-jest": "26.5.4",
"ts-node": "~9.1.1",
"tslint": "~6.1.3",
"typescript": "^4.3.2"
},
"husky": {
"hooks": {
"pre-commit": "lint-staged && echo '!! Husky is DONE Reviewing !!'"
}
},
"lint-staged": {
"*.{scss,css,md}": "prettier --write",
"*.{ts,tsx}": [
"yarn format",
"yarn lint"
]
}
}
The problem is that the resources allocated for SSR are insufficient. Checking the compute logs shows a 1024 MB lambda function. We should be free to increase it or have a bigger one by default. The only time the page is fast is after being cached in Cloudfront at the edge. But we cannot rely on caching to have good speed, there may be pages where you always want them to be fresh (account page, checkout page, payment, etc.).
as mentioned by @yuyokk, my time_starttransfer is under 1 sec, which is fine -- but the cold start is brutal!
@mstoyanovv, when you use the Next 13 app folder, where do you specify Amplify.configure({ ...config, ssr: true })? I put it in layout.jsx but my app seems to have a hard time finding it; from time to time it complains about missing credentials. I also tried Amplify.configure({ ...config }) without much success. There is no documentation about this. Any insight would be greatly appreciated.
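Not an official answer, but a pattern often used with the pages router is to put the Amplify.configure call in its own module and import that module for its side effect wherever rendering starts. Whether this fully carries over to the app directory is exactly what the question above is about; the module and file names here are assumptions:

```javascript
// configureAmplify.js -- sketch only; file and import names are
// assumptions, and app-directory behavior is the open question above.
import { Amplify } from 'aws-amplify';
import config from './aws-exports'; // generated by the Amplify CLI

// ssr: true makes Amplify store credentials in cookies so that
// server-side code can read them on each request.
Amplify.configure({ ...config, ssr: true });
```

The module would then be imported once, e.g. at the top of app/layout.jsx: `import '../configureAmplify';`.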
@rapgodnpm I noticed in the build settings that you can enable performance mode for a specific branch. It gives a Docker image on a host with 4 vCPU and 7 GB of memory.
Already tried that; it had the same issue. From what I've read in the docs, performance mode just increases the max cache time from a few minutes to a day. A 1024 MB Lambda function was still used. I think the Docker image you mention is for the build environment -- I remember the build logs mentioning that kind of configuration while a build was in progress.
Is there any progress on this issue? Cold starts are a serious performance issue for Next.js SSR users on Amplify Hosting. Is there any confirmation that using the Next.js 13 app folder improves performance versus the older pages directory?
@IvanCaceres - app dir definitely helps with cold start -- but introduces a whole new set of issues :)
@rapgodnpm what memory would you suggest?
@mr-rpl We ended up using open-next to package the Next.js build output and deploying it on Lambda ourselves. Cold starts remain (so it's probably not an Amplify issue specifically), but with more control over the infra we can do things like enable provisioned concurrency. It still works out substantially less expensive than Fargate, which is where we're coming from.
I don't know. I would need to test it and see what produces a smaller cold start, but I can't really do that since Amplify has no setting for this.
Experiencing this for three Amplify Hosting apps I migrated from WEB_DYNAMIC to WEB_COMPUTE two days ago. TTFB has been spiking all over the place since then. The apps still on WEB_DYNAMIC are having no problems with server response.
Same issue for us. Will switch to Vercel because this is unacceptable.
@hloriii Any update on this? This has now been an issue for a couple of months.
same thing here. On our dev environment it's not a problem -- we can wait -- but our users on prod can't... waiting for this issue to be resolved. Please update us @hloriii
Can we get confirmation that this is being worked / a status update? Will need to investigate other hosting providers if there is no solve around the corner.
A little update about this issue: I tested enabling a Route 53 health check (as posted by @talaikis) and it works! I will wait for an update on this issue before removing it, but in the meantime I recommend it.
I enabled the health check but still got the cold start
I can confirm that cold starts and very slow TTFB still exist even with a health check running every minute. It takes an enormously long time to load an SSR route. I would advise against expecting a high-quality production experience for your Next.js app on Amplify Hosting until Amplify / AWS can give us a solution to SSR cold starts. This issue has persisted for months since the release of Amplify Hosting Compute. I will also add that the Amplify Hosting compute logs are lacking and opaque: I have experienced scenarios where they don't surface errors that occur during the server-side rendering phase of a Next.js /pages route. These logs don't surface much of anything at all.
My experience has been:
Note: My Amplify "Domain management" config redirects https://myapp.co ==> https://www.myapp.co ... I suspect the health check requesting the non-www URL wasn't actually spinning up the Lambdas that serve the app (or something).
Hey everyone 👋 , thank you for your continued patience.
We are actively investigating and are working on narrowing down the root cause of the elevated latencies (high TTFB) with Compute apps. Please rest assured that this is our highest priority and we will keep you posted with any updates.
Apologies for the inconvenience caused due to this behavior with Compute apps.
Waiting for this as well... my site feels like it's loading the whole internet at startup.
Need an update on this as well. Otherwise, we’ll have to switch to Vercel again. Such a high TTFB is unacceptable.
App Id
d15y9mlar87m44
AWS Region
us-east-1
Amplify Hosting feature
Not Applicable
Describe the bug
Upon a new deployment and/or after the app goes idle, we experience a 10-12 second delay on any page of our Next 13 application. We are on the new Web Compute platform.
Speculation points to Lambda cold starts -- but I am not 100% sure that is the case. Either way, a public-facing UI should never face a 10-12 second cold start.
Expected behavior
Loads in a reasonable time
Reproduction steps
Build Settings
No response
Log output
No response
Additional information
We have a sample build up here
[Screenshot showing a 10-second request]