Closed gracektay closed 1 year ago
Hello, @gracektay and thank you for opening this issue. This appears to be related to the clock drift error reported in #11675. Can you share the error you're experiencing or any details on the 4XX error response you're getting? And is there a specific API that's being called (either on the GraphQL or Auth side) that can be pinned as the offender?
@cwomack Hi! Here are lots of pictures showing the issue from the client side. Hope this helps! Client side:
{
  "errors": [ {
    "errorType": "BadRequestException",
    "message": "Signature not yet current: 20231027T153914Z is still later than 20231027T153419Z (20231027T152919Z + 5 min.)"
  } ]
}
Here are some pictures:
our call:
import { API } from "aws-amplify";
API.graphql({ authMode, query, variables });
Also, if it's helpful to know, we had originally been unsure of the cause, and my team had a case opened, CASE 14087351881, which was forwarded to the AppSync team.
@gracektay, thank you for the additional information and pics! We are currently investigating this and will update this issue as soon as we can.
There are several previously closed issues related to this. In particular, this issue was resolved by both a library fix and a fix to the CLI. @gracektay, can you also share what version of the CLI you are using? Thank you!
@gracektay, have you also enabled CORS and checked the IAM permissions for the public identity pool? We're having trouble reproducing the issue with GraphQL (though we have reproduced it using Pinpoint, similar to #9281), and we're trying to see if we can get the same recurring error you see.
If you have a minimal sample app (or even what the frontend code looks like), that would be helpful too!
👋 @gracektay, by any chance are you using Analytics and GraphQL together in your client-side app, or any API that uses the pre-signer?
I can confirm I ran into the same issue when hitting getUrl in Storage after modifying the clock; it internally uses the pre-signer to get the URL of the object.
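For reference, a minimal reproduction sketch (illustrative only; it assumes the v5 Storage.get API, which also returns a pre-signed URL, and uses a made-up object key):

import { Storage } from "aws-amplify";

// Rough reproduction sketch: with the local clock shifted by more than 5 minutes,
// the pre-signed URL carries a signature date outside the allowed window and the
// subsequent fetch is rejected, just like the GraphQL call.
async function reproduceClockDrift() {
  const url = await Storage.get("example.txt"); // resolves to a pre-signed URL in v5
  return fetch(url);
}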
I've set up this app: https://d3rue5wy9qeptt.cloudfront.net using the @aws-amplify/cli at 12.7.1 and aws-amplify at 5.3.11. Setting the time ten minutes forward and back triggers it still. If it fails once, it can still fail in the future. If you cannot get it to trigger I can attach a video, but I confirmed just now that even with both those versions it still does it.
@cwomack Is that enough for you to inspect further?
This only impacts a couple of users (it's a handful of people responsible for the 50k requests), and for the most part everything is working as expected (except when we switch back and forth on time), so my gut feeling is that it is not related to CORS or our permissions. If it were, I'd assume it'd be broken for the hundred thousand other users.
Here are some snippets from our IAM permissions in case it's useful (with certain values removed, just in case).
Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "cognito-identity.amazonaws.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "cognito-identity.amazonaws.com:aud": "us-east-2:[[identifier-removed]]"
        },
        "ForAnyValue:StringLike": {
          "cognito-identity.amazonaws.com:amr": "unauthenticated"
        }
      }
    }
  ]
}
Permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "mobileanalytics:PutEvents",
        "cognito-sync:*",
        "cognito-identity:*"
      ],
      "Resource": "*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "appsync:GraphQL"
      ],
      "Resource": [
        "arn:aws:appsync:us-east-2:[[accountid]]:apis/[[apiid]]/types/*/fields/*",
        "arn:aws:appsync:us-east-2:[[accountid]]:apis/[[apiid]]"
      ],
      "Effect": "Allow"
    }
  ]
}
@kvramyasri7 Catching up, there were lots of comments! Within the app, we leverage the Hub, Auth, API, and Cache. Hub is to handle login, and the cache is for a different page in the app. We are not using Analytics. I think this specific part of the app would likely just touch the API and Auth imports. From the logs, it seems to happen when Amplify is attempting to configure the GraphQL API.
Thanks for sharing more details and the hosted app @gracektay.
Looks like irrespective of my local time (behind / forward / current), the app always makes a network call every 1 min. If my time is incorrect (behind / forward), it fails with the BadRequestException error mentioned above.
Investigating the same.
Hi @ashwinkumar6, I should have clarified this: we have polling for emergency alerts. We did remove that in our production environment due to the 4xx error issues. The polling is part of Redux Toolkit and works well for 99 percent of users. It checks whether there is an emergency on campus, so we want to run it regularly. (We plan to move to an EventBridge approach later, but polling was a nice feature we wanted to use; we can't, though, due to this issue.) The polling itself is expected; it's the actual failures that are the problem. It does make the issue easier to see, though, so we turned it back on in dev-type environments, although to be clear these issues happen with or without the polling on.
Ahh got it, that makes sense, thanks for clarifying. So the issue isn't with the repeated calls (that's the polling mentioned above) but with handling a clock drift failure event. Let me discuss with the team and get back to you.
@ashwinkumar6 To an extent that's true; the once-a-minute successful calls are fine. But the clock drift failures also cause repeated calls, an infinite number of them, until the browser is force closed.
The polling just offers more opportunities for the infinite failure to occur.
Thank you @gracektay. If possible, could you please also share the polling code snippet in the dev environment that makes the GraphQL API call every 1 min (specifically the GraphQL API call)?
import { useGetEmergencyAlertsQuery } from "../../services/graphql/api";
import { getEmergencyAlerts } from "../../services/data-sources/enums";
import { Mui, MuiIcons } from "@osu/react-ui";
import { Alerts } from "./Alerts";
import { Links } from "./Links";
import { useStyles } from "../hooks/useStyles";

export const ErrorAlert = () => {
  // RTK Query polling: re-fetch the emergency alerts every 60 seconds
  const { data } = useGetEmergencyAlertsQuery(null, {
    pollingInterval: 60000
  });
  const emergencyAlerts = data?.[getEmergencyAlerts]?.list ?? [];
  const { background } = useStyles();

  // Render nothing when there are no active alerts
  if (!emergencyAlerts?.length) {
    return false;
  }

  const links = emergencyAlerts.map(({ links = [] }) => links).flat(1);

  return (
    <Mui.Alert
      severity="error"
      role="none"
      iconMapping={{
        error: <MuiIcons.Notifications fontSize="inherit" />,
      }}
      sx={background}
    >
      <Mui.Typography variant="srOnly" component="h2">
        Buckeye Alert Message
      </Mui.Typography>
      <Alerts alerts={emergencyAlerts} />
      <Links links={links} />
    </Mui.Alert>
  );
};
It’d be a bit difficult to share the entire code setup, but that’s the component, and this is the base API call:
import { API, Auth } from "aws-amplify";
import { getSessionToken } from "../../util/functions";
import { AUTH_MODE } from "../../api-config";
import { isAfter } from "date-fns";

export async function graphqlBaseQuery(args) {
  const { query, variables } = args;
  let authMode;

  // Swallow rejections so a missing session or missing credentials just leave these undefined
  const sessionToken = await getSessionToken().catch(() => undefined);
  const { identityId, expiration } = (await Auth.currentCredentials().catch(() => undefined)) ?? {};

  const expirationDate = new Date(expiration);
  const nowDate = new Date();
  const isExpired = isAfter(nowDate, expirationDate);

  // Prefer user pool auth when a session exists; fall back to IAM for guest credentials
  if (sessionToken) {
    authMode = AUTH_MODE.AMAZON_COGNITO_USER_POOLS;
  } else if (identityId && !isExpired) {
    authMode = AUTH_MODE.AWS_IAM;
  }

  let response = {};
  try {
    if (!authMode) {
      return {
        error: {
          status: 500,
          data: "Unable to authenticate credentials"
        }
      };
    }
    const result = await API.graphql({ authMode, query, variables });
    if (result?.data) {
      response = result;
    } else {
      throw new Error("Unexpected data format");
    }
  } catch (error) {
    // Return the structured error shape RTK Query expects
    response = { error: { status: 500, data: error?.errors ?? error } };
    if (error?.data) {
      response.data = error.data;
    }
    return response;
  }
  return response;
}
The isExpired check was an attempt to fix the issue, but it made no difference.
Reposting these docs in case it’s helpful to understand the polling: https://redux-toolkit.js.org/rtk-query/usage/polling
Hi @gracektay
Thank you for sharing the detailed information; it really helped the investigation! I just checked the site you provided earlier, https://d3rue5wy9qeptt.cloudfront.net, and I can reproduce the issue with the following steps:
1. With the local clock set forward or back, the first request fails with a BadRequest error and the immediate retry succeeds with a correct Date header (the clock-skew correction logic works as expected here).
2. After changing the local clock again, subsequent requests fail with the BadRequest error.
I'll keep you posted as the investigation continues.
Hi @gracektay
I just confirmed this is indeed a bug with our clock drift correction function. Here's where it goes wrong:
1. Say the server clock is at 19:00:00 and the client clock is an hour behind. After the first correction, requests are signed with a Date header at 19:00:00 (client clock + offset) and succeed.
2. When the client clock is then set an hour forward, the next request signs its Date header at 20:00:00 because the old offset is still applied.
3. That request fails with a BadRequest error and triggers the drift correction. The function compares the request date (20:00:00) with the server clock (19:00:00), sets the offset to -1 hour, and initiates a retry.
4. The retry's Date header is 18:00:00 and hence fails with BadRequest. The drift correction then sets the offset to +1 hour again and triggers another retry, which reproduces the error in step 3.
5. As a result, the signed Date flips back and forth between 18:00:00 and 20:00:00 while the server clock stays at 19:00:00.
We will implement a fix for this issue and let you know when it's ready.
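To make the flip-flop concrete, here is a simplified, illustrative sketch (not the actual library code; the function and variable names are made up) of the flawed correction versus a stable one:

// Illustrative only: serverTime comes from the Date header of the error response.
// All values are Date objects; subtracting them yields a millisecond offset.

// Flawed correction: derives the new offset from the signed request date, which
// already includes the previous offset, so successive corrections overshoot in
// opposite directions (20:00 -> 18:00 -> 20:00 -> ...).
function correctOffsetFlawed(signedRequestDate, serverTime) {
  return serverTime - signedRequestDate; // e.g. 19:00 - 20:00 = -1 hour
}

// Stable correction: derives the offset from the raw client clock instead, so the
// retry is signed at the server time (19:00) and converges rather than oscillating.
function correctOffsetStable(clientClock, serverTime) {
  return serverTime - clientClock; // e.g. 19:00 - 19:00 = 0
}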
Thanks so much @AllanZhengYP! I am happy to upgrade this CloudFront URL with any package versions you'd like to test; please let me know if there is anything on my side that can be helpful.
@gracektay Hi, a fix has been released in aws-amplify@5.3.12. Can you give it a try with this version?
@AllanZhengYP Hi! I've updated https://d3rue5wy9qeptt.cloudfront.net. I have tested in Chrome and Safari, so far so good. I see it error once, retry and succeed. I'll let you know if I see more issues! Thanks SO much!
@gracektay, glad to hear this is resolved with the fix that @AllanZhengYP introduced! We'll close this issue as resolved for now, but feel free to reply back if there are further problems tied to this.
JavaScript Framework
React
Amplify APIs
Authentication, GraphQL API
Amplify Categories
auth, api
Environment information
Describe the bug
We have an Amplify project that allows for auth via a user pool and through a public identity pool. We are experiencing a high rate of 4xx errors and are constantly in an error state. While I have seen a number of tickets related to clock drift, I have not seen one call out an infinite number of calls to the backend. While debugging on the very latest version, I set a debug flag on the supposed fixes for this issue, and it just re-runs that code over and over without stopping. If you do not close the browser, it never stops firing, leading to 50,000 client calls and up. We've had to add a rate limiter just to try and help manage this flood of errors.
We were on the very latest version of Amplify but downgraded to a version suggested in another issue (5.3.3), which slightly helped, but not entirely. The image below shows some of what we have been dealing with; the spike was when we were on the newest Amplify version.
Expected behavior
A client should not be able to fire off tens to hundreds of thousands of requests. There should be a controllable number of retries, and once that specific 4xx error is identified, a refreshed, valid signature should be created.
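For illustration only (this is not our app code), a minimal sketch of the kind of bounded retry we would expect, using a hypothetical wrapper around API.graphql:

import { API } from "aws-amplify";

// Hypothetical guard: cap how many times a clock-drift failure is retried
// before surfacing the error to the caller instead of looping forever.
const MAX_DRIFT_RETRIES = 2;

export async function graphqlWithRetryCap(options, attempt = 0) {
  try {
    return await API.graphql(options);
  } catch (error) {
    const message = error?.errors?.[0]?.message ?? "";
    const isClockDrift = message.includes("Signature not yet current");
    if (isClockDrift && attempt < MAX_DRIFT_RETRIES) {
      return graphqlWithRetryCap(options, attempt + 1);
    }
    throw error; // give up after the cap is reached
  }
}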
Reproduction steps
On an Amplify project with GraphQL and a public identity pool, set your clock ahead or behind by ten minutes.
Code Snippet
Log output
aws-exports.js
No response
Manual configuration
No response
Additional configuration
No response
Mobile Device
No response
Mobile Operating System
No response
Mobile Browser
No response
Mobile Browser Version
No response
Additional information and screenshots
One week of errors:
Zoom in on latest Amplify:
You can see a near mirror image between the errors and our calls. We also never should have had that many calls; it's just the client being able to fire off an uncontrolled number of 4xx errors.