Google Sign In with `customState` works on local dev/hosted staging, but not on production

talyh commented 3 years ago

Before opening, please confirm:

[X] I have searched for duplicate or closed issues and discussions.
[X] I have read the guide for submitting bug reports.
[X] I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.

JavaScript Framework

React

Amplify APIs

Authentication

Amplify Categories

auth

Environment information

``` # Put output below this line System: OS: macOS 11.4 CPU: (12) x64 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz Memory: 119.77 MB / 16.00 GB Shell: 5.8 - /bin/zsh Binaries: Node: 12.16.1 - /usr/local/bin/node Yarn: 1.22.4 - /usr/local/bin/yarn npm: 6.13.4 - /usr/local/bin/npm Browsers: Chrome: 91.0.4472.114 Firefox: 89.0.2 Safari: 14.1.1 npmPackages: @ampproject/toolbox-optimizer: undefined () @aws-amplify/auth: ^4.1.1 => 4.1.1 @aws-amplify/core: ^4.1.3 => 4.1.3 @babel/core: undefined () @googlemaps/js-api-loader: ^1.2.0 => 1.2.0 @goproperly/eslint-config-properly-react: ^2.5.0 => 2.5.0 @goproperly/linoleum: ^12.10.1 => 12.10.1 @goproperly/web-util: 7.0.0 => 7.0.0 @next/bundle-analyzer: ^10.0.3 => 10.0.3 @reach/auto-id: ^0.12.1 => 0.12.1 (0.11.2, 0.15.0) @reach/disclosure: ^0.12.1 => 0.12.1 @reach/menu-button: ^0.12.1 => 0.12.1 @reach/tabs: ^0.15.0 => 0.15.0 @seznam/compose-react-refs: ^1.0.4 => 1.0.4 @testing-library/jest-dom: ^5.11.2 => 5.11.2 @testing-library/react: ^10.4.7 => 10.4.7 @testing-library/react-hooks: ^5.0.3 => 5.0.3 algoliasearch: ^4.3.0 => 4.3.0 amphtml-validator: undefined () apexcharts: ^3.20.2 => 3.20.2 arg: undefined () async-retry: undefined () async-sema: undefined () babel-jest: ^26.2.2 => 26.2.2 (26.6.3) bfj: undefined () bootstrap: ^4.1.3 => 4.1.3 buttercms: ^1.2.7 => 1.2.7 cacache: undefined () cache-loader: undefined () ci-info: undefined () comment-json: undefined () compression: undefined () conf: undefined () content-type: undefined () cookie: undefined () css-loader: undefined () debug: undefined () deep-object-diff: ^1.1.0 => 1.1.0 devalue: undefined () dot-prop-immutable: ^2.1.0 => 2.1.0 escape-string-regexp: undefined () eslint: ^7.6.0 => 7.6.0 fast-deep-equal: ^3.1.3 => 3.1.3 file-loader: undefined () find-cache-dir: undefined () find-up: undefined () fresh: undefined () google-map-react: ^2.1.7 => 2.1.7 gzip-size: undefined () h3-js: ^3.7.1 => 3.7.1 html-entities: ^1.3.1 => 1.3.1 http-proxy: undefined () husky: ^4.2.5 => 4.2.5 ignore-loader: undefined () imagemin-mozjpeg: ^9.0.0 => 9.0.0 imagemin-pngquant: ^9.0.0 => 9.0.0 imagemin-svgo: ^8.0.0 => 8.0.0 intl: ^1.2.5 => 1.2.5 is-animated: undefined () is-docker: undefined () is-wsl: undefined () jest: ^26.6.3 => 26.6.3 jest-styled-components: ^7.0.2 => 7.0.2 jquery: ^3.5.1 => 3.5.1 js-cookie: ^2.2.1 => 2.2.1 json5: undefined () jsonwebtoken: undefined () lint-staged: ^10.2.11 => 10.2.11 lity: ^2.4.1 => 2.4.1 loader-utils: undefined () lodash.curry: undefined () lru-cache: undefined () moment: ^2.29.1 => 2.29.1 nanoid: undefined () neo-async: undefined () next: ^10.2.0 => 10.2.0 next-build-id: ^3.0.0 => 3.0.0 next-compose-plugins: ^2.2.0 => 2.2.0 next-optimized-images: ^2.6.2 => 2.6.2 nextjs-sitemap-generator: ^1.3.1 => 1.3.1 node-fetch: ^2.6.1 => 2.6.1 ora: undefined () patch-package: ^6.4.7 => 6.4.7 popper.js: ^1.16.1 => 1.16.1 postcss-flexbugs-fixes: undefined () postcss-loader: undefined () postcss-preset-env: undefined () postcss-scss: undefined () prettier: ^2.2.1 => 2.2.1 promise-memoize: ^1.2.1 => 1.2.1 prop-types: ^15.7.2 => 15.7.2 react: ^17.0.1 => 17.0.1 react-apexcharts: ^1.3.7 => 1.3.7 react-calendly: ^2.0.4 => 2.0.4 react-compound-slider: ^2.5.0 => 2.5.0 react-dom: ^17.0.1 => 17.0.1 react-html-parser: ^2.0.2 => 2.0.2 react-html-parser-demo: 0.0.0 react-pose: ^4.0.10 => 4.0.10 react-redux: ^7.2.0 => 7.2.0 react-share: ^4.3.1 => 4.3.1 react-slick: ^0.26.1 => 0.26.1 react-stack-grid: ^0.7.1 => 0.7.1 react-sticky-box: ^0.9.3 => 0.9.3 react-test-renderer: ^16.13.1 => 16.13.1 react-timeago: ^4.4.0 => 4.4.0 recast: undefined () redux: ^4.1.0 => 4.1.0 redux-actions: ^2.6.5 => 2.6.5 redux-dynamic-modules: ^5.2.3 => 5.2.3 redux-dynamic-modules-saga: ^5.2.3 => 5.2.3 redux-saga: ^1.1.3 => 1.1.3 redux-saga-test-plan: ^4.0.1 => 4.0.1 redux-saga/effects: undefined () resolve-url-loader: undefined () rimraf: ^3.0.2 => 3.0.2 (2.6.3, 2.7.1) rollbar-sourcemap-webpack-plugin: ^3.2.0 => 3.2.0 sass: ^1.35.1 => 1.35.1 sass-loader: undefined () schema-utils: undefined () selfsigned: ^1.10.7 => 1.10.7 semver: undefined () send: undefined () slick-carousel: ^1.8.1 => 1.8.1 source-map: undefined () string-hash: undefined () strip-ansi: undefined () styled-components: ^5.1.1 => 5.1.1 styled-components/macro: undefined () styled-components/native: undefined () styled-components/primitives: undefined () supercluster: ^7.1.0 => 7.1.0 terser: undefined () text-table: undefined () thread-loader: undefined () twilio: ^3.65.0 => 3.65.0 unistore: undefined () use-deep-compare-effect: ^1.4.0 => 1.4.0 use-media: ^1.4.0 => 1.4.0 use-resize-observer: ^7.0.0 => 7.0.0 use-timeout: ^1.1.0 => 1.1.0 uuid: ^8.3.2 => 8.3.2 (3.4.0, 7.0.3) victory: ^33.1.7 => 33.1.7 web-vitals: undefined () webpack: undefined () webpack-sources: undefined () npmGlobalPackages: commitizen: 4.2.2 jest: 26.6.3 npm: 6.13.4 typescript: 4.0.5 ```

Describe the bug

TLDR: Google Sign In with customState works on local dev/hosted staging, but not on production.

CONTEXT

We run a NextJS react app, deployed within Vercel. This app has an Amplify config file that maps to either a Prod or Staging Cognito user pool, based on an env variable that determines whether it's running prod or staging. In both cases, that setup looks like this, with the `XXXX` just representing the appropriate values for each Cognito pool. ``` javascript Auth: { region: 'us-east-1', userPoolId: 'us-east-XXXX', userPoolWebClientId: 'XXXX', oauth: { domain: 'xxxx', scope: ['email', 'openid', 'profile'], redirectSignIn: 'XXXX/validation/', redirectSignOut: 'XXXX/validation/', clientId: 'XXXX', responseType: 'code', }, } ``` On their setup, these Cognito pools are the same (same policies, same attributes, etc), only with different domains under `Domain Names`. They also have equivalent apps setup, with OAuth configured the same on both sides. Both the Prod and Staging Cognito pools have `Identity Provider` configuration pointing to the same Google Cloud Project, which is configured to accept request from both sources. Both are associated to the same organization and region within AWS. --- On app initialization, we start listening for auth events. That looks like: ```javascript useEffect(() => { // We want to initialize amplify before setting auth state to avoid race condition between them initializeAmplify(); dispatch(setAmplifyInitialize(true)); Hub.listen('auth', ({ payload: { event, data } }) => { switch (event) { // // .... sign in & sign out events here // case 'customOAuthState': dispatch(loggedInRedirect({ customState: data })); break; default: break; } }); // eslint-disable-next-line react-hooks/exhaustive-deps }, []); In this context, `dispatch` is dispatching to our redux store, so the rest of the app can react to it. ``` --- Before triggering the sign in, we generate some relevant state, attached it to an encoded string, save it locally. We then send the encoded string as `customState`. That looks like ```javascript await Auth.federatedSignIn({ provider: 'Google', customState: encodedState, }); ``` The `/validation` page (set in the `redirectSignIn`) has logic to compare that locally saved state with what's in redux. If they match, we redirect the user to the page they were at when sign in was initialized.

HOW THE ISSUE MANIFESTS

The flow above works well on local dev and hosted staging environments. We see the `customState` event firing and the corresponding state being set in redux. In production, however, we don't see the event firing and no redux data. There are no console or network errors and the user token is present. With the exception of the token, everything else seems to work. We initially suspected a race condition happening in prod that didn't happen on staging, but after some experiments, discarded this theory. We then used [`patch-package`](https://www.npmjs.com/package/patch-package) to add some logging to `@amplify/auth`. With this logging, we were able to see ![image](https://user-images.githubusercontent.com/23743595/125881479-15cc8973-2999-4f81-adc2-91f4be1e9bd5.png) --> It's worth calling out that this silent errorring consumed many hours of debugging, so a minor feature request inside this larger bug report would be for the error to be more explicit to avoid wild goose chases. The error above wasn't sufficient to indicate to us why this would yield different behaviours across different environments. The querystring we receive in return has `code` and `state` in both cases. The value of `state` in the QS seems to be encoded by Amplify, so hard to say what it actually contains in each case, but it is present at all times.

Expected behavior

Given equivalent Cognito configurations, Amplify should behave the same
Consistent behaviour across environments in regards to triggering customOAuthState or failing it
Better communication of error in case of failure

Reproduction steps

The bug description has all the details I can provide, but I can't offer repro steps as I only have our own Cognito pools to test with.

Code Snippet

// Put your code below this line.

Log output

``` // Put your logs below this line ```

aws-exports.js

No response

Manual configuration

Auth: {
      region: 'us-east-1',
      userPoolId: 'us-east-XXXX',
      userPoolWebClientId: 'XXXX',
      oauth: {
        domain: 'xxxx',
        scope: ['email', 'openid', 'profile'],
        redirectSignIn: 'XXXX/validation/',
        redirectSignOut: 'XXXX/validation/',
        clientId: 'XXXX',
        responseType: 'code',
      },
    }

Additional configuration

No response

Mobile Device

No response

Mobile Operating System

No response

Mobile Browser

No response

Mobile Browser Version

No response

Additional information and screenshots

This issue was 100% consistent in its behaviour. Everyone in our team experienced the same result both in terms of what was working on staging and not working on production, discarding issues of local environments or uncommon conditions in the Google account.

We had a test version deployed to production which skipped the comparison and just executed the gated behaviour regardless. That version works, indicating the actual token validation / general Cognito/Amplify setup is ok. The only manifested issue is really that we don't get the customState event/data in production.

Digging into the actual data returned by the different Cognito pools / Amplify token exchange, the only significant difference we could see is that production returns a idToken.payload.nonce, which we don't see in staging. To the best of my understanding, this is not related to any configuration of ours, but rather something Cognito is deciding to do.

When we started dealing with this issue we were using

"@aws-amplify/core": "^3.8.2",
"@aws-amplify/auth": "^3.4.10",

We since updated to

"@aws-amplify/auth": "^4.1.1",
"@aws-amplify/core": "^4.1.3",

This didn't help matters

We initially suspected a race condition, so we experimented a lot with moving the Amplify initialization / event listener subscription. None of those experiments yielded a change in behaviour.

As with most modern apps, there's a lot that runs on our production environment. We've done lots of experiments disabling analytics and other similar things to try and remove variables, but no matter what, we were always seeing the same result.

talyh commented 3 years ago

I noticed there's been some movement in the issue in terms of assignments, but no additional info. Are you able to provide an update on this?

Thanks,

evcodes commented 3 years ago

Hello @talyh, I am working on reproducing this issue and will get back to you with an update before Friday.

Cheers,

Eddy

Khairo-kh commented 3 years ago

Hello @talyh, Thank you for the detailed explanation. I'm trying to confirm if this is related to another open issue here. Can you try adding trailingSlash: true, in the next.config.js file and see if that has any effect on the issue described here?

Thanks!

talyh commented 3 years ago

@Khairo-kh - I just looked at our next.config.js and it already has trailingSlash: true.

talyh commented 3 years ago

@Khairo-kh @evcodes - Wondering if you have an update on this? Anything I can help with?

Khairo-kh commented 3 years ago

Hey @talyh - Sorry for the delay! unfortunately, I have not had any success reproducing this issue. I can successfully get the custom state data (and sign-in) both in the local dev environment and with a production build deployed to Vercel. Please let me know if you would like to see the setup I have, I can add you to the test repo I used to try and reproduce this.

On a side note, you can enable Amplify logging without using a third-party package. Simply add window.LOG_LEVEL = 'DEBUG'; in your code and you will see the logs in the browser console. See this for reference.

talyh commented 3 years ago

Hi @Khairo-kh Thanks for looking into the setup. As far as I can tell, there's nothing in our configuration (neither in Vercel nor in AWS) that'd lead to prod behave differently (though there are so many moving pieces that it's hard to say for sure). I did open a ticket for our AWS support and they confirm that both the staging and prod pools are identical in their configuration (sure, different app ids, different domains, etc, but the same setup). So I'm not sure where else to look in terms of configuration differences.

Maybe a different approach is: Can you help me understand what are the conditions that'd trigger these errors we saw on the production environment?

(Also, thanks for pointing me to the logger! I missed that and indeed it'd have been really helpful during that investigation)

chrisbonifacio commented 2 years ago

Hi @talyh just wanted to check in with you and ask if you are still experiencing this issue and in need of assistance?

talyh commented 2 years ago

@chrisbonifacio - Yes, this is still an issue

https://github.com/aws-amplify/amplify-js/issues/8594#issuecomment-893975315 provides the latest information requested by the AWS team, but I haven't seen any further action on it

7ruth commented 2 years ago

Just leaving this here for anyone else that might be trying to pass custom state as their users are signing in with federatedSignIn. I wasn't able to find many clues on stackoverflow, in issues or elsewhere.

Pass custom state like this:

    Auth.federatedSignIn({
                      provider: CognitoHostedUIIdentityProvider.Facebook,
                      customState: "hello"});

'hello' will be encoded in the url as sign in redirects are taking place, such as:

http://localhost:4539/?code=f9327f9c-93d4-42a1-920f-b02700fa1e2d&state=SNnqsphf8sDUyTTCyokuIfeI9NRnzh7f-4c49535341#_=_

specifically in the 'state' parameter, after the dash ("-"), corresponding to "4c49535341#=" in the example above

Once front end redirect are done, you will have to read in the query parameter, split is according to the dash ("-") and pass it to the urlSafeDecode such as:

import { urlSafeDecode } from "@aws-amplify/core";

urlSafeDecode(router.query.state.split("-")[1]))

chrisbonifacio commented 2 years ago

Hi 👋 Closing this due to inactivity and being unable to reproduce the issue.

elorzafe commented 2 years ago

For clarification, you can read Social Sign In documentation on how to read customState after the user is signed in.

You should listen for Hub instead of parsing the URL.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs.

Looking for a help forum? We recommend joining the Amplify Community Discord server amplify-help forum.

aws-amplify / amplify-js