SharePoint / sp-dev-docs

SharePoint & Viva Connections Developer Documentation
https://docs.microsoft.com/en-us/sharepoint/dev/
Creative Commons Attribution 4.0 International

getToken() intermittently hangs and fails to resolve/reject #4892

Closed TheIronCheek closed 3 years ago

TheIronCheek commented 4 years ago

Category

Expected or Desired Behavior

The promise returned by AadTokenProvider.getToken() should always resolve or reject. In other words, when a token can't be retrieved for whatever reason, I should get something back that I can catch.

Observed Behavior

getToken() is occasionally hanging - never resolving or rejecting so I can't catch an error or figure out what's happening. The .then() under getToken() just never runs and my web part stalls.

Steps to Reproduce

I'm trying to access a .NET Core Web API secured with AAD from my custom web part. I initially used the instructions found here but AadHttpClient.get() was occasionally stalling so I decided to get the token using AadTokenProvider.getToken() and manually add it to a HttpClient.get() but my issue didn't go away. I was however able to confirm that getToken() is where it hangs rather than an issue with the API. Here's my code:

this.context.aadTokenProviderFactory.getTokenProvider()
    .then((provider) => { 
        return provider.getToken('<Client ID for my API>', false) //THIS IS WHERE IT HANGS
            .then((token) => {
                this.context.httpClient
                    .get('https://www.example.com/api/myAPI/', AadHttpClient.configurations.v1, {
                        headers: [
                            ['accept', 'application/json'],
                            ['Authorization', 'Bearer ' + token]
                        ]
                    })
                    .then((res: HttpClientResponse): Promise<any> => {
                        return res.json();
                    })
                    .then(data => {
                        //process data
                    }, (err: any): void => {
                        this.context.statusRenderer.renderError(this.domElement, err);
                    });
            }, (rejection) => {
                this.context.statusRenderer.renderError(this.domElement, "Token request rejected. " + rejection);
            });
    })
    .catch((err) => {
        this.context.statusRenderer.renderError(this.domElement, "Failed to get token provider.");
    });

I noticed that it seems to hang after a period of inactivity, like maybe it's failing to refresh a stale token or something. I'd expect the rejection code to execute but nothing beyond getToken() runs. But if I refresh the page a couple times, it starts working again.
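While the underlying cause is investigated, one way to keep a web part from stalling is to race the token request against a timer, so that a hang surfaces as a rejection that can be caught. The following is only a sketch of that workaround: `withTimeout` is a hypothetical helper, not part of SPFx, and it does not cancel the underlying request.

```typescript
// Hypothetical helper (not part of SPFx): rejects if `work` has not settled
// within `ms` milliseconds. The underlying operation is NOT cancelled.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
    // Whichever happens first wins; clear the timer once `work` settles.
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}
```

With this in place, a call such as `withTimeout(provider.getToken(resource), 10000, 'getToken')` would reject after ten seconds instead of hanging; the ten-second limit is an arbitrary value chosen for illustration.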

This seems to be related to issue #914 reported on pnp/pnpjs.

Edit: I discovered that this is a lot easier to reproduce in Chrome. (I haven't seen it occur in IE and I've only seen it once or twice in Firefox although I can't seem to reproduce in FF at the moment. Edge gets the token but fails the API request for reasons I can't explain). In Chrome, it struggles every time it needs to grab a token. So, for example, it hangs the first time you load the page. If you refresh a few times, it starts working. Then if you leave the page open for a period of time (long enough for the token to expire) and refresh the page, it hangs again.

Edit 2: Per the conversation below, we can reproduce the issue when the custom web part is on the same page as a document library and a planner or events web part and it works fine when on a page by itself. There may be some sort of conflict with token retrieval/sharing between web parts.

msft-github-bot commented 4 years ago

Thank you for reporting this issue. We will be triaging your incoming issue as soon as possible.

TheIronCheek commented 4 years ago

I spotted the following message in the console when it errored referring to mytenant.sharepoint.com/_forms/spfxsinglesignon.aspx:

Blocked script execution in '<URL>' because the document's frame is sandboxed and the 'allow-scripts' permission is not set.

andrewconnell commented 4 years ago

@TheIronCheek said:

I noticed that it seems to hang after a period of inactivity

What do you mean by "period of inactivity"?... you mean the page sitting open for a while? If so, how long is the "period of inactivity"?

TheIronCheek commented 4 years ago

Long enough for the token to expire... a half hour maybe?

TheIronCheek commented 4 years ago

The problem seems to happen most consistently in Chrome. It looks like it happens whenever it needs a fresh token. So, the very first time the user hits the page, it fails. If you refresh the page a few times, it'll finally get a token. And then it fails again when that token expires and it tries to get a new token.

andrewconnell commented 4 years ago

Any chance you have a sample project you can share (via public GH repo) to repro it? Will help speed up the repro to escalate the issue to engineering...

TheIronCheek commented 4 years ago

I'll throw something together for you.

andrewconnell commented 4 years ago

Sounds like after obtaining a token, SPFx just keeps using the same one, but doesn't handle the response if the token has expired. Have you inspected the request with Fiddler / network tab in the browser? When the request is made, I expect you'll see a response saying something like the token has expired... my theory is that this response isn't handled by the SPFx API and thus the promise never resolves...

TheIronCheek commented 4 years ago

There are hundreds of requests being made on the page but watching the Network tab on a successful run, here's what I'm gleaning that looks pertinent:

On a failing run, here's what I get:

Based on that, it looks like a token is being correctly retrieved since that Location header value in the response from https://login.microsoftonline.com has the access_token URL parameter set to a value. But for whatever reason, that token never makes it back to my web part so it can call my API.

Like I said, there are hundreds of requests that run here so it's tough to sift through and figure out what's relevant. If there's something specific I should look for, let me know.

TheIronCheek commented 4 years ago

I updated the OP with browser specific information.

TheIronCheek commented 4 years ago

I threw together a sample web part:

https://github.com/TheIronCheek/SampleWebPart

FlorianLabranche commented 4 years ago

I have the exact same issue as you.

@andrewconnell Do I need to open a new issue ? Or submit a msft support request to speed things up ?

TheIronCheek commented 4 years ago

@FlorianLabranche - I'm glad I'm not the only one. If you figure out what's going on, please - for the love of all that is holy - come back here and update me. I'll be sure to do the same. I have a ticket open with support too and I was told they're "escalating" the issue and that I'd for sure get a call back today. Still waiting.

TheIronCheek commented 4 years ago

Googling around today, I found this resolved issue from late October/early November.

It references the same message I get:

Blocked script execution in '<URL>' because the document's frame is sandboxed and the 'allow-scripts' permission is not set.

But in that situation, it only affected extensions and not web parts. Could my issue be tangentially related?

OliverZeiser commented 4 years ago

I am seeing similar issues. It might be something else, but maybe it helps to track down the issue... For me this only happens if there is an OOTB document library/list webpart or a planner webpart on the page. I think those ootb webparts are also getting a token for MS graph using the underlying adal infrastructure and interfering with the custom webparts. Looking at the tokens in the session storage of the browser, you can see that things get messed up. Removing those ootb webparts and refreshing the page, I can get my token without any problems over and over again...

TheIronCheek commented 4 years ago

@OliverZeiser - I put my custom web part on a blank page with no other web parts to test a couple weeks ago and the problem didn't go away.

Although when I try it now, it seems to work fine on its own page...

OliverZeiser commented 4 years ago

Well now try to put a doc lib and planner webpart on the same page and try again a few times...

TheIronCheek commented 4 years ago

@OliverZeiser - I don't have the planner web part but I was able to reproduce the issue with a doc library and event web part (both of which are also on my main page).

I think we're onto something here.

andrewconnell commented 4 years ago

@TheIronCheek I've checked multiple samples I use to demonstrate this and all seem to be working as expected. They are:

All of these use the same underlying infrastructure to obtain a token. I tested them individually & combined being the only web part on the page, with all of them on the page, and also with other OOTB web parts on the page (including the Document Library web part). I've been refreshing the pages and hitting them every other day and none of them are breaking. I'm trying, but I can't repro what you guys are saying.

I see two things in your code... one isn't terribly important, the other I suspect is your issue. I am not seeing the stalling out using the AadHttpClient.getClient()... I see you switched to AadTokenProvider.getToken()... that won't fix anything as you're now just doing the same thing the AadHttpClient does for you.

However, I see you're trying to reference your endpoint by the client ID... a GUID. That was supported in the dev previews, but that's not supported or how you're supposed to do it now. That value should be the endpoint of the service you are trying to reach. Look at the definition of the underlying getToken() method:

/**
 * Fetches the AAD OAuth2 token for a resource if the user that's currently logged in has
 * access to that resource.
 *
 * The OAuth2 token should not be cached by the caller since it is already cached by the method
 * itself.
 *
 * @param resourceEndpoint - the resource for which the token should be obtained
 * @param useCachedToken - Allows the developer to specify if cached tokens should be returned.
 * An example of a resourceEndpoint would be https://graph.microsoft.com
 * @returns A promise that will be fullfiled with the token or that will reject
 *          with an error message
 */
getToken(resourceEndpoint: string, useCachedToken?: boolean): Promise<string>;

Notice it's the endpoint... and the reference is the URL of the endpoint.

The AadHttpClient method says the same:

    /* Excluded from this release type: __constructor */
    /**
     * Returns an instance of the AadHttpClient that communicates with the current tenant's configurable
     * Service Principal.
     * @param resourceEndpoint - The target AAD application's resource endpoint.
     */
    getClient(resourceEndpoint: string): Promise<AadHttpClient>;

For reference, here's how I'm calling a custom service...

const NASA_APOLLO_MISSION_ENDPOINT_URI: string = 'https://voitanos-secure.azurewebsites.net';

export class MissionService {

  constructor(private aadHttpClientFactory: AadHttpClientFactory) { }

  public getMissions(): Promise<IMission[]> {
    return new Promise<IMission[]>((resolve, reject) => {
      this.aadHttpClientFactory
        .getClient(NASA_APOLLO_MISSION_ENDPOINT_URI)
        .then((client: AadHttpClient) => {
          const endpoint: string = `${NASA_APOLLO_MISSION_ENDPOINT_URI}/api/nasa-apollo-missions`;
          client.get(endpoint, AadHttpClient.configurations.v1)
            .then((rawResponse: HttpClientResponse) => {
              // verify successful response
              if (rawResponse.status === 200) {
                return rawResponse.json();
              } else {
                throw new Error('Error occurred when retrieving missions.');
              }
            })
            .then((jsonResponse: IMission[]) => {
              resolve(jsonResponse);
            })
            .catch((error) => {
              reject(error);
            });
        });
    });
  }
}
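One thing worth noting about the sample above: it wraps the chain in `new Promise`, and the `getClient()` call has no rejection handler, so if `getClient()` itself rejected, the outer promise would never settle. A minimal sketch of the same call that returns the chain directly is shown here. `ResponseLike`, `ClientLike`, and `ClientFactoryLike` are hypothetical stand-ins for the SPFx types so that the snippet compiles on its own, the stand-in `get()` omits the configuration argument that the real `AadHttpClient.get()` takes, and the `name` field on `IMission` is invented for illustration.

```typescript
// Hypothetical minimal shapes standing in for AadHttpClientFactory,
// AadHttpClient and HttpClientResponse, so the sketch is self-contained.
interface ResponseLike { status: number; json(): Promise<unknown>; }
interface ClientLike { get(url: string): Promise<ResponseLike>; }
interface ClientFactoryLike { getClient(resourceEndpoint: string): Promise<ClientLike>; }

// The real IMission shape is not shown in the thread; `name` is an assumption.
interface IMission { name: string; }

const ENDPOINT_URI: string = 'https://voitanos-secure.azurewebsites.net';

// Returning the chain (instead of wrapping it in `new Promise`) means a
// rejection from getClient(), get() or json() reaches the caller's catch().
function getMissions(factory: ClientFactoryLike): Promise<IMission[]> {
  return factory
    .getClient(ENDPOINT_URI)
    .then((client) => client.get(`${ENDPOINT_URI}/api/nasa-apollo-missions`))
    .then((response) => {
      if (response.status !== 200) {
        throw new Error('Error occurred when retrieving missions.');
      }
      return response.json() as Promise<IMission[]>;
    });
}
```

The behavior on success is the same as the original sample; the difference is only that every failure path ends in a rejection the caller can observe.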

@OliverZeiser said:

I think those ootb webparts are also getting a token for MS graph using the underlying adal infrastructure and interfering with the custom webparts.

All of that stuff uses the SPO infrastructure to get a token which today still uses ADAL, including our custom stuff. The plan is to move to MSAL JS but ETA is unknown at this time. But that will be transparent to us as that's handled by the SPFx API.

OliverZeiser commented 4 years ago

@andrewconnell The same thing happens with MSGraphClient. To me it looks like a timing issue, and as such it is quite hard to reproduce. But here is an example of what happens in the session storage. When everything is working fine, the session storage for ADAL looks like this (with just one custom webpart requesting a graph token): [screenshot: adal_single_ok]

When adding the document library webpart to the page and hitting F5 a couple of times (after clearing the session storage), I eventually end up with this (note the invalid_state here): [screenshot: adal_multiple_notok] As you can see, there are not two tokens but just one, the one from the document library webpart. My custom webpart failed when getting the token. This can be the other way around as well. That's why I am guessing it is a timing issue when multiple components request a token at more or less the same time.

What it should look like, and does look like when I add one webpart first, wait until it has a token, and then add another webpart to the page without clearing session storage, is this: [screenshot: adal_multiple_ok] Again, this hints to me that it is probably a timing issue.

I know that this is very hard to reproduce. In my case it was easier to reproduce when adding an extension that requests a token than a second webpart. Again, I am guessing this is due to the loading order of the components, since it appears to be a timing issue. The more components you have on your page requesting different kinds of tokens, the easier it gets to reproduce. Just make sure to clear the session storage every time. Once you have a valid token, things will work fine for quite some time....
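For anyone trying to compare the storage state between a working and a failing page load, the ADAL-related entries can be collected as text rather than screenshots. The sketch below assumes that ADAL.js stores its state under keys beginning with `adal.`; that prefix is an assumption and may need adjusting. `StorageLike` and `dumpAdalEntries` are hypothetical names.

```typescript
// Minimal subset of the Web Storage API, so the function also works with a
// test double. In the browser, window.sessionStorage satisfies this shape.
interface StorageLike {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
}

// Assumption: ADAL.js keeps its state under keys that start with "adal.".
function dumpAdalEntries(
  storage: StorageLike,
  prefix: string = 'adal.'
): Record<string, string> {
  const entries: Record<string, string> = {};
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key !== null && key.indexOf(prefix) === 0) {
      entries[key] = storage.getItem(key) ?? '';
    }
  }
  return entries;
}
```

In a browser console this could be used as `console.table(dumpAdalEntries(window.sessionStorage))` once after a working load and once after a failing one.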

TheIronCheek commented 4 years ago

Yeah, the easiest way for me to reproduce it is to add my custom web part, a document library, and a 3rd OOTB web part like the Events web part and then open the page in an incognito window so the cookie isn't retained. A doc library and custom web part alone wasn't enough to trigger the issue.

TheIronCheek commented 4 years ago

@andrewconnell - When I switch the client ID to my API's endpoint in getToken(), my API returns a 401 Unauthorized message. I also get a 401 if I use AadHttpClient. Is it possible I have something configured incorrectly in Azure?

FlorianLabranche commented 4 years ago

I can only confirm the above comments.

In my case it's an extension but it behaves the same. FYI, it's mainly used on Doc library pages. As @OliverZeiser described it, in some cases, SP gets the ADAL token for OOB SP calls but not for my API.

I also referenced my API with its client ID. Changing to the endpoint URL gives me this exception: [image]

How can it work via ID (sometimes) but not via the endpoint? If the app was granted consent, it should work in both cases. Right?

JasonPan commented 4 years ago

Joining the conversation - we're getting a somewhat similar issue above (#5003), using MSGraphClient. For us we have a dev and prod tenant, and this issue has appeared on both. For our dev tenant, it's only ever happened once for a few days across multiple accounts and then the issue stopped appearing since early November.

It's just started appearing for our prod after our last deployment (we've had multiple deployments for the last month and no reported issues). This is what my session storage looks like

Screen Shot 2019-12-10 at 11 42 46 pm

Has anyone been receiving this kind of error as well? I agree with @OliverZeiser that these issues do suggest that it's timing related, but I can't seem to reproduce the issue at all on our dev environment

guillaume-kizilian commented 4 years ago

@andrewconnell For me it is also not working with getClient().

image

Last log I see is "Almost there" then no news from Then or Catch.

andrewconnell commented 4 years ago

@guillaume-kizilian That's expected. You aren't specifying anything in the getClient() method. That requires the URI of the resource you want the token for.

andrewconnell commented 4 years ago

@FlorianLabranche your error that you show in your comment https://github.com/SharePoint/sp-dev-docs/issues/4892#issuecomment-563937497 indicates you are trying to access a resource that is secured with an AAD app in a different tenant. If this is a multitenant AAD app, you must first consent to it within your app PRIOR to requesting an access token. That will create the service principal in your tenant which will establish the "link" to the AAD app defined in the other tenant.

This is a configuration issue, rather than a code issue, in your case.

More: https://docs.microsoft.com/en-us/sharepoint/dev/spfx/use-aadhttpclient-enterpriseapi-multitenant

guillaume-kizilian commented 4 years ago

@andrewconnell this is not the framework method, it is my override which explains why i don't need a parameter.

Here is my implementation : image

Also, I should still get an error, and this is not where it stops but right after (I have the "almost here" log showing up).

andrewconnell commented 4 years ago

@guillaume-kizilian said:

this is not the framework method, it is my override which explains why i don't need a parameter.

well that wasn't clear from your first comment... providing full context in an error report helps :)...

but from your most recent comment, you refer to this as the client ID. What is the value of that? If it's a GUID, that's not correct. It should be the URI of the resource, not a GUID.

guillaume-kizilian commented 4 years ago

@andrewconnell Yes, sorry :). It took some time but I've finally found out why. (It's hard when the process just hangs.) I forgot the webApiPermissionRequests of my app registration in package-solution.json. This was granted in Admin center for another SP Add-in, maybe this is why i got no error.

Not sure if it's linked to the original issue.

JasonPan commented 4 years ago

@andrewconnell I just noticed #3968, seems like it could be related? Specifically, this comment looks more like the same error msg I was getting, but perhaps it's also related to this hanging issue?

andrewconnell commented 4 years ago

@guillaume-kizilian said

I forgot the webApiPermissionRequests of my app registration in package-solution.json. This was granted in Admin center for another SP Add-in, maybe this is why i got no error.

In that case you would have received a 401 response.

andrewconnell commented 4 years ago

@JasonPan My understanding was #3968 was an issue with the admin console and dealing with permission grant requests. While they fall in the same general "bucket" with AzureAD secured endpoints, getting a token is not related IMHO.

guillaume-kizilian commented 4 years ago

@andrewconnell you're right. it only worked for 15 minutes, I don't know why it did and why it stopped. On the other hand my other webpart using this system with exactly the same parameters works.

TheIronCheek commented 4 years ago

@andrewconnell - Why would I be getting a 401 after changing from the GUID to the endpoint URI?

andrewconnell commented 4 years ago

@TheIronCheek Only if SPO has not consented permission to that resource. You can verify this by inspecting the access token with a tool such as https://jwt.io
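The same inspection can also be done locally, without pasting a token into a website. The sketch below decodes only the payload segment of a JWT and does not verify the signature, so it is suitable for inspection only. `base64UrlDecode` and `decodeJwtPayload` are hypothetical helpers; the claim to look at first is `aud`, which should match the resource the API expects.

```typescript
const BASE64_ALPHABET: string =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decodes a base64url string into a UTF-8 string without relying on
// browser-only (atob) or Node-only (Buffer) APIs.
function base64UrlDecode(input: string): string {
  const base64 = input.replace(/-/g, '+').replace(/_/g, '/').replace(/=+$/, '');
  let bits = 0;
  let bitCount = 0;
  let percentEncoded = '';
  for (let i = 0; i < base64.length; i++) {
    const value = BASE64_ALPHABET.indexOf(base64.charAt(i));
    if (value < 0) {
      throw new Error('Invalid base64url character: ' + base64.charAt(i));
    }
    // Accumulate 6 bits per character; emit a byte whenever 8 are available.
    bits = ((bits << 6) | value) & 0xffff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      const byte = (bits >> bitCount) & 0xff;
      percentEncoded += '%' + (byte < 16 ? '0' : '') + byte.toString(16);
    }
  }
  // decodeURIComponent turns the percent-encoded bytes into a UTF-8 string.
  return decodeURIComponent(percentEncoded);
}

// Returns the claims from the payload (second) segment of a JWT.
// The signature is NOT verified.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const segments = token.split('.');
  if (segments.length !== 3) {
    throw new Error('Expected a JWT with three dot-separated segments.');
  }
  return JSON.parse(base64UrlDecode(segments[1])) as Record<string, unknown>;
}
```

Calling `decodeJwtPayload(token)['aud']` on the string returned by `getToken()` shows which audience the token was issued for.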

FlorianLabranche commented 4 years ago

@andrewconnell It's not a multi-tenant configuration in my case. But the exception was indeed from a bad configuration. The App ID URI is in the format api://[Application ID] and does not match the endpoint URL. I still have a CORS issue caused by a bad Redirect URI (my guess). But the admin is out of office until mid-January.

I'll try a repro on another tenant to see if using App ID URI fixes the token issue. I think it might do the job because the requests are handled differently when called with App ID URI or Application ID.

andrewconnell commented 4 years ago

@FlorianLabranche said:

I'll try a repro on another tenant to see if using App ID URI fixes the token issue.

App ID (aka client ID) != App URI... I might be picky here, but that's an important distinction.

Using the app ID was supported in the dev preview, but it's not supported today. It might still work, but it's not the right way to do. The correct way to obtain an access token is to request it by the app's URI.
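Because a GUID may still appear to work, it can be useful to catch the mistake before the token request is made. The following guard is a sketch only: `isGuid` and `assertResourceUri` are hypothetical helpers, not part of SPFx, and the check for a URI scheme is a loose heuristic rather than full URI validation.

```typescript
const GUID_PATTERN: RegExp =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// True when the value is shaped like an app (client) ID rather than a URI.
function isGuid(value: string): boolean {
  return GUID_PATTERN.test(value.trim());
}

// Returns the resource unchanged if it looks like a URI; throws otherwise.
function assertResourceUri(resource: string): string {
  if (isGuid(resource)) {
    throw new Error('Resource looks like a client ID (GUID); pass the app URI instead.');
  }
  // Heuristic: require a scheme followed by "://", e.g. "https://".
  if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(resource)) {
    throw new Error('Resource does not look like a URI: ' + resource);
  }
  return resource;
}
```

A call site could then read `getClient(assertResourceUri(resource))`, so that a GUID fails loudly during development instead of producing a token request that is not supported.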

guillaume-kizilian commented 4 years ago

@andrewconnell said :

@FlorianLabranche said:

I'll try a repro on another tenant to see if using App ID URI fixes the token issue.

App ID (aka client ID) != App URI... I might be picky here, but that's an important distinction.

Using the app ID was supported in the dev preview, but it's not supported today. It might still work, but it's not the right way to do. The correct way to obtain an access token is to request it by the app's URI.

Does this mean that this doc is not up to date ? https://docs.microsoft.com/en-us/sharepoint/dev/spfx/use-aadhttpclient-enterpriseapi-multitenant#consume-enterprise-api-secured-with-azure-ad-from-the-sharepoint-framework image

When i switch the guid to the app URI it just stop working on both extensions. Is there more directives to make it work ?

Thanks,

SuleymanA commented 4 years ago

I am seeing the same issue at our tenant. An isolated SPFx webpart that worked fine until today (we received the first complaint today) now stops with the following message: "Token Renewal Failed - Description : Token renewal operation failed due to timeout". We are using the AadHttpClient to make a call to an Azure Function like this:

this.context.aadHttpClientFactory
    .getClient(this.applicationIDURI)
    .then((client: AadHttpClient): void => {
        client.get(`${this.functionAppURL}`, AadHttpClient.configurations.v1)

I am going to report this to premier support. Service is so bad lately...

FlorianLabranche commented 4 years ago

@FlorianLabranche said:

I'll try a repro on another tenant to see if using App ID URI fixes the token issue.

App ID (aka client ID) != App URI... I might be picky here, but that's an important distinction.

Using the app ID was supported in the dev preview, but it's not supported today. It might still work, but it's not the right way to do. The correct way to obtain an access token is to request it by the app's URI.

Yes I get that. App ID URI as it's called on the app registration in AAD: image

If the app registration is created when securing the API with AAD in Express (create) mode, it will match the API endpoint (...azurewebsites.net). That's where I get confused when following the doc because mine was manually created.

andrewconnell commented 4 years ago

@guillaume-kizilian said:

Does this mean that this doc is not up to date ?

I'm double-checking with engineering on the App ID (GUID) vs. endpoint before I comment on this. My understanding as I've explained above is that you should use the App URI, not the GUID as the SDK docs state, and if that's correct I'll bug & update the doc. Stay tuned...

When i switch the guid to the app URI it just stop working on both extensions. Is there more directives to make it work ?

Can you give more detail than that? "It just stop working" isn't much to go on... error response, raw HTTP response when inspecting the request, etc...

andrewconnell commented 4 years ago

@guillaume-kizilian following up on my last comment https://github.com/SharePoint/sp-dev-docs/issues/4892#issuecomment-567040079, confirmed with engineering the supported approach is to use the Azure AD app's URI.

Will bug & update the referenced doc...

OliverZeiser commented 4 years ago

This thread seems to mix up multiple issues now. The one described by the starter of this thread, and also by me, still exists and is not limited to AadHttpClient; it can also be reproduced with MSGraphClient. For me it happens especially if I add an ApplicationCustomizer to the top that is making calls to the graph and combine it with OOTB webparts on the page like the calendar, planner and document library webparts. I can reproduce it on multiple tenants and with multiple solutions. So the issue still exists and can't be due to a wrong Azure AD URI.

TheIronCheek commented 4 years ago

@andrewconnell - I'm looking at the decoded access token and I don't see anything jumping out to make me think it's invalid.

When I use the Client ID for getToken() instead of the API's endpoint URI, the token is basically identical, just with the "aud" value set to my API's Client ID.

Where do I go from here?

TheIronCheek commented 4 years ago

@andrewconnell - This documentation uses the Client ID as well.

andrewconnell commented 4 years ago

@TheIronCheek Can you share the token? If necessary, just remove the parts that would make it useful (like the last section of GUIDs or the part of the tenant in the UPN)...

@TheIronCheek said:

This documentation uses the Client ID as well.

Thanks for calling that out. Just bugged it #5069 as the other doc is already bugged and on my backlog to fix. I've confirmed with engineering that ClientID is 100% not supported.

TheIronCheek commented 4 years ago

No problem. I should be able to get that to you once I'm back in the office Thursday morning.

TheIronCheek commented 4 years ago

@andrewconnell - Here's my decoded token when I use the API's endpoint URI. My API returns a 401 Unauthorized message.

Capture

AsunSanLo commented 4 years ago

Hello, I'm following this issue because it seems to be related with the #5003 in SPFx calling Graph API. @andrewconnell In that case, do we follow this issue or is it a different one?

In my case, the problem happens when calling the Graph API from an SPFx webpart inside a tab from Teams. The Graph call is returning the error "Token renewal operation failed due to timeout" even after deleting the session and local storage. However, this same webpart is correctly working inside SharePoint, it's just in Teams where it fails.

The problem appeared in November, and it suddenly started to work again after a few days until December 23rd, when it started giving the error again, and it has not recovered since.

Thank you in advance!