SharePoint / sp-dev-docs

SharePoint & Viva Connections Developer Documentation
https://docs.microsoft.com/en-us/sharepoint/dev/
Creative Commons Attribution 4.0 International

getToken() intermittently hangs and fails to resolve/reject #4892

Closed TheIronCheek closed 3 years ago

TheIronCheek commented 5 years ago

Category

Expected or Desired Behavior

The promise returned by AadTokenProvider.getToken() should always resolve or reject. In other words, when a token can't be retrieved for whatever reason, I should get something back that I can catch.

Observed Behavior

getToken() is occasionally hanging - never resolving or rejecting so I can't catch an error or figure out what's happening. The .then() under getToken() just never runs and my web part stalls.

Steps to Reproduce

I'm trying to access a .NET Core Web API secured with AAD from my custom web part. I initially used the instructions found here, but AadHttpClient.get() was occasionally stalling, so I switched to getting the token with AadTokenProvider.getToken() and manually adding it to an HttpClient.get() call. The issue didn't go away, but I was able to confirm that the hang is in getToken() rather than in the API. Here's my code:

this.context.aadTokenProviderFactory.getTokenProvider()
    .then((provider) => { 
        return provider.getToken('<Client ID for my API>', false) //THIS IS WHERE IT HANGS
            .then((token) => {
                this.context.httpClient
                    .get('https://www.example.com/api/myAPI/', AadHttpClient.configurations.v1, {
                        headers: [
                            ['accept', 'application/json'],
                            ['Authorization', 'Bearer ' + token]
                        ]
                    })
                    .then((res: HttpClientResponse): Promise<any> => {
                        return res.json();
                    })
                    .then(data => {
                        //process data
                    }, (err: any): void => {
                        this.context.statusRenderer.renderError(this.domElement, err);
                    });
            }, (rejection) => {
                this.context.statusRenderer.renderError(this.domElement, "Token request rejected. " + rejection);
            });
    })
    .catch((err) => {
        this.context.statusRenderer.renderError(this.domElement, "Failed to get token provider.");
    });

I noticed that it seems to hang after a period of inactivity, as if it's failing to refresh a stale token. I'd expect the rejection handler to execute, but nothing beyond getToken() runs. If I refresh the page a couple of times, it starts working again.
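Until the hang itself is fixed, one way to at least surface it is to race the token promise against a timer, so that the caller gets a rejection it can catch. The following is a minimal sketch, not part of SPFx: `withTimeout` is a hypothetical helper, and the 10-second limit mentioned afterwards is an arbitrary assumption. It does not cancel the underlying token request; it only stops the caller from waiting forever.

```typescript
// Hypothetical helper (not an SPFx API): settles with the result of `work`,
// or rejects with an Error if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string = "operation"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // work finished first: cancel the timer
        resolve(value);
      },
      (err) => {
        clearTimeout(timer); // work failed first: pass the original error through
        reject(err);
      }
    );
  });
}
```

In the code above, the call would become something like `withTimeout(provider.getToken('<Client ID for my API>', false), 10000, 'getToken')`, so the existing rejection handler runs when the token never arrives.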

This seems to be related to issue #914 reported on pnp/pnpjs.

Edit: I discovered that this is a lot easier to reproduce in Chrome. (I haven't seen it occur in IE and I've only seen it once or twice in Firefox although I can't seem to reproduce in FF at the moment. Edge gets the token but fails the API request for reasons I can't explain). In Chrome, it struggles every time it needs to grab a token. So, for example, it hangs the first time you load the page. If you refresh a few times, it starts working. Then if you leave the page open for a period of time (long enough for the token to expire) and refresh the page, it hangs again.

Edit 2: Per the conversation below, we can reproduce the issue when the custom web part is on the same page as a document library and a planner or events web part and it works fine when on a page by itself. There may be some sort of conflict with token retrieval/sharing between web parts.

TazzyMan commented 4 years ago

Is there an update already?

TheIronCheek commented 4 years ago

@andrewconnell - I got the API working. In my API's Startup.cs, I was specifying the expected audience as the Client ID which is why it was returning a 401 when using the endpoint URI. I fixed that but the bug is still present.

In short, updating my code to use the endpoint URI instead of the client ID did not fix the bug I'm reporting.

It still hangs whenever it tries to get a new token, and the promise never resolves or rejects. If I refresh the page several times, the promise eventually resolves and my web part works as expected.

TheIronCheek commented 4 years ago

I think @OliverZeiser may have nailed the problem when he mentioned the ADAL infrastructure being a problem. I'm getting the same invalid_state error he's getting.

invalid state

I've been reading about ADAL JS limitations when used with the SharePoint Framework. @andrewconnell - Are these limitations present when using AadTokenProvider as well?

OliverZeiser commented 4 years ago

I think those limitations (most of them, at least) should not apply when working with the authentication flow provided by SPFx, as with AadHttpClient. That is one reason why all that was done by MS and why we don't have to use ADAL JS ourselves. But I am also waiting for a statement from Microsoft here. I would love to know at least whether this is a known bug and whether work is being done on it. Maybe @patmill or someone else from the team can provide a quick update, so we know if there is anything we need to do on our side besides waiting ;)

TheIronCheek commented 4 years ago

@OliverZeiser - That was kind of the vibe I got from the docs too, that not having to deal with token issues was the whole reason for using AadHttpClient instead of a manual ADAL JS implementation. But here we are with token issues so maybe there's something we're missing.

I'm also curious if anyone with the issue is using domain isolation. I am not - mainly because the web part stopped working completely when I set "isDomainIsolated": true (which is an issue I could bring up if I needed to...) but also because messing with responsiveness issues from loading content in an iframe is a pain. I'm only using user_impersonation for my API and User.Read from Graph which isn't a huge deal for us in terms of what escapes if another web part sniffs my token.

But the reason I ask is that I'm wondering if loading this web part in an iframe would shield it from whatever conflict is happening between it and the OOTB web parts...

andrewconnell commented 4 years ago

Not ignoring this thread, it just heated up when many (including me) took time off at the end of the year. And now, trying to dig out from the backlog. I'll catch up, but @TheIronCheek's last comment stuck out...

Isolated web parts get their own Azure AD app so you need to grant the permission(s) request(s) to that app. If all you did was change the setting and redeploy, then that's expected it wouldn't work until the permission was granted.
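For reference, the settings involved live in the `solution` section of config/package-solution.json. The sketch below mirrors that shape as a TypeScript object so the fields can be seen together. The two scopes are the ones mentioned earlier in this thread (user_impersonation and User.Read); the API resource name "My API" is a placeholder assumption, and `listPermissionRequests` is a hypothetical helper that only prints what an admin would need to approve.

```typescript
// Shape of the relevant fields in the "solution" section of package-solution.json.
interface WebApiPermissionRequest {
  resource: string;
  scope: string;
}

interface SolutionSettings {
  isDomainIsolated: boolean;
  webApiPermissionRequests: WebApiPermissionRequest[];
}

const solution: SolutionSettings = {
  isDomainIsolated: true, // the web part runs in its own iframe with its own Azure AD app
  webApiPermissionRequests: [
    { resource: "My API", scope: "user_impersonation" }, // placeholder resource name (assumption)
    { resource: "Microsoft Graph", scope: "User.Read" }
  ]
};

// Hypothetical helper: lists the requests that still need to be granted after deployment.
function listPermissionRequests(settings: SolutionSettings): string[] {
  return settings.webApiPermissionRequests.map((r) => `${r.resource}: ${r.scope}`);
}
```

With isolation switched on, each entry in that list has to be approved again for the isolated app, which matches the behavior described above.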

TheIronCheek commented 4 years ago

@andrewconnell - I probably messed this up early in my testing process. I remember early on when I was making the web part that a strange app had been created in AAD (named after my app) along with the SharePoint Online Client Extensibility Web Application Principal and SharePoint Online Client Extensibility Web Application Principal Helper app registrations. When things weren't working I tried deleting all the permissions and app registrations so I could start over.

Now, when I turn on domain isolation and redeploy, I don't get a new app registration in Azure. So I'm kind of stuck with no way to approve anything.

ghost commented 4 years ago

Hi @TheIronCheek

I was able to repro your issue following the steps outlined in your original post. I've identified the issue (race condition with Document Library component trying to do Auth) and I'm working on a fix now. Will update the thread when

  1. The issue is fixed
  2. The fix has been rolled out 100% PROD.

Sorry about the delay

OliverZeiser commented 4 years ago

That is good news. Thank you for the update! We are looking forward to this fix. Hopefully it is not just for the document library component, because the same thing happens with the Planner web part or group calendar web part (and maybe other components as well...)

ghost commented 4 years ago

@OliverZeiser The fix is in the underlying token acquisition code, so it should handle any issues with both 1st and 3rd party components 😉

TheIronCheek commented 4 years ago

Thanks for your help @lahuey and @andrewconnell !

FlorianLabranche commented 4 years ago

@lahuey Great news !

@OliverZeiser The fix is in the underlying token acquisition code, so it should handle any issues with both 1st and 3rd party components 😉

On my side, the issue is between an extension and a document library page itself (Forms/AllItems.aspx). I hope the inner "doc lib component" uses the same underlying code as well !

guillaume-kizilian commented 4 years ago

Should it fix the spfxsinglesignon.aspx errors? image

ilkkalehto commented 4 years ago

We have the exact same spfxsinglesignon.aspx errors when using SPFx Teams tabs with browser. However, no issues with Teams desktop client.

v-pajorg commented 4 years ago

Hi @TheIronCheek

I was able to repro your issue following the steps outlined in your original post. I've identified the issue (race condition with Document Library component trying to do Auth) and I'm working on a fix now. Will update the thread when

  1. The issue is fixed
  2. The fix has been rolled out 100% PROD.

Sorry about the delay

@lahuey do you have an ETA for this fix?

rayhogan commented 4 years ago

This issue still exists for us when "User Consent" is disabled in Azure Active Directory. image

When set to No, you cannot even create a direct website tab link to a SharePoint site: image image image

The tab is added, but doesn't load correctly: image And the console is showing x-frame-options errors: image

This also means that AadHttpClient auth in a Teams app breaks, as it also relies on Microsoft Teams being able to render the SPFx app in the iFrame, but as it's hosted in an iFrame, it runs into the same x-frame-options issue: image

So it looks like when this AAD setting is disabled, stricter X-Frame-Options settings are applied to SharePoint, which means SPFx apps no longer execute correctly in a Teams tab. (Microsoft is going to set this setting to No by default on all tenants shortly, so this will have a huge impact on everyone using SPFx apps in Teams.)

ghost commented 4 years ago

@rayhogan Can you (temporarily) disable your browser cache and try again? That should resolve the issue you're seeing.

@guillaume-kizilian These errors are from SPO running an experiment with MSAL.js. They are safe to ignore and will not affect your component's ability to fetch tokens. We're working with the MSAL team to address these "ignorable" errors.

@v-pajorg About a month, but hopefully less.

@FlorianLabranche yes, it's the same component behind the scenes. We need to update our calling pattern to ADAL.js (used behind the scenes) in order to avoid this race condition.
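The thread does not describe what the actual change to the calling pattern looks like. Purely as an illustration of one common way to keep several components from racing over shared authentication state, the sketch below lets all concurrent callers for the same resource share a single in-flight request. `SingleFlightTokenCache` and its `acquire` callback are hypothetical names, not SPFx or ADAL.js APIs.

```typescript
// Hypothetical sketch: concurrent callers asking for the same resource share
// one in-flight request instead of each starting their own token acquisition.
class SingleFlightTokenCache {
  private inFlight: Map<string, Promise<string>> = new Map();
  private acquire: (resource: string) => Promise<string>;

  constructor(acquire: (resource: string) => Promise<string>) {
    this.acquire = acquire;
  }

  getToken(resource: string): Promise<string> {
    const pending = this.inFlight.get(resource);
    if (pending !== undefined) {
      return pending; // someone is already fetching this token: reuse their request
    }
    const request = this.acquire(resource).then(
      (token) => {
        this.inFlight.delete(resource); // allow a fresh request next time
        return token;
      },
      (err) => {
        this.inFlight.delete(resource); // do not cache failures
        throw err;
      }
    );
    this.inFlight.set(resource, request);
    return request;
  }
}
```

This only deduplicates requests that overlap in time; it does not cache tokens after they resolve, so expiry handling is left to whatever `acquire` wraps.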

rayhogan commented 4 years ago

@lahuey - Disabling the browser cache has no impact. I've tested every possible scenario for the past week and the only thing that gets it working again is changing this setting back to yes: image

When it is set to Yes, I can see through Fiddler and the browser's Network tab that there is no X-Frame-Options header on the SharePoint call, but when it is set to No, SharePoint returns an X-Frame-Options header of 'SameOrigin', which prevents it being loaded within the Teams iframe.

The issue for us is that Microsoft has recommended that this setting is set to No, and is pushing a change to all tenants to set this to No by default.

maskati commented 4 years ago

For us this issue manifests when loading an SPFx tab in Teams as described in #5116. The root cause seems to be that the SharePoint token acquisition logic relies on loading spfxsinglesignon.aspx from SharePoint within a Teams application iframe. This fails because spfxsinglesignon.aspx usually (not always) returns X-Frame-Options = SAMEORIGIN header, and as a result the iframe loading is blocked by browser policy and token acquisition fails.

An example flow that fails:

In addition the result of spfxsinglesignon.aspx is also cached, so once you get a response with X-FRAME-OPTIONS: SAMEORIGIN that same response is used for subsequent token acquisition flows resulting in the same error.
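The blocking rule described above can be written down as a small predicate. This is a simplified sketch that compares the framed page against a single parent origin; `isFramingBlocked` is a hypothetical name, and the tenant and parent URLs in the usage note are placeholders rather than values from this thread.

```typescript
// Simplified model of how a browser treats X-Frame-Options for a framed page.
// Returns true when the page would be refused inside a frame hosted by parentUrl.
function isFramingBlocked(
  xFrameOptions: string | undefined,
  pageUrl: string,
  parentUrl: string
): boolean {
  if (xFrameOptions === undefined) {
    return false; // no header: framing is allowed
  }
  const value = xFrameOptions.trim().toUpperCase();
  if (value === "DENY") {
    return true; // never allowed in a frame
  }
  if (value === "SAMEORIGIN") {
    // allowed only when the framing page has the same origin as the framed page
    return new URL(pageUrl).origin !== new URL(parentUrl).origin;
  }
  return false; // other values (for example the obsolete ALLOW-FROM) are not modeled here
}
```

For example, `isFramingBlocked("SAMEORIGIN", "https://contoso.sharepoint.com/_forms/spfxsinglesignon.aspx", "https://teams.microsoft.com/")` evaluates to true because the two origins differ, while the same call with `undefined` as the header value evaluates to false.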

You can verify that the issue is resolved by temporarily (!) installing the following Google Chrome extension, which causes the browser to ignore X-Frame headers and load the iframe contents regardless of the header. https://chrome.google.com/webstore/detail/ignore-x-frame-headers/gleekbfjekiniecknbkamfmkohkpodhe

What I find strange is that spfxsinglesignon.aspx does not consistently return the X-FRAME-OPTIONS header, and never returns it when you send the Cache-Control=no-cache header. This is why disabling browser caching (e.g. Google Chrome dev toolbar -> network -> disable cache) fixes the problem.

Running the following PowerShell:

1..50|%{invoke-webrequest "https://microsoft.sharepoint.com/_forms/spfxsinglesignon.aspx"|select @{n="Cache-Control";e={$_.headers."Cache-Control"}},@{n="X-Frame-Options";e={$_.headers."X-Frame-Options"}}}

results in:

Cache-Control X-Frame-Options
------------- ---------------
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public        SAMEORIGIN
public
public
public
public
public
public
public
public        SAMEORIGIN
public
public        SAMEORIGIN
public
public
public
public
public
public        SAMEORIGIN
public
public
public
public
public
public
public
public
public
public
public
public
public        SAMEORIGIN
public
public
public        SAMEORIGIN
public
public        SAMEORIGIN

The same when disabling cache:

1..50|%{invoke-webrequest "https://microsoft.sharepoint.com/_forms/spfxsinglesignon.aspx" -headers @{"Cache-Control"="no-cache"}|select @{n="Cache-Control";e={$_.headers."Cache-Control"}},@{n="X-Frame-Options";e={$_.headers."X-Frame-Options"}}}

results in:

Cache-Control X-Frame-Options
------------- ---------------
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public
public

rayhogan commented 4 years ago

Actually, disabling cache does cause it to work: image

But this isn't a solution, as our customers' access to dev tools is blocked by policy.

FlorianLabranche commented 4 years ago

@maskati @rayhogan I'm also concerned about what you're saying, but could you track the issue you're describing in another ticket? It's not directly related to the original post, and that would leave this thread to track updates on the getToken() issue and its fix.

I'm pretty sure it will also help the dev team to track your issue more efficiently.

maskati commented 4 years ago

I would say #5116 could be the correct issue; however, it was closed as a duplicate of this one.

rayhogan commented 4 years ago

@maskati / @FlorianLabranche - Yes I had an issue opened but it was closed as this ticket was considered to be a duplicate, but I also think that, while the errors are the same, the root cause is different so it could potentially warrant a separate ticket.

FlorianLabranche commented 4 years ago

Ok. As you said, your token renewal issue seems related to the AAD settings and X-FRAME option. But it could also be related to the one on this thread.

Let's see what the gurus say !

bealsao commented 4 years ago

@rayhogan I am seeing this exact same issue, except my "user consent" option is enabled. This was working in December and stopped working in January, if that helps narrow down what might have changed. Also, my code base hasn't changed since November.

I can also confirm disabling the cache resolves this issue but isn't a scalable solution.

ghost commented 4 years ago

@rayhogan @bealsao could you email me the tenant url and sprequestguid from the "same origin" aspx page request? I haven't been able to repro this issue, but a look into the logs would really help!

My email is lahuey@microsoft.com

bealsao commented 4 years ago

@lahuey It's my dev tenant, so I'm happy to give you a login and let you see the problem in action if you'd prefer. I'll send you an email with the tenant URL; if you want a login to reproduce the behavior, reply to my email and let me know.

maskati commented 4 years ago

@lahuey you can reproduce this by calling spfxsinglesignon.aspx in any tenant (the page does not require authentication). See my earlier message

Here you can see that without the Cache-Control: no-cache request header, 9/10 responses returned the X-Frame-Options response header, and with the header set (cache disabled), no responses contained it:

image
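The same probe can be sketched in TypeScript for anyone who prefers it to PowerShell. This assumes Node 18 or later (for the global `fetch`) and is meant to be run outside the browser, since a cross-origin browser fetch would not expose the header. `tallyXFrameOptions` and `probe` are hypothetical names, and the URL passed to `probe` is whatever tenant page is being checked.

```typescript
interface HeaderTally {
  withHeader: number;
  withoutHeader: number;
}

// Counts how many responses carried an X-Frame-Options value.
function tallyXFrameOptions(values: Array<string | null>): HeaderTally {
  const tally: HeaderTally = { withHeader: 0, withoutHeader: 0 };
  for (const value of values) {
    if (value !== null && value.trim() !== "") {
      tally.withHeader++;
    } else {
      tally.withoutHeader++;
    }
  }
  return tally;
}

// Requests the page `attempts` times, optionally sending Cache-Control: no-cache,
// and reports how often the X-Frame-Options response header was present.
async function probe(url: string, attempts: number, noCache: boolean): Promise<HeaderTally> {
  const seen: Array<string | null> = [];
  for (let i = 0; i < attempts; i++) {
    const init = noCache ? { headers: { "Cache-Control": "no-cache" } } : {};
    const response = await fetch(url, init);
    seen.push(response.headers.get("X-Frame-Options"));
  }
  return tallyXFrameOptions(seen);
}
```

Calling `probe(url, 50, false)` and then `probe(url, 50, true)` gives the two tallies to compare.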

michelcarlo commented 4 years ago

I'm having the same issue and debug outputs with domain isolated webparts as a SharePoint app page. Weirdly it works on Edge, but not on Chrome.

ghost commented 4 years ago

@maskati thank you for the detailed response. It's quite strange that this ASPX page only returns the correct response some of the time. I'll update the thread when I have more details or a proposed fix.

rayhogan commented 4 years ago

We opened a premium support ticket with MS. They have since acknowledged the issue and have supposedly rolled out a fix to all tenants. I have confirmed it now works on our environments.

bealsao commented 4 years ago

Can confirm it's working for me.

SuleymanA commented 4 years ago

I can confirm the same as @rayhogan. We also created a support ticket. They said it was due to an invalid refresh token. It is now fixed in our tenant as well.

TheIronCheek commented 4 years ago

I'm still getting my original issue where my custom web part won't run when on the same page as other OOTB web parts.

The fix that you all are talking about is unrelated to the token issue @lahuey said could take a month or so to fix, correct?

ghost commented 4 years ago

@TheIronCheek correct.

The Cache-Control issue has been resolved already, but the race condition issue is still open. There were a couple of other higher-priority items that needed to be addressed before I could start fixing this issue.

gaikwaduc commented 4 years ago

@lahuey Regarding the issue I reported in #5164: I can confirm that the error Refused to display 'https://myorg-app.sharepoint.com/_forms/spfxsinglesignon.aspx... ... ...' in a frame because it set 'X-Frame-Options' to 'sameorigin'. no longer appears in the browser. However, I am still facing an issue in the MS Teams desktop client.

TheIronCheek commented 4 years ago

@lahuey - Just touching base on this again since it's been a little over a month. Do we have a new ETA for the fix here? Thanks!

ghost commented 4 years ago

@TheIronCheek The fix is now in review. Should roll out slowly over the next ~3 weeks.

Apologies for the wait. The fix required multiple changes on both the server and client. We're expecting to see an improvement in the performance and reliability of the token provider APIs after the fix rolls out.

Thank you for your patience :)

dvrax commented 4 years ago

@lahuey - Any word on the current status? I seem to still be getting this error today. Is there a particular version I need to update to for this fix?

guillaume-kizilian commented 4 years ago

Still not fully functional on my tests (depending on the tenant), I guess this is why the issue is still open.

@lahuey any eta, news, tests on your side ?

Thanks.

ghost commented 4 years ago

@dvrax @guillaume-kizilian

We’re still working through some issues that prevent us from deploying the upgraded token provider APIs. We want to ensure that the upgraded API resolves all of the open issues without causing regression.

devanshuGit commented 4 years ago

@lahuey do we have any ETA for the race condition issue in the token provider? We recently started using it in an SPFx extension with Bot Framework and, as pointed out above, after around 1 hour of inactivity (when the token expires and ideally a refreshed token should be fetched) it gets stuck resolving. image

manjeetjagtap commented 4 years ago

@lahuey I have started seeing this issue on our tenant in the last 2 weeks. While analyzing it, we found a recommendation to use the Azure AD App URI instead of the App ID ( getToken('') ). Initially I thought this was happening because I was using the App ID, but the issue still occurs after replacing the App ID with the App URI. Do I need to create a separate MS ticket for our tenants to get it resolved? Also, is there any workaround for this?

gobigfoot commented 4 years ago

In our tenant, this issue only presents itself when loading the page with the webpart directly (Browser URL, Link, or Bookmark) and only on the first load. I can hit browser refresh and it will work perfectly. Once it is in this failing state all requests to the API will hang.

const CLIENTID = 'api://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';
// Token Hangs
const provider = await context.aadTokenProviderFactory.getTokenProvider();
const token = await provider.getToken(CLIENTID);
// Fiddler shows a request for the token, but never makes the request to the API and the promise hangs
const client = await context.aadHttpClientFactory.getClient(CLIENTID);
var response = await client.fetch(`${APIPATH}/contacts`, AadHttpClient.configurations.v1, options);

sortegamartin commented 4 years ago

@lahuey do we have any ETA for the race condition issue in the token provider? We recently started using it in an SPFx extension with Bot Framework and, as pointed out above, after around 1 hour of inactivity (when the token expires and ideally a refreshed token should be fetched) it gets stuck resolving. image

I have the same problem

sosandu commented 4 years ago

Also seeing this with a few customers through Premier support.

MaheshakaMudli commented 4 years ago

We started facing this issue on our tenant 2 days ago. The SPFx web part using MSGraphClient throws a "Token renewal operation failed" error on first load in Chrome. On refresh, the issue does not occur and data is returned. Is this something that needs to be fixed on our end or on yours?

desaiprerak commented 4 years ago

@lahuey , @andrewconnell Token Renewal issue has started to appear. Reiterating the behavior from the 1st post here, "it hangs the first time you load the page. If you refresh a few times, it starts working. Then if you leave the page open for a period of time (long enough for the token to expire) and refresh the page, it hangs again"

Kindly provide response. It is impacting clients

TazzyMan commented 4 years ago

Just debugged the session in Chrome and Firefox and in the end the following message occurs:

Chrome: A cookie associated with a cross-site resource at https://webshell.suite.office.com/ was set without the SameSite attribute. It has been blocked, as Chrome now only delivers cookies with cross-site requests if they are set with SameSite=None and Secure. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.

Firefox: Content Security Policy: The page's settings blocked the loading of a resource at inline ('script-src'). (Translated from the Dutch message; in short, the page settings blocked the loading of an inline script.)

In Firefox this could be worked around by changing one flag, security.csp.enable, but that is no real fix. On some tenants it's nearly impossible to get a token at the moment.
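The Chrome message quoted above boils down to a simple rule: a cookie is only sent with cross-site requests if it was set with both SameSite=None and Secure. A rough sketch of that check on a raw Set-Cookie header value follows. `isDeliveredCrossSite` is a hypothetical name, the cookie strings in the usage note are made up, and the parsing is deliberately naive.

```typescript
// Naive check of a Set-Cookie header value against the rule in the Chrome message:
// cross-site delivery requires both SameSite=None and Secure.
function isDeliveredCrossSite(setCookie: string): boolean {
  // Everything after the first ";" is an attribute; the first part is name=value.
  const attributes = setCookie
    .split(";")
    .slice(1)
    .map((attribute) => attribute.trim().toLowerCase());
  const sameSiteNone = attributes.indexOf("samesite=none") !== -1;
  const secure = attributes.indexOf("secure") !== -1;
  return sameSiteNone && secure;
}
```

For instance, `isDeliveredCrossSite("sid=abc; Path=/; Secure; SameSite=None")` is true, while `isDeliveredCrossSite("sid=abc; Path=/")` is false, which is the situation the console message reports for the webshell cookie.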

jimmywim commented 4 years ago

I'm also seeing issues with MSGraphClient not fulfilling requests in an SPFX ApplicationCustomizer. It eventually does work when you keep refreshing, but there's no warnings or errors in console. This may be related as it sounds symptomatically similar (and I believe MSGraphClient uses aadTokenProviderFactory under the hood).

I see the auth loop happening to login.microsoftonline.com requesting a token for https://graph.microsoft.com which loops back round to singlesignon.aspx, but the client's call into .api() never completes.

Also: This never occurs when debugging the solution via ?loadSpfx=true..., it works every time when doing that.

EDIT: My issue is down to a custom web part that I've got that uses this auth loop. It's interfering with the OOTB MSGraphClient in SPFX. Without my webpart on the page, the graph bits work fine.

EDIT EDIT: My issue might not be relevant after all, it appears I wasn't using the AadHttpClientFactory correctly in my solution, so the token retrieval was interfering. The instructions here helped me.
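The comment does not say what was wrong in that solution, so the following is only a sketch of the usual pattern: ask the factory for a client bound to the API's resource, then call the API through that client instead of handling tokens manually. To keep it self-contained, the SPFx types are replaced by small structural stand-ins (`ResponseLike`, `AadClientLike`, `AadClientFactoryLike`), which are assumptions for illustration; in a web part the factory would be `this.context.aadHttpClientFactory` and the configuration `AadHttpClient.configurations.v1`.

```typescript
// Structural stand-ins for the SPFx types (in a real web part these come from @microsoft/sp-http).
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

interface AadClientLike {
  get(url: string, configuration: unknown): Promise<ResponseLike>;
}

interface AadClientFactoryLike {
  getClient(resource: string): Promise<AadClientLike>;
}

// Gets a client for the resource, calls the API, and rejects on a non-success status.
async function fetchFromSecuredApi(
  factory: AadClientFactoryLike, // e.g. this.context.aadHttpClientFactory
  resource: string,              // the API's client ID or App ID URI
  url: string,
  configuration: unknown         // e.g. AadHttpClient.configurations.v1
): Promise<unknown> {
  const client = await factory.getClient(resource);
  const response = await client.get(url, configuration);
  if (!response.ok) {
    throw new Error(`API request failed with status ${response.status}`);
  }
  return response.json();
}
```

Because the client attaches the token itself, there is no separate getToken() call or Authorization header to manage in the web part.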