advdberg closed this issue 4 years ago.
Same issue for me. Frantically waiting for MS to resolve this!
Same issue for me and our customers. Random 500 Internal Server Errors are thrown during provisioning with CSOM. We create a modern team site and then start provisioning it with CSOM. We don't even use PnP Provisioning (although we support it as an additional step), but since November 14th provisioning randomly fails while adding lists, content types, or whatever else to the site.
BTW: Our customers who still use classic team sites don't seem to have these issues.
The advisory has now been upgraded to an 'Incident' in the admin center Service Health, with the following status report:
Current status: We've identified a potential service update which may be the source of the impact. We're reviewing the specific configuration and code changes within the update to isolate the cause of the issue.
So fingers crossed it will be fixed and rolled out before the weekend, so that we then have time to fix all failed provisioning jobs. Happily we're on SharePoint Online, so we don't have to wait for a Service Pack release to get it fixed ;-)
This issue is still being actively worked on. No ETA at this point.
Thanks to everyone who has already reported the issue from their environment, as that helps to speed up the investigation process.
Opened a level A Premier ticket. Not that I think they can help me specifically, but I will update here if there's something new.
In addition to the API, we're also seeing issues with OOB list item editing, which is not a big surprise IMHO, considering the issue is somewhere deeper.
We had these issues in our Web API using the SharePointOnline CSOM libraries. Upgrading to the latest stable versions of the PnP CSOM extensions and the Microsoft CSOM library through NuGet seems to have fixed it. Maybe that'll work for others as well.
I'm getting the HTTP 500 using SharePointPnPCoreOnline NuGet package version 3.15.1911 on a demo tenant created from https://cdx.transform.microsoft.com/ (though not every time).
We also have a Premier ticket open regarding this, and we sent some log files and our PnP template just in case. Thank you for keeping us informed of the progress @VesaJuvonen. It helps us communicate this better to our clients and organization. Not an ideal situation, but I'm sure you will find a solution.
We're on latest NuGet SharePointPnPCoreOnline ver. 3.15.1911 and Microsoft.SharePointOnline.CSOM 16.1.19404.12000 and still having random issues @cwdata
The issue is probably on the server side; the CSOM version doesn't matter.
The issues are not related to package versions; no need to continue that discussion.
The Premier ticket went as suspected, and we lowered the severity as it is already being investigated. We sent some of our error messages, and they added our tenant to the list of affected tenants in the internal ticket for the issue.
Same issue here. Is it still a good idea to create a Microsoft ticket?
Current status: We've isolated down the probable causes to a few changes that were recently made to the SharePoint Online service. We're continuing our investigation to confirm our findings and develop a mitigation plan.
This definitely got worse starting a few days ago, however we started seeing this more than two months ago. Hence we implemented our own retry logic within the OfficeDevPnP framework (ExecuteQueryRetry).
Maybe there's more to it than "a recent change". Maybe Microsoft can elaborate on what the recent change is??
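Retry logic of the kind mentioned above can be sketched generically. The following is a minimal Python sketch, not the actual ExecuteQueryRetry implementation (which is C# code in the PnP framework): it retries an operation on a hypothetical `TransientServerError` (standing in for an intermittent HTTP 500) with exponential backoff. The names `run_with_retry`, `TransientServerError`, and `flaky_add_list` are illustrative assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientServerError(Exception):
    """Hypothetical stand-in for an intermittent HTTP 500 from the service."""

    def __init__(self, status: int = 500) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def run_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on TransientServerError with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `max_attempts` calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except TransientServerError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


# Example: a hypothetical operation that fails twice with a 500, then succeeds.
calls = {"n": 0}


def flaky_add_list() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientServerError(500)
    return "list created"


delays: list = []
result = run_with_retry(flaky_add_list, sleep=delays.append)
print(result, delays)  # list created [0.5, 1.0]
```

Injecting `sleep` keeps the sketch testable without real waits. One caveat, assuming the retried call is not idempotent: a 500 can arrive after the server has already applied the change, so blindly retrying something like "add list" may then fail with an "already exists" error; checking for the artifact before re-creating it avoids that.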
If they want a list, I can add 100+ tenants!!!! ;-)
I agree, I've noticed the same strange behavior in provisioning during the last two months as well. I just wasn't sure where the error came from, and retrying helped, at least before.
Agreed, we first hit this issue September 23rd. Definitely occurring a lot more now though.
I agree as well. We have been seeing this error for a couple of months, but were never able to reproduce it when retrying.
Am I the only one who feels like this is just getting worse?
I feel you, bro =(
Would it help to open another ticket with Microsoft? I have the feeling they must have received hundreds, because this is huge.
Every single ticket has an impact. Please do not assume that your input or feedback would not have value for Microsoft as all of them do have direct influence on the following actions.
We are still actively working on this, but would absolutely ask everyone affected by this issue to report it through Premier Support or the standard tenant admin support channel, as each and every submission has an impact on getting things resolved.
We do apologize for the inconvenience, but please do keep on reporting the issue if you are experiencing it. Thank you.
Also - if any ISVs do have numerous tenants experiencing the issue, please do use Premier Support or tenant admin support channels to report that. Thank you.
@SandeepVo @DaniCorretja We noticed that some PnP Provisioning tasks we were running locally against our tenant in September were failing, but as we were having ISP issues at the time we thought the server issues were being caused by our connection. As you both mention, running them again fixed the issue so at the time we were unable to diagnose this properly.
Around the same time in September we were updating some of our Azure Runbooks that also use the PnP Provisioning Engine/Automation modules, and they didn't seem to be failing, but the modules and engine fell over if they were run locally. Again it was the same intermittent issue, so we added a retry to the Runbook, and this seemed OK going forward.
What alerted us to this incident recently was a similar solution we'd built for a client, who suddenly was unable to create any new sites - we then figured it was the same 500 error we had briefly seen a couple of months ago, but this time it was failing on every run.
So it would seem this is something that was possibly affected a while back by other changes, but then got really bad a few days ago due to another change?
Anyway, for the first time in a few days, I just ran a successful provisioning task locally using PowerShell, all seems fine. Just need to check on the Azure Automation side for any issues. Will report back if I still see any issues (here and to support).
It looks like this affects the whole O365 platform rather than only SharePoint; yesterday there were 2 incidents in the health center. Today there is only the one for Office 365, but the problem is much wider than before.
Update - unfortunately when I run the PnP Provisioning via Azure Runbook in my client's tenant this is still failing. Presume that this is an ongoing thing so will check again later today.
A potential fix was applied a few hours ago, so we are curious to hear the status through this channel as well. Is the situation any better in your environments from now on, or not?
Thank you in advance for the status updates.
No change here.
This morning, for the first time, a Get-PnPProvisioningTemplate followed by an apply was successful, but the second run still failed with the error.
thx @SchauDK - let's follow up on the situation over the coming hours. It might also be that the fix has not yet been fully applied to your tenant, but it should be in progress. We are getting good messages from some customers, so it's looking positive for now.
Seems to be working at least in one of our case with rather heavy API usage.
@VesaJuvonen - You mentioned that it might take some time before this fix is applied to all tenants. Is there any way to check whether the fix has already been applied to a specific tenant? It seems to work in one of my tenants, but not, for example, in any of my dev tenants.
@ChrisOMetz - unfortunately there's no way to check at the tenant level whether the fix has already been applied. It should be deployed/enabled worldwide within the next 4 hours, which should then remove the unnecessary 500s... You can still get exceptions like 429s or 503s, which are throttling related, but if your code is CSOM and uses the ExecuteQueryRetry method, it will handle these situations automatically.
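The distinction drawn here, throttling responses (429/503) versus the unexpected 500s, can be sketched as a small delay policy. This minimal Python sketch is not ExecuteQueryRetry's actual logic; it assumes that throttled responses may carry a `Retry-After` header in delta-seconds form (Microsoft's SharePoint Online throttling guidance asks clients to honor that header) and ignores the HTTP-date form. The function name `retry_delay` and the status groupings are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical groupings based on the status codes discussed in this thread.
THROTTLING_STATUSES = {429, 503}
TRANSIENT_STATUSES = {500}


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup (HTTP header names are case-insensitive)."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base_delay: float = 1.0,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to give up."""
    backoff = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... by default
    if status in THROTTLING_STATUSES:
        retry_after = _header(headers, "Retry-After")
        if retry_after is not None and retry_after.strip().isdecimal():
            # Prefer the server-specified wait (delta-seconds form only).
            return float(retry_after.strip())
        return backoff
    if status in TRANSIENT_STATUSES:
        return backoff
    return None  # e.g. 400/403/404: retrying will not help


print(retry_delay(429, {"Retry-After": "120"}, attempt=1))  # 120.0
print(retry_delay(503, {}, attempt=3))                      # 4.0
print(retry_delay(500, {}, attempt=2))                      # 2.0
print(retry_delay(404, {}, attempt=1))                      # None
```

Returning `None` for non-retryable statuses keeps the decision of when to give up in one place; a caller would loop, sleep for the returned delay, and stop on `None` or after a maximum number of attempts.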
Seems fine for us now.
In general, everything seems to be working for us. I still see some 500 errors, but our code was handling those situations anyway. I'll give it a couple more hours and retest.
Started running some load tests on several 'internal' tenants, so far so good! Thanks for sharing the early status @VesaJuvonen, we will proceed with tests also on customer tenants.
Am seeing
Exception System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host
in my Azure Runbooks (which are running PnP PowerShell scripts).
Any chance this is the same issue? Should I be logging this separately to the issue already known?
@AndyBolam - this does not look like the same issue.
Weird, but I have also been seeing this issue on our tenant intermittently.
thx @SandeepVo - This issue is about the 500 exceptions on the CSOM APIs, so let's not combine multiple different issues in a single item. thx.
I guess the fix isn't globally deployed yet. I'll check again later.
It's better than yesterday, but we still encounter the error in about 50% of our provisioning trials.
For our dev and prod tenants it seems to work now: 3 provisioning requests within 1 hour, and all have been processed successfully.
In our tenant we believe that it is fixed. No 500 errors since noon. Jupiiii.... Thank you @VesaJuvonen, we can celebrate the fix at ESPC 2019 in Prague in 2 weeks.
Looks to be resolved for us, thank you!
@VesaJuvonen Good to hear it's fixed, but I wonder how a serious issue like this can run in production for over a week without Microsoft noticing it until the community gave feedback. Is there a major lack of monitoring on the API?
@frnk01 - Just to be clear here: Microsoft engineering had already acknowledged and detected this issue in the background, but as reports started coming in through multiple social media channels and other forums, we also wanted to have public and open communication about it within this issue, to provide more transparency and visibility on its progress.
This transparency helps all sides of the discussion, and we also wanted to encourage people to use the normal support channels to report the issue, as that's also the preferred option for any future issue. Obviously, work was already being done in the background to address the root cause.
Thanks everyone for your input on this issue, and we do apologize for the inconvenience this may have caused you. As the root cause of this particular issue has now been addressed, we will be closing this issue from here. We are working internally at Microsoft to minimize the possibility of similar issues in the future.
If you have any other issues which seem similar, please do open a new issue in here and open a Premier Support case, where suitable.
@VesaJuvonen As this is considered solved, does that mean that we shouldn't see the error at all or should we see less? I'm asking because we're still seeing it and I'm wondering how it will look on Monday when all our customers will start hitting SharePoint again.
So it looks like it was resolved last Saturday; in the evening I ran some local scripts from different tenants and some Azure Functions scripts. So it seems to be over.
It's not over! We have 53 occurrences within the last hour.
1 of 4 provisionings still showed this error this morning.
Category
[X] Bug [ ] Enhancement
Environment
[X] Office 365 / SharePoint Online [ ] SharePoint 2016 [ ] SharePoint 2013
Expected or Desired Behavior
Template is applied without errors (or at least with detailed errors)
Observed Behavior
We get an intermittent 500 server error when applying a template, with the following trace log:
Steps to Reproduce
Execute the following PowerShell:
Apply-PnPProvisioningTemplate -Path "C:\temp\SiteTemplate(1).xml" -Verbose