SharePoint Online responds with HTTP 400 bad request from time to time since April 16, 2021

EinmalIM commented 3 years ago

What type of issue is this?

Question

What SharePoint development model, framework, SDK or API is this about?

SharePoint CSOM

Target SharePoint environment

SharePoint Online

What browser(s) / client(s) have you tested

[ ] 💥 Internet Explorer
[ ] 💥 Microsoft Edge
[ ] 💥 Google Chrome
[ ] 💥 FireFox
[ ] 💥 Safari
[ ] mobile (iOS/iPadOS)
[ ] mobile (Android)
[X] not applicable
[ ] other (enter in the "Additional environment details" area below)

Additional environment details

CSOM from a .NET Framework 4.7.2 app running in an App Service and WebJob in Microsoft Azure, with the Microsoft.SharePointOnline.CSOM NuGet package 16.1.20720.12000

Issue description

We are using CSOM (.NET Framework 4.7.2, Microsoft.SharePointOnline.CSOM 16.1.20720.12000) to communicate with SharePoint Online.

Since April 16th, 2021, we have been seeing intermittent HTTP 400 responses to CSOM calls from different parts of our code. We added status 400 to our internal retry logic, and executing the same call a few seconds later succeeds. So 400 looks like the wrong status code for whatever is actually happening.

It seems like SharePoint should be generating a 429 throttling response at that point and is somehow generating an incorrect 400 instead.

Is anybody else experiencing the same?

Regards Sven

ghost commented 3 years ago

Thank you for reporting this issue. We will be triaging your incoming issue as soon as possible.

patrikhellgren commented 3 years ago

I am also seeing this, but I cannot confirm how long it has been occurring. I know for sure that it has been happening for at least the last couple of days.

EinmalIM commented 3 years ago

HTTP 400 is still happening for us, but with our retry logic as a workaround we can keep our software running and our customers happy.

I still think the SharePoint team should look into how to fix this.

andrewconnell commented 3 years ago

What exactly are you doing in the call... like what type of action? Sharing code would help...

ghost commented 3 years ago

The more context details you can provide, the easier it is to help assist on issues. Any code you can provide and/or screenshots of the issue also help. The easier you can make it to reproduce the issue, the easier and quicker it is for someone to help you. Please refer to How to Create Good Issues, specifically How to Create Good Issues: Include context, in our wiki for more details.

EinmalIM commented 3 years ago

We have an XML-based script language which can execute CSOM calls. Scripts run either in reaction to a remote event receiver call or from a scheduler. The remote event receiver endpoint runs as an App Service in Azure on .NET 4.7. The scheduler is a WebJob that is part of that App Service.

So the concrete CSOM calls vary.

Our scripting language offers actions to read and write data in site collections. As stated above, we do not create HTTP requests on our own but use the Microsoft CSOM NuGet package. When a call fails with HTTP 400, we just wait 8 seconds, and on the second try the same call succeeds. It takes a few calls before we see HTTP 400, which makes us think that SharePoint wants to throttle us.

We are connecting to the sites with SharePoint app-only authentication.

We access several SharePoint sites from different customers and see the 400 status for all of them from time to time. So you can imagine the trouble we had when those scripts failed to execute for all customers at random points, starting on the 16th of April.

Luckily we had retry code in place and could work around the problem by adding 400 to the status codes that trigger a retry.
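
A minimal sketch of the workaround described above, not the actual code from this thread. It assumes that a failed CSOM `ExecuteQuery()` on .NET Framework surfaces as a `WebException` whose `Response` is an `HttpWebResponse`. The names `TransientRetry`, `IsRetryableStatus`, `IsRetryable`, and `Execute` are hypothetical. 429 and 503 are the usual throttling responses; 400 is included only because of the behavior reported here.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper: retries an action after a fixed delay when the
// failure looks transient. Not part of CSOM or PnP.
public static class TransientRetry
{
    // 429 and 503 are the usual throttling responses; 400 is added only
    // as the workaround for the intermittent errors described in this issue.
    public static bool IsRetryableStatus(int statusCode)
    {
        return statusCode == 400 || statusCode == 429 || statusCode == 503;
    }

    // Assumption: the failed request surfaces as a WebException that
    // carries an HttpWebResponse with the HTTP status code.
    public static bool IsRetryable(Exception ex)
    {
        var webEx = ex as WebException;
        var response = webEx?.Response as HttpWebResponse;
        return response != null && IsRetryableStatus((int)response.StatusCode);
    }

    // Runs the action up to maxAttempts times. An exception is swallowed and
    // the action retried only if shouldRetry(ex) is true and attempts remain;
    // otherwise the exception propagates to the caller.
    public static void Execute(Action action, Func<Exception, bool> shouldRetry,
                               int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && shouldRetry(ex))
            {
                Thread.Sleep(delay);
            }
        }
    }
}
```

With a CSOM `ClientContext` named `ctx`, a call could then look like `TransientRetry.Execute(() => ctx.ExecuteQuery(), TransientRetry.IsRetryable, 3, TimeSpan.FromSeconds(8));`, where the 8-second delay mirrors the wait mentioned above and the limit of 3 attempts is an arbitrary choice.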

patrikhellgren commented 3 years ago

We have a webjob running synchronization of a sql table to terms in a term set which has been running for different customers for several years and I have been running some tests today.

Each term sync makes 5 ExecuteQuery calls. When syncing 10000 terms, which takes about 3 hours, I see the 400 status 3 (three) times. So 50000 ExecuteQuery calls over 3 hours works out to about 4.6 requests per second.

We are using PnP Sites Core for this, which handles all regular throttling but does not handle this 400 status, so I have also implemented some retry logic for it. For all 3 of the calls that returned 400, a single retry after waiting 500 ms was successful. The 400 status is returned for a different call each time.

EinmalIM commented 3 years ago

Do you have logs from the last few days to verify whether the 400 status codes started for you around the 16th of April as well?

patrikhellgren commented 3 years ago

Sorry, but we don't keep logs that long for this sync. Also, probably no one has noticed this, since there are not many changes to sync each day and a failed one is easy to miss. I noticed it myself a couple of days ago because I was setting up a new customer and doing a large initial synchronization. Since then I have been testing a lot, as I said in the previous comment, and seeing this error on and off.

patrikhellgren commented 3 years ago

@andrewconnell Please see my linked issue above from PnP Framework for some simple code examples on how to reproduce this issue. As I say there it can take some time to see the (400) Bad Request but sooner or later, there it is.

EinmalIM commented 3 years ago

Hi, I wrote some test code which just creates a list item and updates its Title and Body fields in a loop, and after a while the 400 status code comes up. Like so:

```csharp
// Requires: using System; using Microsoft.SharePoint.Client;
bool keepRunning = true;
using (var ctx = new ClientContext(url))
{
    // set credentials...

    Web web = ctx.Web;
    ListCollection lists = web.Lists;

    string listName = "testList";
    string itemName = "some name";

    // Create a single list item once.
    List list = lists.GetByTitle(listName);
    ListItemCreationInformation lci = new ListItemCreationInformation();

    ListItem item = list.AddItem(lci);
    item["Title"] = itemName;
    item["Body"] = $"Created: {DateTime.Now:yyyy-MM-dd HH:mm.ss}";
    item.Update();
    ctx.ExecuteQuery();

    // Update the same item in a tight loop until the 400 shows up.
    int iterations = 0;
    while (keepRunning)
    {
        ++iterations;
        try
        {
            item["Title"] = $"{itemName} #{iterations}";
            item["Body"] = $"Updated: {DateTime.Now:yyyy-MM-dd HH:mm.ss}\r\nIterations: {iterations}\r\n";
            item.Update();
            ctx.ExecuteQuery();
        }
        catch (Exception ex)
        {
            // ...
        }
    }
}
```

So it does not seem to matter much which CSOM methods are called, only that you execute a number of CSOM requests in a short time, which would normally lead to throttling.

It would be nice if someone closer to the SharePoint dev team could clarify whether an issue in SharePoint Online could be causing these 400 Bad Request responses.

MirkoApi95 commented 3 years ago

Hello, I have also been getting this problem for a few days, with a tool developed to perform migrations. The behavior is random, but it mostly happens when I set 3-4 lookup columns and do an Update. Sometimes I also get it when setting Author or Editor, or on ensureUser. I make a lot of requests. I have added a retry with incremental wait as a buffer, but it slows things down, since I hit the error every 2 minutes or so...
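
For illustration, one common form of "incremental wait" is exponential backoff, where the delay doubles on each retry up to a cap. The comment above does not say which scheme the tool uses, so this is only a sketch under that assumption; `Backoff` and `DelayFor` are hypothetical names.

```csharp
using System;

// Hypothetical helper: computes the wait before each retry attempt.
public static class Backoff
{
    // Delay before retry number `attempt` (1-based):
    // baseDelay, 2 * baseDelay, 4 * baseDelay, ... capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Cap the exponent so very large attempt numbers stay well-behaved.
        int exponent = Math.Min(attempt - 1, 20);
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, exponent);
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds));
    }
}
```

Capping the delay keeps a long migration from stalling on a single call, while the short first delay keeps the cost low when one quick retry is enough.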

enriccarrion commented 3 years ago

Hey guys!

Same here: many of our tenants have been suffering from the same HTTP 400 errors since approx. 16/19 April, and they are still occurring.

We extended the retry mechanism for throttling to cover HTTP 400 in addition to 429. It seems to work but... is this actually the right way?

Let me add more info to this. Since those dates, apart from the HTTP 400 errors with CSOM, we've also started to experience many other errors in these same tenants, in many different situations:

- Loading a SharePoint site collection in the browser has several times shown an ASP.NET error (even before authenticating).
- OAuth negotiation for SharePoint add-ins often returns 403.
- SharePoint Online PowerShell cmdlets frequently return "Unknown Error":
  - When retrieving site collections (Get-SPOSite)
  - When adding users (Add-SPOUser)

All this happens very frequently but none of it is reproducible with a specific set of steps. It happens sometimes (and when it happens, it seems to get stuck on it) and sometimes it doesn't.

Did MS do any SPO updates during those dates? Could someone provide any other insights?

Thank you!!

jansenbe commented 3 years ago

@MirkoApi95 / @enriccarrion : please create support cases for these issues (or have your customers create them)

enriccarrion commented 3 years ago

Thanks, Bert! We did it yesterday.

PeterHeibrink commented 3 years ago

Has anyone received an update on this? We're having the same issues with Azure Functions using the PnP Provisioning Framework on multiple customer tenants. We have already created multiple tickets, but Microsoft Support does not seem convinced that something might be unstable in their APIs, and it seems our tickets are not being investigated yet. I hope someone else already has a ticket that is being worked on. The issue has been going on for 3 weeks already, so if anybody has an update, I'd love to hear it!

ificator commented 3 years ago

Hey folks, we did manage to track down one "random 400" issue and disabled the change causing it on 5/7 ~14:10 (PST). Hopefully this has resolved your issues, but please let us know if you continue to see unexpected 400s.

EinmalIM commented 3 years ago

Looks promising - I checked 4 tenants and http 400 seems to have stopped around 20:00 UTC on 5/7.

patrikhellgren commented 3 years ago

I cannot reproduce the 400 Bad Request error anymore either, so it seems to have been fixed. However, the issue pnp/pnpframework#292, which looked like it could be related since it occurs under the same circumstances, still persists. I can reproduce that one repeatedly today.

ghost commented 3 years ago

Issues that have been closed & had no follow-up activity for at least 7 days are automatically locked. Please refer to our wiki for more details, including how to remediate this action if you feel this was done prematurely or in error: Issue List: Our approach to locked issues

SharePoint / sp-dev-docs