O365 / python-o365

A simple python library to interact with Microsoft Graph and Office 365 API
Apache License 2.0

HTTPSConnectionPool ERROR - Can we modify the $top value in query? #586

Open liab25 opened 3 years ago

liab25 commented 3 years ago

I get the following error when pulling emails from an inbox.

HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url: /v1.0/users/myemail@mydomain.com/mailFolders/Inbox/messages?%24top=999&%24expand=attachments%28%24select%3Dname%29 (Caused by ResponseError('too many 504 error responses'))

If I use the following code to limit the number of emails to process, it seems to work:

messages = inbox.get_messages(limit=20, batch=20, query=q)

But when I change it to this, it idles for a bit and then spits out the error above:

messages = inbox.get_messages(limit=None, batch=200, query=q)

Any way around this? Am I just hitting some MS Graph throttling limit?

alejcas commented 3 years ago

Throttling in theory returns a 429, so I don't know what this is.

liab25 commented 3 years ago

Yeah, that's what I thought too. The problem appears to be with trying to pull too many results from Microsoft's API in one request. I found one of their KB articles which references the error and says we need to adjust the $select and/or $top values. I looked at the source code and the URL in the error, and it seems to be setting $top to 999. I'm wondering if there's a way I can change this value.

This is the KB article: https://docs.microsoft.com/en-us/graph/api/user-list-messages?view=graph-rest-1.0&tabs=http. It doesn't make sense to me, though: I thought the "batch" parameter was already doing paging, so we only retrieve X results at a time until the limit is reached.
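For readers following along, here is a minimal sketch of the paging model described in this comment. It is not python-o365's implementation: `fetch_page` and `get_items` are hypothetical names, and a plain list stands in for the mailbox. It only illustrates how `limit` and `batch` would interact if each request asked for at most `batch` items.

```python
from typing import Iterator, List, Optional


def fetch_page(store: List[int], offset: int, top: int, log: List[int]) -> List[int]:
    """Stand-in for one HTTP request: return at most `top` items from `offset`."""
    log.append(top)  # record the page size that was requested
    return store[offset:offset + top]


def get_items(store: List[int], limit: Optional[int] = None, batch: int = 20,
              log: Optional[List[int]] = None) -> Iterator[int]:
    """Yield items page by page until `limit` is reached or the store runs out."""
    log = [] if log is None else log
    offset = 0
    remaining = limit  # None means "no limit, keep paging to the end"
    while remaining is None or remaining > 0:
        top = batch if remaining is None else min(batch, remaining)
        page = fetch_page(store, offset, top, log)
        yield from page
        if len(page) < top:
            return  # a short (or empty) page means there is nothing left
        offset += len(page)
        if remaining is not None:
            remaining -= len(page)


inbox = list(range(1050))  # pretend mailbox with 1050 messages
limited_log: List[int] = []
full_log: List[int] = []
limited = list(get_items(inbox, limit=500, batch=100, log=limited_log))
everything = list(get_items(inbox, limit=None, batch=100, log=full_log))
print(len(limited), "items in", len(limited_log), "requests")
print(len(everything), "items in", len(full_log), "requests")
```

Under this model every request stays small regardless of `limit`. The URL in the error above shows `$top=999`, so the request that actually failed was asking for a much larger page than the `batch` value passed in.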

alejcas commented 3 years ago

Use limit param

liab25 commented 3 years ago

> Use limit param

So that's the problem. If I set limit to something like 500 and batch to 100, my understanding is that it will only retrieve 500 emails, paging through them 100 at a time until the limit is reached. I need to be able to retrieve all messages in the inbox, no matter how many it contains. I've set limit to None and batch to 100, and even as low as 20, but I am still getting the same error.

It only works when I set limit to something other than None. Is this just an MS limitation?

alejcas commented 3 years ago

For me it works if I set limit to None. Maybe you hit some internal limit or something. I don’t know. MS Graph is pretty obscure sometimes.

liab25 commented 3 years ago

Yeah, that’s what I’m thinking... I was able to process inboxes with up to 15k emails at one point without issue, but now inboxes with 1-2k are causing problems. Probably an MS thing like you said.

pythonista092920 commented 3 years ago

I am actually having this exact same issue while retrieving the MS Graph Calendar. I have recently been getting one of the two following errors, and which one I get changes randomly:

HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url: (Caused by ResponseError('too many 503 error responses'))

OR

HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url: (Caused by ResponseError('too many 429 error responses'))

I am pulling the data using:

q = calendar.new_query('start').greater_equal(self.six_am)
graph_events = calendar.get_events(query=q, include_recurring=False, limit=None)

I know the 429 error is due to too many requests (I believe they allow 17 per second). The weird thing is I can run it once and it fails, but if I run it again one or two more times it eventually will work. This query usually only returns about 500 items so it is not large.
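As a rough illustration of handling these status codes, here is a minimal sketch of choosing how long to wait before retrying. It assumes the throttled response carries a `Retry-After` header with a number of seconds, which is what Microsoft Graph typically sends with a 429, and falls back to exponential backoff otherwise. `retry_delay`, `RETRYABLE_STATUSES`, `base` and `cap` are hypothetical names and values, not part of python-o365.

```python
from typing import Mapping, Optional

# Assumption: these are the statuses worth retrying (they match the errors above).
RETRYABLE_STATUSES = {429, 503, 504}


def retry_delay(status_code: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if not retryable.

    `attempt` is zero-based. `headers` is looked up with the exact key
    "Retry-After"; a plain dict is case-sensitive, so normalise keys first
    if your HTTP client does not do it for you.
    """
    if status_code not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # header was an HTTP date or malformed: fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds.
    return min(cap, base * 2 ** attempt)


print(retry_delay(429, {"Retry-After": "7"}, attempt=0))
print(retry_delay(503, {}, attempt=3))
```

Honoring the server's hint first and only then backing off tends to be gentler on a throttled service than retrying at a fixed short interval.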

By the way, I do love this project. Thank you for everything.

Any tips @janscas ?

alejcas commented 3 years ago

@pythonista092920 Thanks! I have no clue on this... I'm sorry.

pythonista092920 commented 3 years ago

It is alright, it was worth a try @janscas

In connection.py, do you see any problem with changing request_retries and requests_delay in the __init__ of class Connection to see if that could help?

alejcas commented 3 years ago

Try with different values like request_retries=None to disable the retries. Also requests_delay will help you avoid the 429 error.

Defaults are request_retries=3 and requests_delay=200 milliseconds.

pythonista092920 commented 3 years ago

I'll try running it today with request_retries=None and requests_delay=500 and see what happens. Thank you @janscas

pythonista092920 commented 3 years ago

Unfortunately that did not work; I am still getting 503 errors. I am going to build some extra exception handling around calendar.get_events internally and see if I can get better results. If I run it a second time, it usually works. Maybe the MS Graph servers are just getting too much traffic to handle? Thanks for your response @janscas

alejcas commented 3 years ago

Great to know @pythonista092920

Thanks

arkadas19 commented 3 years ago

@pythonista092920 did your error get resolved?

pythonista092920 commented 3 years ago

@arkadas19

I actually still had issues periodically. What I ended up doing was wrapping the calendar.get_events() method in a try/except block and setting a 10 retry limit. Everything has run perfectly since making these changes. Maybe this code will help give you an idea.

```python
import sys
import time

MAX_ATTEMPTS = 10        # total calls to get_events before giving up
RETRY_WAIT_SECONDS = 60  # pause between failed attempts

try:
    # graph_account and email are defined elsewhere in the program.
    schedule = graph_account.schedule(resource=email)
    calendar = schedule.get_default_calendar()
except Exception as get_calendar_data_exception:
    print(f"Hit the following exception: {get_calendar_data_exception}")
    sys.exit(1)

graph_events = None
for attempt in range(1, MAX_ATTEMPTS + 1):
    try:
        graph_events = calendar.get_events(include_recurring=False, limit=None)
        break  # success, stop retrying
    except Exception as calendar_not_retrieved_exc:
        print(f"Hit calendar_not_retrieved_exc: {calendar_not_retrieved_exc}")
        if attempt < MAX_ATTEMPTS:
            print(f"Could not pull calendar data from MS Graph, trying again. "
                  f"Starting reattempt {attempt}")
            time.sleep(RETRY_WAIT_SECONDS)
else:
    # The loop ended without a successful call.
    print("Too many retry attempts, quitting the program. MS Graph servers appear to be not "
          "accepting the requests.")
    sys.exit(1)

print("Successfully pulled the calendar data")
```
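The same idea can be pulled into a small reusable helper. This is only a sketch of the pattern used above, not something python-o365 provides: `call_with_retries` and its parameters are hypothetical names, and the `sleep` argument exists only so the waiting can be swapped out (for example in tests). The flaky function below simulates two failed calls followed by a success.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], attempts: int = 10, wait: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func` up to `attempts` times, sleeping `wait` seconds between failures.

    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose; narrow it if you know the error types
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({exc}), retrying in {wait} seconds")
            sleep(wait)
    raise RuntimeError("unreachable")  # keeps type checkers satisfied


# Simulated flaky call: fails twice, then succeeds.
outcomes = iter([True, True, False])


def flaky_get_events() -> List[str]:
    if next(outcomes):
        raise ConnectionError("simulated 503 from the server")
    return ["event"]


waits: List[float] = []  # record requested sleeps instead of actually sleeping
result = call_with_retries(flaky_get_events, attempts=5, wait=60.0, sleep=waits.append)
print(result, waits)
```

With the objects from the snippet above, the call would look something like `call_with_retries(lambda: calendar.get_events(include_recurring=False, limit=None))`.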