Open liab25 opened 3 years ago
Throttling should return a 429, so I'm not sure what this is...
Yeah, that's what I thought too. The problem appears to be with pulling too many requests from Microsoft's API. I found one of their KB articles that references the error and says we need to adjust the $select and/or $top values. Looking at the source code and the URL in the error, it seems $top is being set to 999. I'm wondering if there's a way I can change this value.
According to this KB: https://docs.microsoft.com/en-us/graph/api/user-list-messages?view=graph-rest-1.0&tabs=http. But it doesn't make sense to me; I thought the "batch" parameter was already doing paging, so we only retrieve X results at a time until the limit is reached.
Use limit param
So that's the problem. If I set limit to something like 500 and batch to 100, my understanding is it will only retrieve 500 emails, paging through them 100 at a time until the limit is reached. I need to be able to retrieve all messages in the inbox no matter how many messages it contains. I've set limit to None and batch to 100, and even as low as 20, but I am still getting the same error.
It's only when I set limit to something other than None that it works. Is this just an MS limitation?
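To make sure we're talking about the same thing, here's a rough pure-Python sketch of how I understand limit and batch to interact (my mental model only, not O365's actual implementation):

```python
def paged_fetch(total_on_server, limit, batch):
    """Simulate limit/batch paging: each request asks the server for up to
    `batch` items ($top), and fetching stops once `limit` items have been
    retrieved (or when the mailbox runs out, if limit is None).
    Returns (items_retrieved, requests_made)."""
    retrieved = 0
    requests_made = 0
    while retrieved < total_on_server and (limit is None or retrieved < limit):
        requests_made += 1  # one HTTP round-trip per page
        remaining = total_on_server - retrieved
        wanted = remaining if limit is None else min(remaining, limit - retrieved)
        retrieved += min(batch, wanted)
    return retrieved, requests_made

# With 2000 messages on the server: limit=500, batch=100 stops after
# 500 items / 5 requests, while limit=None keeps paging until the
# mailbox is exhausted (20 requests).
```

So limit caps the total, batch only sets the page size; with limit=None the number of requests grows with the mailbox, which is presumably where the throttling kicks in.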
For me it works if I set limit to None. Maybe you hit some internal limit or something, I don't know. MS Graph is pretty obscure sometimes.
Yeah, that's what I'm thinking... I was able to process inboxes with up to 15k emails at one point without issue; now boxes with 1-2k are causing problems. Probably an MS thing, like you said.
I am actually having this exact same issue while retrieving the MS Graph calendar. I have recently been getting one of the two following errors, and it changes randomly:
HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url:
OR
HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url:
I am pulling the data using:
```python
q = calendar.new_query('start').greater_equal(self.six_am)
graph_events = calendar.get_events(query=q, include_recurring=False, limit=None)
```
I know the 429 error is due to too many requests (I believe they allow about 17 per second). The weird thing is I can run it once and it fails, but if I run it again one or two more times it eventually works. This query usually only returns about 500 items, so it is not large.
By the way, I do love this project. Thank you for everything.
Any tips @janscas ?
@pythonista092920 Thanks! I have no clue on this... I'm sorry.
It is alright, it was worth a try @janscas
In connection.py, do you see any problem with changing the request_retries and requests_delay values set in the __init__ of class Connection, to see if that could help?
Try different values, like request_retries=None to disable the retries. Increasing requests_delay will also help you avoid the 429 error. Defaults are retries=3 and delay=200 milliseconds.
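The ~17 requests/second figure mentioned above gives a quick way to sanity-check a requests_delay value (a rough sketch; that rate comes from the comment above, not from any official Graph number):

```python
import math

def min_safe_delay_ms(max_requests_per_second):
    """Smallest per-request delay (in milliseconds) that keeps a single
    serial client at or under the given request rate."""
    return math.ceil(1000 / max_requests_per_second)

# At ~17 requests/second, anything >= 59 ms between requests stays in budget.
print(min_safe_delay_ms(17))  # 59
```

By that math both the 200 ms default and 500 ms keep one serial client well under that rate, so if 429s still appear they may be coming from throttling across the whole tenant rather than from this client alone.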
I'll try running it today with request_retries=None and requests_delay=500 and see what happens. Thank you @janscas
Unfortunately that did not work; I'm still getting 503 errors. I am going to build some extra exception handling around calendar.get_events and see if I can get better results. If I run it a second time, it usually works. Maybe the MS Graph servers are just getting too much traffic to handle? Thanks for your response @janscas
Great to know @pythonista092920
Thanks
@pythonista092920 did your error get resolved?
@arkadas19
I actually still have issues periodically. What I ended up doing was wrapping the calendar.get_events() call in a try/except block with a 10-retry limit. Everything has run perfectly since making these changes. Maybe this code will help give you an idea.
```python
import time

try:
    schedule = graph_account.schedule(resource=email)
    calendar = schedule.get_default_calendar()
    calender_events_not_retrieved = True
    loop_attempt = 0
    retry_attempt = 0
    while calender_events_not_retrieved:
        loop_attempt += 1
        if loop_attempt > 10:
            print("The loop has run 10 times, quitting the program.")
            quit()
        if retry_attempt < 9:
            try:
                graph_events = calendar.get_events(include_recurring=False, limit=None)
                calender_events_not_retrieved = False
            except Exception as calender_not_retrieved_exc:
                retry_attempt += 1
                print(f"Hit calender_not_retrieved_exc: {calender_not_retrieved_exc}")
                print(f"Could not pull calendar data from MS Graph, trying again. "
                      f"Starting reattempt {retry_attempt}")
                time.sleep(60)
        else:
            print("Too many retry attempts, quitting the program. MS Graph servers "
                  "appear to not be accepting the requests.")
            quit()
except Exception as get_calendar_data_exception:
    print(f"Hit the following exception: {get_calendar_data_exception}")
else:
    print("Successfully pulled the calendar data")
```
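If it helps, the same idea can be factored into a small generic retry helper, so the loop bookkeeping doesn't get mixed in with the Graph calls (a sketch; the backoff numbers are arbitrary):

```python
import time

def retry(fn, attempts=10, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call fn(); on any exception, sleep and try again, up to `attempts`
    total tries with exponentially growing delays. Re-raises the last
    exception if every attempt fails."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * backoff ** attempt)

# Usage with the call from my snippet above (calendar comes from your own setup):
# graph_events = retry(lambda: calendar.get_events(include_recurring=False, limit=None))
```

Passing `sleep` as a parameter also makes the helper trivial to unit-test without real delays.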
I get the following error when pulling emails from an inbox.
```
HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url: /v1.0/users/myemail@mydomain.com/mailFolders/Inbox/messages?%24top=999&%24expand=attachments%28%24select%3Dname%29 (Caused by ResponseError('too many 504 error responses'))
```
If I use the following code to limit the number of emails to process, it seems to work:

```python
messages = inbox.get_messages(limit=20, batch=20, query=q)
```

But when I change it to this, it will idle for a bit and then spit out the error above:

```python
messages = inbox.get_messages(limit=None, batch=200, query=q)
```

Any way around this? Am I just hitting some MS Graph throttling limit?
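One workaround I'm considering is pacing the message generator myself, pausing after every N messages so the underlying paging requests are spread out over time (a sketch; the pause values are guesses):

```python
import time

def paced(messages, every=100, pause_seconds=2.0, sleep=time.sleep):
    """Yield items from any iterator, sleeping after every `every` items
    so the HTTP requests behind the paging are spread out over time."""
    for count, message in enumerate(messages, start=1):
        yield message
        if count % every == 0:
            sleep(pause_seconds)

# Usage (hypothetical pause values; get_messages paging happens lazily
# as the generator is consumed):
# for message in paced(inbox.get_messages(limit=None, batch=100, query=q),
#                      every=100, pause_seconds=2.0):
#     process(message)
```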