blueprints-for-text-analytics-python / blueprints-text

Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"
Apache License 2.0

Chapter 2 Pagination #8

LulinS closed this issue 2 years ago.

LulinS commented 2 years ago

When I set the 'since' parameter of the get_all_pages function to '2020-07-01T10:00:01Z', as in the book, the request fails with status code 502. After changing it to a more recent date, such as 2022-02-01, the function works as expected.
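For context, here is a minimal sketch of the kind of paginated request involved. It assumes the fetch targets GitHub's "list issue comments for a repository" endpoint and follows the rel="next" URL in the Link response header. The names COMMENTS_URL, next_page_url and get_all_pages_sketch and the owner/repo arguments are hypothetical illustrations, not the book's get_all_pages code.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint: GitHub's "list issue comments for a repository".
COMMENTS_URL = "https://api.github.com/repos/{owner}/{repo}/issues/comments"


def next_page_url(link_header):
    """Return the rel="next" URL from a GitHub Link header, or None on the last page."""
    if not link_header:
        return None
    # Header looks like: <url?page=2>; rel="next", <url?page=5>; rel="last"
    for part in link_header.split(","):
        url_part, *params = part.split(";")
        if any(p.strip() == 'rel="next"' for p in params):
            return url_part.strip().strip("<>")
    return None


def get_all_pages_sketch(owner, repo, since, per_page=100):
    """Collect every page of issue comments last updated after `since`.

    urlopen raises urllib.error.HTTPError on a 502 response, which is where
    the failure reported in this issue would surface.
    """
    query = urlencode({"since": since, "per_page": per_page})
    url = COMMENTS_URL.format(owner=owner, repo=repo) + "?" + query
    comments = []
    while url:
        request = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            comments.extend(json.load(response))
            # Stop when the Link header no longer advertises a next page.
            url = next_page_url(response.headers.get("Link"))
    return comments
```

The older the since value, the more pages the loop must walk, which is consistent with the failure appearing only for dates far in the past.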

sidhusmart commented 2 years ago

Thanks for identifying this issue; I'm able to reproduce it as well. I think it might be caused by exceeding the API limits when more data is extracted, but I will check this and follow up with more information.
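One way to test the rate-limit hypothesis is to inspect GitHub's rate-limit response headers. The sketch below assumes GitHub's documented behavior for its primary rate limit (a 403 or 429 response with X-RateLimit-Remaining set to 0). The helper names rate_limit_status and is_rate_limited are hypothetical.

```python
def _int_or_none(value):
    """Header values arrive as strings; a missing header stays None."""
    return int(value) if value is not None else None


def rate_limit_status(headers):
    """Read GitHub's primary rate-limit headers from a response."""
    return {
        "limit": _int_or_none(headers.get("X-RateLimit-Limit")),
        "remaining": _int_or_none(headers.get("X-RateLimit-Remaining")),
        "reset_epoch": _int_or_none(headers.get("X-RateLimit-Reset")),
    }


def is_rate_limited(status, headers):
    """True only for the 403/429 plus zero-remaining combination GitHub uses for rate limits."""
    return status in (403, 429) and headers.get("X-RateLimit-Remaining") == "0"
```

Pass response.headers from a urllib response, which looks up header names case-insensitively. A 502 with plenty of requests remaining points away from rate limiting.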

sidhusmart commented 2 years ago

I've tested multiple scenarios to debug the issue. I can confirm that the GitHub API returns a 502 error when the since parameter is further back in the past, while it returns a valid 200 response when a more recent date is used. The cause is not the rate limit; it is probably related to how GitHub's servers retrieve (or fail to retrieve) older issue comments. I will continue to investigate, but for now the suggested workaround is to change the since date to one in the current year.
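The comparison described here can be reproduced by requesting only the first page for several since values and recording each status code. The sketch below assumes the repository issue-comments endpoint. fetch_status and probe_since_values are hypothetical names, and the fetch argument is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint: GitHub's "list issue comments for a repository".
COMMENTS_URL = "https://api.github.com/repos/{owner}/{repo}/issues/comments"


def fetch_status(url):
    """Perform one GET request and return its HTTP status code, including errors like 502."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code of the failed response.
        return err.code


def probe_since_values(owner, repo, since_values, fetch=fetch_status):
    """Map each candidate 'since' value to the status code of its first page."""
    base = COMMENTS_URL.format(owner=owner, repo=repo)
    results = {}
    for since in since_values:
        query = urlencode({"since": since, "per_page": 100})
        results[since] = fetch(base + "?" + query)
    return results
```

Calling probe_since_values with a real owner and repo and the values "2020-07-01T10:00:01Z" and "2022-02-01T00:00:00Z" mirrors the test described in this thread, which observed 502 for the older date and 200 for the recent one.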

sidhusmart commented 2 years ago

The error is returned by the GitHub API when a request would retrieve a large number of comments. The recommended solution is to choose a date closer to the current date so that the number of retrieved comments stays manageable for the API.
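Rather than hard-coding a date, the since value can be computed relative to today, in the same ISO 8601 "Z" format as '2020-07-01T10:00:01Z'. This is a minimal sketch. The helper names and the default 30-day window are assumptions, and the window still needs tuning to how busy the repository is.

```python
from datetime import datetime, timedelta, timezone


def recent_since(days_back=30, now=None):
    """Return a UTC timestamp `days_back` days before `now`, e.g. '2022-02-01T10:00:01Z'."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days_back)
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


def since_start_of_year(now=None):
    """Return midnight UTC on January 1 of the current year."""
    now = now or datetime.now(timezone.utc)
    return f"{now.year}-01-01T00:00:00Z"
```

A smaller days_back value means fewer comments per request, at the cost of a smaller sample for the analysis.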