Closed: needmorecowbell closed this issue 5 years ago
While the top of the request shows the number of entries, that is not how many are actually returned: results are paginated. Adding the parameter per_page=100 sets the maximum page size, and page= steps through the remaining results. At minimum we should scrape a full page, but it's an open question whether we really want all the results when the query is vague.

Good catch. There are two cases to consider here:

1. There is no saved_state. For Twitter and RSS, we can't feasibly fetch all results, so we just use whatever the "first page" looks like. We should do the same for GitHub search results. Increasing the per_page number is a good idea.
2. There is a saved_state, and we should make a best effort to process every search result back until that state, even if it's on a different page.

Closed by #53
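The two cases above can be sketched as one loop. This is a minimal, hedged illustration, not the code from #53: `fetch_page` is a hypothetical injectable fetcher standing in for a GitHub search request (where per_page=100 is the API's documented maximum page size), so the stopping logic can be exercised without touching the network.

```python
# Sketch: best-effort pagination back to a saved state.
# fetch_page(page, per_page) -> list of result ids, newest first,
# empty when the results are exhausted. Hypothetical signature.

def collect_until_saved_state(fetch_page, saved_state=None, per_page=100):
    """Collect results newest-first, stopping once saved_state is seen.

    If saved_state is None (first run), only the first page is kept,
    mirroring the Twitter/RSS behavior described above. Otherwise we
    keep paging until we hit the saved state or run out of results.
    """
    results = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            break  # no more pages
        for item in batch:
            if item == saved_state:
                return results  # everything newer than the saved state
            results.append(item)
        if saved_state is None:
            break  # no prior state: take the first page only
        page += 1
    return results


def make_fake_fetcher(items):
    """In-memory stand-in for a paginated search endpoint."""
    def fetch_page(page, per_page):
        start = (page - 1) * per_page
        return items[start:start + per_page]
    return fetch_page
```

With a saved_state, the loop crosses page boundaries until it finds the marker; without one, it stops after a single page, which matches the "first page" behavior used for Twitter and RSS.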