github / github-artifact-exporter

A set of packages to make exporting artifacts from GitHub easier
MIT License
279 stars, 31 forks

Export more than the first 1000 issues #42

Open RyanCavanaugh opened 3 years ago

RyanCavanaugh commented 3 years ago

I ran this on microsoft/TypeScript and was surprised to see only a 1.6 MB file produced, since I know we have much more content than that. The generated JSON, though, only includes 1,000 issues, so it misses the other ~95% of the issues to have crossed our repo.

Chocrates commented 3 years ago

Thanks for the report, @RyanCavanaugh! That seems bad! I will dig into this and see what is going on.

Chocrates commented 3 years ago

I am able to repro this:

    ./bin/run search:issues --token <token> --owner microsoft --repo typescript --format JSON --since=2020-06-01 > typescript.json

Chocrates commented 3 years ago

Looks like the GraphQL query gets to page 10 and then can't find the next page (hasNextPage is false). Debugging this to see what is happening.
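To illustrate why a cursor loop ends quietly instead of failing, here is a minimal simulation in TypeScript. It is only a sketch under assumptions: `makeCappedSearch` is a hypothetical stand-in for the search backend, the page size of 100 is inferred from "page 10" and 1,000 results, and the total of 20,000 matches is an illustrative number, not a measured one.

```typescript
// Shape of one page, modeled loosely on a GraphQL connection's pageInfo.
interface Page<T> {
  nodes: T[];
  hasNextPage: boolean;
  endCursor: string | null;
}

// Hypothetical backend: `total` items match, but nothing past `cap` is served.
function makeCappedSearch(total: number, cap: number, pageSize: number) {
  const reachable = Math.min(total, cap);
  return (cursor: string | null): Page<number> => {
    const start = cursor === null ? 0 : Number(cursor);
    const end = Math.min(start + pageSize, reachable);
    const nodes: number[] = [];
    for (let i = start; i < end; i++) nodes.push(i);
    // hasNextPage turns false at the cap, even though more items match.
    return { nodes, hasNextPage: end < reachable, endCursor: String(end) };
  };
}

// A typical cursor loop: it trusts hasNextPage, so it stops without any error.
function collectAll<T>(
  fetchPage: (cursor: string | null) => Page<T>
): { items: T[]; pages: number } {
  const items: T[] = [];
  let cursor: string | null = null;
  let pages = 0;
  while (true) {
    const page: Page<T> = fetchPage(cursor);
    items.push(...page.nodes);
    pages++;
    if (!page.hasNextPage) break;
    cursor = page.endCursor;
  }
  return { items, pages };
}

const result = collectAll(makeCappedSearch(20000, 1000, 100));
console.log(result.pages, result.items.length); // prints: 10 1000
```

Nothing in the loop is wrong as such. The truncation is invisible to it, which is why the export looks successful while being incomplete.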

RyanCavanaugh commented 3 years ago

The Search API is documented to only return the first 1,000 results.

I've written a similar tool for bulk export and AFAIK the only alternative is to go through the Issues graph instead of the Search graph.
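As a rough sketch of what "going through the Issues graph" could look like, the TypeScript below pages through the repository's `issues` connection in GitHub's GraphQL API rather than through `search`. This is an illustration, not this project's implementation: `fetchAllIssues` and the `RunQuery` executor type are hypothetical names, and the executor is assumed to send the query with an authenticated client and return the `data` payload.

```typescript
// Query the repository's issues connection directly, 100 issues per page.
const ISSUES_QUERY = `
  query ($owner: String!, $repo: String!, $cursor: String) {
    repository(owner: $owner, name: $repo) {
      issues(first: 100, after: $cursor, orderBy: { field: CREATED_AT, direction: ASC }) {
        nodes { number title state createdAt }
        pageInfo { hasNextPage endCursor }
      }
    }
  }
`;

interface Issue {
  number: number;
  title: string;
  state: string;
  createdAt: string;
}

interface IssuesResponse {
  repository: {
    issues: {
      nodes: Issue[];
      pageInfo: { hasNextPage: boolean; endCursor: string | null };
    };
  };
}

// Hypothetical executor: any GraphQL client wrapped to match this signature.
type RunQuery = (
  query: string,
  variables: Record<string, unknown>
) => Promise<IssuesResponse>;

// Follow endCursor until the connection reports no further pages.
async function fetchAllIssues(
  runQuery: RunQuery,
  owner: string,
  repo: string
): Promise<Issue[]> {
  const all: Issue[] = [];
  let cursor: string | null = null;
  do {
    const data: IssuesResponse = await runQuery(ISSUES_QUERY, { owner, repo, cursor });
    const { nodes, pageInfo } = data.repository.issues;
    all.push(...nodes);
    cursor = pageInfo.hasNextPage ? pageInfo.endCursor : null;
  } while (cursor !== null);
  return all;
}
```

Passing the executor in as a parameter keeps the paging logic separate from authentication and transport, so it can be exercised against a fake before pointing it at a large repository. Search-style filters would need to be re-expressed as arguments on the connection or applied client-side.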

Chocrates commented 3 years ago

I opened #45 to discuss how we want to solve this issue, since it will require a bit of re-architecting.