github / github-artifact-exporter

A set of packages to make exporting artifacts from GitHub easier
MIT License

Adding `--since` Option for `repo:pulls` Command To Limit Downloads #52

Closed: schlagelk closed this issue 3 years ago

schlagelk commented 3 years ago

Would it be possible to add the `--since` option to the `repo:pulls` command, as already exists for issues and commits? I would like to analyze PR data in a large repo, but fetching every PR ever made takes a long time. It would also be nice to download incrementally in the future instead of re-downloading everything.

Thanks!
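The incremental download asked for above could work roughly as follows. This is a minimal sketch, not part of the exporter. It assumes PRs are kept locally as dicts keyed by `number` and carrying an ISO 8601 `updatedAt` field (GitHub's field name). The helpers `merge_incremental` and `next_since` are hypothetical, as is the `--since` flag they would feed.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional


def parse_ts(ts: str) -> datetime:
    # GitHub timestamps look like "2021-03-01T12:00:00Z"; swap the trailing
    # "Z" for "+00:00" so datetime.fromisoformat accepts it before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def merge_incremental(store: Dict[int, dict], fetched: Iterable[dict]) -> Dict[int, dict]:
    """Merge freshly fetched PRs into a local store keyed by PR number.

    A PR updated since the last run replaces its stale copy; a new PR is added.
    """
    merged = dict(store)
    for pr in fetched:
        merged[pr["number"]] = pr
    return merged


def next_since(store: Dict[int, dict]) -> Optional[str]:
    """Newest updatedAt in the store: the cutoff for the next (hypothetical) --since run."""
    if not store:
        return None
    newest = max(store.values(), key=lambda pr: parse_ts(pr["updatedAt"]))
    return newest["updatedAt"]
```

Each run would then fetch only PRs updated at or after `next_since(store)` and merge them in. Because the store is keyed by PR number, re-fetching a PR that sits right at the cutoff is harmless.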

Chocrates commented 3 years ago

Hey @schlagelk, I will have to check on this, but I don't expect it to be too difficult. Thanks for the issue!

Chocrates commented 3 years ago

Bad news: the GraphQL pull request connection does not have a `since` or `until` argument, so we can't filter pull requests on the server side. https://docs.github.com/en/graphql/reference/objects#pullrequestconnection
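A client can still approximate a `since` filter without a server-side argument. The `pullRequests` field on `Repository` accepts `orderBy` (for example `{field: UPDATED_AT, direction: DESC}`), so results arrive newest-first and pagination can stop at the first PR older than the cutoff. This is a sketch of that workaround, not how the exporter currently behaves. `fetch_page(cursor)` is a hypothetical caller-supplied function that runs the query and returns the `pullRequests` object from the response.

```python
from datetime import datetime
from typing import Callable, Iterator, Optional

# Illustrative query: ordering by UPDATED_AT DESC returns recently updated
# PRs first, so pages older than the cutoff never need to be requested.
PULLS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    pullRequests(first: 100, after: $cursor,
                 orderBy: {field: UPDATED_AT, direction: DESC}) {
      pageInfo { hasNextPage endCursor }
      nodes { number title createdAt updatedAt closedAt mergedAt }
    }
  }
}
"""


def parse_ts(ts: str) -> datetime:
    # Accept GitHub's trailing "Z" on Pythons older than 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pulls_since(fetch_page: Callable[[Optional[str]], dict], since: str) -> Iterator[dict]:
    """Yield PRs updated at or after `since`, stopping pagination early.

    `fetch_page(cursor)` (hypothetical) must return the pullRequests
    connection as a dict with "nodes" and "pageInfo" keys.
    """
    cutoff = parse_ts(since)
    cursor = None
    while True:
        page = fetch_page(cursor)
        for pr in page["nodes"]:
            if parse_ts(pr["updatedAt"]) < cutoff:
                # Sorted newest-first, so every remaining PR is older too.
                return
            yield pr
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return
        cursor = info["endCursor"]
```

Sorting on `UPDATED_AT` rather than `CREATED_AT` also picks up older PRs that were recently closed, merged, or commented on. That matters for incremental updates. A PR updated while pagination is in progress can shift between pages, so a real implementation might overlap the cutoff slightly.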

schlagelk commented 3 years ago

Bummer, but I appreciate you taking a look!

Chocrates commented 3 years ago

How much data are you filtering through? We already have an outstanding issue with the search API (which we use for issues) that will likely require us to reimplement GitHub's search features locally. https://github.com/github/github-artifact-exporter/discussions/45

Doing the same for PRs may be an option. At the moment, the plan is to pull all the data down and filter it locally, duplicating GitHub.com's search features. Knowing your use case may help us refine that idea, since we already know it's not great.
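The "pull everything, then filter locally" approach described above might look something like this. It is a minimal sketch under stated assumptions, not the exporter's code. `filter_pulls` and its parameters are hypothetical. It only mimics a few search qualifiers (`created:>=DATE`, `created:<=DATE`, `is:merged`) over PR dicts that use GitHub's `createdAt` and `mergedAt` field names.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def parse_ts(ts: str) -> datetime:
    # Accept GitHub's trailing "Z" on Pythons older than 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def filter_pulls(
    pulls: Iterable[dict],
    created_since: Optional[str] = None,
    created_until: Optional[str] = None,
    merged_only: bool = False,
) -> List[dict]:
    """Filter already-downloaded PRs locally (hypothetical helper).

    Roughly mirrors created:>=, created:<= and is:merged; it is not a
    full reimplementation of GitHub search.
    """
    lo = parse_ts(created_since) if created_since else None
    hi = parse_ts(created_until) if created_until else None
    out = []
    for pr in pulls:
        created = parse_ts(pr["createdAt"])
        if lo is not None and created < lo:
            continue
        if hi is not None and created > hi:
            continue
        if merged_only and not pr.get("mergedAt"):
            continue
        out.append(pr)
    return out
```

This keeps the download simple, but it still costs a full fetch of every PR. That is the drawback the comment above acknowledges.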

schlagelk commented 3 years ago

It's an internal repo at my workplace. We're looking back about a year across several of our platforms and analyzing PR open, close, review, and merge times (among other things). I'd estimate several dozen new PRs per repo per week, and right now we plan to look at 3 to 5 repos. However, this tool will likely need to analyze other repos soon, which brings the total to around 10 repos, each with a similar volume of PRs, going back over a year. Does that help, or do you need something more concrete?
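For the merge-time part of the analysis described here, exported records could be reduced to durations roughly as follows. This sketch makes a few assumptions. Each PR dict is assumed to carry `createdAt` and `mergedAt` (GitHub GraphQL field names) plus a `repo` key added during export. The `repo` key and the `median_hours_to_merge` helper are hypothetical. Review times would also need each PR's reviews, so they are left out.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Dict, Iterable, List


def parse_ts(ts: str) -> datetime:
    # Accept GitHub's trailing "Z" on Pythons older than 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def median_hours_to_merge(pulls: Iterable[dict]) -> Dict[str, float]:
    """Median hours from creation to merge, per repo; unmerged PRs are skipped."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for pr in pulls:
        if not pr.get("mergedAt"):
            continue
        delta = parse_ts(pr["mergedAt"]) - parse_ts(pr["createdAt"])
        durations[pr["repo"]].append(delta.total_seconds() / 3600)
    return {repo: median(hours) for repo, hours in durations.items()}
```

The median is used rather than the mean because a few long-lived PRs would otherwise dominate the result.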