dogsheep / github-to-sqlite

Save data from GitHub to a SQLite database
https://github-to-sqlite.dogsheep.net/
Apache License 2.0
405 stars 43 forks source link

Command to fetch stargazers for one or more repos #4

Closed simonw closed 4 years ago

simonw commented 5 years ago

Maybe this:

$ github-to-sqlite stargazers github.db simonw/datasette

It could accept more than one repos.

Maybe have options similar to --sql in twitter-to-sqlite so you can e.g. fetch all stargazers for all of the repos you have fetched into the database already (or all of the repos belonging to owner X)

simonw commented 5 years ago

Paginate through https://api.github.com/repos/simonw/datasette/stargazers

Send Accept: application/vnd.github.v3.star+json to get the starred_at dates.

simonw commented 4 years ago

Here's a query I figured out using a window function that shows cumulative stargazers over time:

select
  yyyymmdd,
  sum(n) over (
    order by
      yyyymmdd rows unbounded preceding
  ) as cumulative_count
from
  (
    select
      substr(starred_at, 0, 11) as yyyymmdd,
      count(*) as n
    from
      stars
    group by
      yyyymmdd
  )
simonw commented 4 years ago

I could add that window function query as a view, but only if the detected version of SQLite supports window functions.

simonw commented 4 years ago

I'm not going to do the --sql bit just yet. I have patterns for working around this for other commands which are working fine:

https://github.com/dogsheep/github-to-sqlite/blob/d00a53061556dc403c166b443d141c4e1adbd64a/.github/workflows/deploy-demo.yml#L53-L70

simonw commented 4 years ago

Alternative pattern:

sqlite-utils releases.db 'select full_name from repos' --csv --no-headers \
  | tr -d '\r' \
  | xargs github-to-sqlite stargazers stars.db
simonw commented 4 years ago

Needs tests and documentation. I shipped it early to check that the live demo works.

simonw commented 4 years ago

Datasette cumulative stars over time: https://github-to-sqlite.dogsheep.net/github?sql=select%0D%0A++yyyymmdd%2C%0D%0A++sum%28n%29+over+%28%0D%0A++++order+by%0D%0A++++++yyyymmdd+rows+unbounded+preceding%0D%0A++%29+as+cumulative_count%0D%0Afrom%0D%0A++%28%0D%0A++++select%0D%0A++++++substr%28starred_at%2C+0%2C+11%29+as+yyyymmdd%2C%0D%0A++++++count%28*%29+as+n%0D%0A++++from%0D%0A++++++stars%0D%0A++++where+repo+%3D+107914493%0D%0A++++group+by%0D%0A++++++yyyymmdd%0D%0A++%29

simonw commented 4 years ago

Graph of cumulative stars for Datasette over time: https://github-to-sqlite.dogsheep.net/github?sql=select%0D%0A++yyyymmdd%2C%0D%0A++sum%28n%29+over+%28%0D%0A++++order+by%0D%0A++++++yyyymmdd+rows+unbounded+preceding%0D%0A++%29+as+cumulative_count%0D%0Afrom%0D%0A++%28%0D%0A++++select%0D%0A++++++substr%28starred_at%2C+0%2C+11%29+as+yyyymmdd%2C%0D%0A++++++count%28*%29+as+n%0D%0A++++from%0D%0A++++++stars%0D%0A++++where+repo+%3D+107914493%0D%0A++++group+by%0D%0A++++++yyyymmdd%0D%0A++%29#g.mark=line&g.x_column=yyyymmdd&g.x_type=temporal&g.y_column=cumulative_count&g.y_type=quantitative

github__select_yyyymmdd__sum_n__over___order_by_yyyymmdd_rows_unbounded_preceding___as_cumulative_count_from___select_substr_starred_at__0__11__as_yyyymmdd__count____as_n_from_stars_where_repo___107914493_group_by_yyyymmdd__

Stars per day (as a label bar chart, so very wide):

https://github-to-sqlite.dogsheep.net/github?sql=%0D%0A++++select%0D%0A++++++substr%28starred_at%2C+0%2C+11%29+as+yyyymmdd%2C%0D%0A++++++count%28*%29+as+n%0D%0A++++from%0D%0A++++++stars%0D%0A++++where+repo+%3D+107914493%0D%0A++++group+by%0D%0A++++++yyyymmdd%0D%0A++#g.mark=bar&g.x_column=yyyymmdd&g.x_type=ordinal&g.y_column=n&g.y_type=quantitative

github__select_substr_starred_at__0__11__as_yyyymmdd__count____as_n_from_stars_where_repo___107914493_group_by_yyyymmdd