dogsheep / github-to-sqlite

Save data from GitHub to a SQLite database
https://github-to-sqlite.dogsheep.net/
Apache License 2.0
416 stars 43 forks source link

Fix the demo - it breaks because of the tags table change #45

Closed simonw closed 4 years ago

simonw commented 4 years ago

https://github.com/dogsheep/github-to-sqlite/runs/885773677

  File "/home/runner/work/github-to-sqlite/github-to-sqlite/github_to_sqlite/utils.py", line 476, in save_tags
    db["tags"].insert_all(
  File "/opt/hostedtoolcache/Python/3.8.3/x64/lib/python3.8/site-packages/sqlite_utils/db.py", line 1145, in insert_all
    result = self.db.conn.execute(query, params)
sqlite3.OperationalError: table tags has no column named repo

That's because I changed the name in #44. I thought this would be safe since no-one else could possibly be using this yet (it hadn't shipped in a release) but turns out I broke my demo!

simonw commented 4 years ago

I could fix this by putting REFRESH_DB in a commit message:

https://github.com/dogsheep/github-to-sqlite/blob/4ae4aa6f172344b19ff3513707195ee6d2654bd4/.github/workflows/deploy-demo.yml#L41-L46

But... doing so would lose the data I've collected in https://github-to-sqlite.dogsheep.net/github/dependents?_sort_desc=first_seen_utc concerning the first time each dependent repo was spotted.

simonw commented 4 years ago

I think the best fix is to download the github.db database, manually fix it and then manually deploy it to Cloud Run from my laptop.

simonw commented 4 years ago

Manually fixing the database:

$ wget 'https://github-to-sqlite.dogsheep.net/github.db'
--2020-07-18 15:52:33--  https://github-to-sqlite.dogsheep.net/github.db
Resolving github-to-sqlite.dogsheep.net (github-to-sqlite.dogsheep.net)... 172.217.5.115
Connecting to github-to-sqlite.dogsheep.net (github-to-sqlite.dogsheep.net)|172.217.5.115|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14626816 (14M) [application/octet-stream]
Saving to: ‘github.db’

github.db                              100%[============================================================================>]  13.95M  1.22MB/s    in 18s     

2020-07-18 15:52:53 (773 KB/s) - ‘github.db’ saved [14626816/14626816]

$ sqlite3 github.db 
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
sqlite> drop table tags;
sqlite> ^D
$ github-to-sqlite tags github.db simonw/datasette simonw/sqlite-utils dogsheep/healthkit-to-sqlite dogsheep/swarm-to-sqlite dogsheep/twitter-to-sqlite dogsheep/inaturalist-to-sqlite dogsheep/google-takeout-to-sqlite dogsheep/github-to-sqlite dogsheep/genome-to-sqlite dogsheep/pocket-to-sqlite dogsheep/hacker-news-to-sqlite dogsheep/dogsheep-photos 
$ sqlite-utils tables github.db --counts
[{"table": "users", "count": 4048},
 {"table": "repos", "count": 210},
 ...
 {"table": "stars", "count": 4140},
 {"table": "tags", "count": 188}]
$ sqlite-utils rows github.db tags    
[{"repo": 107914493, "name": "0.45", "sha": "f1f581b7ffcd5d8f3ae6c1c654d813a6641410eb"},
 {"repo": 107914493, "name": "0.45a5", "sha": "676bb64c877d73f8ff496cef4632f5a8a5a9283c"},
 ...
simonw commented 4 years ago

Deploying the fixed version like this:

$ gcloud config set run/region us-central1
$ gcloud config set project datasette-222320
$ datasette publish cloudrun /tmp/github.db \
            -m demo-metadata.json \
            --service github-to-sqlite \
            --install=py-gfm \
            --install='datasette-search-all>=0.3' \
            --install='datasette-render-markdown>=1.1.2' \
            --install=datasette-pretty-json \
            --install=datasette-json-html \
            --install=datasette-vega
simonw commented 4 years ago

https://github-to-sqlite.dogsheep.net/github/tags now shows a repo column instead of a repo_id column.