dogsheep / github-to-sqlite

Save data from GitHub to a SQLite database
https://github-to-sqlite.dogsheep.net/
Apache License 2.0

Assets table with downloads #15

Closed (garethr closed this issue 4 years ago)

garethr commented 4 years ago

The releases command populates the releases table, but data about the individual assets is locked up in the JSON document in the assets field. My main interest is in individual and aggregate download counts. Would it be useful to create a new table with one record per asset? If so, I'm happy to send a PR when I get a moment. Do you have an opinion on whether that should simply be part of the releases command, or would you prefer a separate command as well?
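
To illustrate the problem, here is a minimal sketch of how download counts can be pulled out of the JSON in the assets field today. It assumes a releases table with `id`, `tag_name` and `assets` columns, where `assets` holds JSON text; that layout, the `downloads_per_release` helper and the sample rows are illustrative assumptions, not the tool's actual schema or data.

```python
import json
import sqlite3

# Hypothetical stand-in for the releases table: one row per release,
# with the list of assets stored as JSON text in the "assets" column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE releases (id INTEGER PRIMARY KEY, tag_name TEXT, assets TEXT)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO releases VALUES (?, ?, ?)",
    [
        (1, "v0.1.0", json.dumps([
            {"name": "checksums.txt", "download_count": 2},
            {"name": "example.tar.gz", "download_count": 5},
        ])),
        (2, "v0.2.0", json.dumps([])),
    ],
)


def downloads_per_release(conn):
    """Return {tag_name: total download_count over that release's assets}."""
    totals = {}
    query = "SELECT tag_name, assets FROM releases"
    for tag_name, assets_json in conn.execute(query):
        # Every row has to be decoded in Python before counts can be summed.
        assets = json.loads(assets_json or "[]")
        totals[tag_name] = sum(a.get("download_count", 0) for a in assets)
    return totals


totals = downloads_per_release(conn)
print(totals)
```

Every query has to decode the JSON first, which is why a table with one row per asset would make these counts much easier to get at.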

simonw commented 4 years ago

Splitting assets out into a separate table totally makes sense to me. They can still be fetched as part of the releases command.

simonw commented 4 years ago

None of my own releases use assets (they are all pushed to PyPI instead) but I spotted that your project here uses assets, so I'll test against that: https://github.com/instrumenta/conftest/releases/tag/v0.18.0

    github-to-sqlite releases releases.db instrumenta/conftest

simonw commented 4 years ago

Each asset looks like this:

    {
        "url": "https://api.github.com/repos/instrumenta/conftest/releases/assets/11811946",
        "id": 11811946,
        "node_id": "MDEyOlJlbGVhc2VBc3NldDExODExOTQ2",
        "name": "checksums.txt",
        "label": "",
        "uploader": {
            "login": "garethr",
            "id": 2029,
            "node_id": "MDQ6VXNlcjIwMjk=",
            "avatar_url": "https://avatars2.githubusercontent.com/u/2029?v=4",
            "gravatar_id": "",
            "url": "https://api.github.com/users/garethr",
            "html_url": "https://github.com/garethr",
            "followers_url": "https://api.github.com/users/garethr/followers",
            "following_url": "https://api.github.com/users/garethr/following{/other_user}",
            "gists_url": "https://api.github.com/users/garethr/gists{/gist_id}",
            "starred_url": "https://api.github.com/users/garethr/starred{/owner}{/repo}",
            "subscriptions_url": "https://api.github.com/users/garethr/subscriptions",
            "organizations_url": "https://api.github.com/users/garethr/orgs",
            "repos_url": "https://api.github.com/users/garethr/repos",
            "events_url": "https://api.github.com/users/garethr/events{/privacy}",
            "received_events_url": "https://api.github.com/users/garethr/received_events",
            "type": "User",
            "site_admin": false
        },
        "content_type": "text/plain; charset=utf-8",
        "state": "uploaded",
        "size": 600,
        "download_count": 2,
        "created_at": "2019-03-30T16:56:44Z",
        "updated_at": "2019-03-30T16:56:44Z",
        "browser_download_url": "https://github.com/instrumenta/conftest/releases/download/v0.1.0/checksums.txt"
    }
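
As a rough sketch of what one record per asset could look like, the snippet below flattens an asset like the one above into a row, replacing the nested uploader object with just its `id` and adding a reference to the parent release. The `assets` table layout, the `asset_row` and `save_assets` helpers, the chosen columns and the release id `1` are all assumptions made for illustration; this uses the standard library `sqlite3` module and is not necessarily how github-to-sqlite stores assets.

```python
import sqlite3

# Scalar fields copied straight from the asset JSON (an illustrative subset).
ASSET_COLUMNS = (
    "id", "name", "content_type", "state", "size",
    "download_count", "created_at", "updated_at", "browser_download_url",
)
ALL_COLUMNS = ASSET_COLUMNS + ("uploader", "release")


def asset_row(asset, release_id):
    """Flatten one asset dict into a row for a hypothetical assets table."""
    row = {col: asset.get(col) for col in ASSET_COLUMNS}
    # Keep only the uploader's id rather than the whole nested user object.
    row["uploader"] = (asset.get("uploader") or {}).get("id")
    row["release"] = release_id
    return row


def save_assets(conn, release_id, assets):
    """Insert or update one row per asset, keyed on the asset id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assets ("
        "id INTEGER PRIMARY KEY, name TEXT, content_type TEXT, state TEXT, "
        "size INTEGER, download_count INTEGER, created_at TEXT, "
        "updated_at TEXT, browser_download_url TEXT, "
        "uploader INTEGER, release INTEGER)"
    )
    placeholders = ", ".join(":" + col for col in ALL_COLUMNS)
    sql = "INSERT OR REPLACE INTO assets ({}) VALUES ({})".format(
        ", ".join(ALL_COLUMNS), placeholders
    )
    conn.executemany(sql, [asset_row(a, release_id) for a in assets])


# A trimmed copy of the asset shown above.
example_asset = {
    "id": 11811946,
    "name": "checksums.txt",
    "uploader": {"login": "garethr", "id": 2029},
    "content_type": "text/plain; charset=utf-8",
    "state": "uploaded",
    "size": 600,
    "download_count": 2,
    "created_at": "2019-03-30T16:56:44Z",
    "updated_at": "2019-03-30T16:56:44Z",
}

conn = sqlite3.connect(":memory:")
save_assets(conn, 1, [example_asset])  # release id 1 is a placeholder
row = asset_row(example_asset, 1)
per_release = conn.execute(
    "SELECT release, SUM(download_count) FROM assets GROUP BY release"
).fetchall()
print(per_release)
```

With assets in their own table, aggregate download counts become a plain `GROUP BY` query instead of a pass over JSON in application code.
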

garethr commented 4 years ago

That looks great, thanks!