dogsheep / github-to-sqlite

Save data from GitHub to a SQLite database
https://github-to-sqlite.dogsheep.net/
Apache License 2.0

Commits in GitHub API can have null author #18

Closed by simonw 4 years ago

simonw commented 4 years ago

Traceback (most recent call last):
  File "/home/ubuntu/datasette-venv/bin/github-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/github_to_sqlite/cli.py", line 235, in commits
    utils.save_commits(db, commits, repo_full["id"])
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/github_to_sqlite/utils.py", line 290, in save_commits
    commit_to_insert["author"] = save_user(db, commit["author"])
  File "/home/ubuntu/datasette-venv/lib/python3.6/site-packages/github_to_sqlite/utils.py", line 54, in save_user
    for key, value in user.items()
AttributeError: 'NoneType' object has no attribute 'items'

Got this running the commits command from cron.

simonw commented 4 years ago

This suggests that commit["author"] can be None in some cases?
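A minimal sketch of the guard this implies, assuming save_user takes a user dict and returns that user's id. Both the stand-in save_user and the save_user_or_none wrapper below are hypothetical and only illustrate the idea; they are not the project's actual code.

```python
def save_user(db, user):
    # Stand-in for the real save_user in utils.py (an assumption):
    # it stores a copy of the user dict and returns the user's id.
    db.setdefault("users", {})[user["id"]] = dict(user.items())
    return user["id"]


def save_user_or_none(db, user):
    # Hypothetical wrapper: pass None through instead of calling .items() on it.
    if user is None:
        return None
    return save_user(db, user)


db = {}  # a plain dict stands in for the database in this sketch
commit = {
    "sha": "a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0",
    "author": None,  # the case that triggered the AttributeError
    "committer": {"id": 1, "login": "example-user"},  # made-up user
}
commit_to_insert = {
    "sha": commit["sha"],
    "author": save_user_or_none(db, commit["author"]),
    "committer": save_user_or_none(db, commit["committer"]),
}
```

With this guard the author column would simply be stored as NULL for commits that have no associated GitHub user.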

simonw commented 4 years ago

https://github.community/t5/GitHub-API-Development-and/Request-for-commits-quot-author-quot-null-and-quot-committer/m-p/35842/highlight/true#M3372

Commits aren't always associated with a GitHub user. For example, perhaps a friend of mine and I were working on a project together. I have a GitHub account and my friend doesn't. If we both add commits to the repository using our own email addresses and names and then I push the repository to GitHub, my commits will be associated with my GitHub user account but my friend's commits will show up with author and committer as null.

simonw commented 4 years ago

I need to find an example before I work on this.

simonw commented 4 years ago

Found one: https://api.github.com/repos/simonw/simonw.github.com/commits

simonw commented 4 years ago

[
  {
    "sha": "a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0",
    "node_id": "MDY6Q29tbWl0OTMyMDk6YThkYzkxNDA4OWQzOTlkOWI1MjJlYmI1MWI2N2Y5YWMyZThhYTZiMA==",
    "commit": {
      "author": {
        "name": "Simon Willison",
        "email": "simon@...",
        "date": "2008-12-18T23:17:12Z"
      },
      "committer": {
        "name": "Simon Willison",
        "email": "simon@...",
        "date": "2008-12-18T23:17:12Z"
      },
      "message": "First commit",
      "tree": {
        "sha": "ac2dfb75e2592c59165c2880f3f7a16dafd452a1",
        "url": "https://api.github.com/repos/simonw/simonw.github.com/git/trees/ac2dfb75e2592c59165c2880f3f7a16dafd452a1"
      },
      "url": "https://api.github.com/repos/simonw/simonw.github.com/git/commits/a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0",
      "comment_count": 0,
      "verification": {
        "verified": false,
        "reason": "unsigned",
        "signature": null,
        "payload": null
      }
    },
    "url": "https://api.github.com/repos/simonw/simonw.github.com/commits/a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0",
    "html_url": "https://github.com/simonw/simonw.github.com/commit/a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0",
    "comments_url": "https://api.github.com/repos/simonw/simonw.github.com/commits/a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0/comments",
    "author": null,
    "committer": null,
    "parents": []
  }
]

simonw commented 4 years ago

So it turns out "author" and "committer" on the commit are null if the email address in the nested "commit" doesn't match an existing GitHub user.
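To make the two levels concrete, here is a small sketch that reads both from one item of the commits response. The describe_author helper is hypothetical, and the sample dict is a trimmed copy of the response shown earlier in this thread.

```python
def describe_author(api_commit):
    """Return (github_user, raw_author) for one item from the commits API."""
    # Top-level "author": the linked GitHub user, or None if no account matches.
    github_user = api_commit.get("author")
    # Nested "commit" -> "author": the name/email/date recorded in the commit,
    # which is present in the sample response even when the top level is null.
    raw_author = api_commit["commit"]["author"]
    return github_user, raw_author


sample = {
    "sha": "a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0",
    "commit": {
        "author": {
            "name": "Simon Willison",
            "email": "simon@...",
            "date": "2008-12-18T23:17:12Z",
        },
        "message": "First commit",
    },
    "author": None,
    "committer": None,
}

github_user, raw_author = describe_author(sample)
```

Here github_user is None while raw_author still carries the name and email, which is the data that would otherwise be lost.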

Maybe I should be storing the nested data somewhere as well?

simonw commented 4 years ago

I could pull the name/email from the nested "commit" into a commit_authors table, using a hash of those values as the primary key, and reference it from separate raw_author and raw_committer columns on the commit. Could be interesting.
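A rough sketch of that idea using only the standard library (sqlite3 and hashlib). The table layout, the choice of SHA-1, and the helper names raw_author_id and save_raw_author are all assumptions made for illustration, not the implementation that ended up in the project.

```python
import hashlib
import json
import sqlite3


def raw_author_id(raw):
    # Hash only name and email so the same identity always gets the same key;
    # the date differs per commit, so it is deliberately left out.
    payload = json.dumps(
        {"name": raw.get("name"), "email": raw.get("email")}, sort_keys=True
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def save_raw_author(conn, raw):
    # Insert the identity once; repeated inserts of the same hash are ignored.
    author_id = raw_author_id(raw)
    conn.execute(
        "INSERT OR IGNORE INTO commit_authors (id, name, email) VALUES (?, ?, ?)",
        (author_id, raw.get("name"), raw.get("email")),
    )
    return author_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commit_authors (id TEXT PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE commits ("
    "sha TEXT PRIMARY KEY, author INTEGER, "
    "raw_author TEXT REFERENCES commit_authors(id))"
)

raw = {"name": "Simon Willison", "email": "simon@...", "date": "2008-12-18T23:17:12Z"}
first_id = save_raw_author(conn, raw)
conn.execute(
    "INSERT INTO commits (sha, author, raw_author) VALUES (?, ?, ?)",
    ("a8dc914089d399d9b522ebb51b67f9ac2e8aa6b0", None, first_id),
)
# Saving the same identity again reuses the existing row.
second_id = save_raw_author(conn, raw)
```

Keying the table on a hash means the commit row can reference the raw identity without a lookup first, and the author column can stay NULL when GitHub has no matching user.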

simonw commented 4 years ago

I implemented the raw_authors idea.