dogsheep / github-to-sqlite

Save data from GitHub to a SQLite database
https://github-to-sqlite.dogsheep.net/
Apache License 2.0
402 stars 43 forks source link

Create index on issue_comments(user) and other foreign keys #35

Closed simonw closed 4 years ago

simonw commented 4 years ago
create index issue_comments_user on issue_comments(user)

I'm sure there are other user columns that could benefit from an index.

simonw commented 4 years ago

It sped up this query a lot - 2.5s down to 300ms:

select
  repos.full_name,
  json_object(
    'href', 'https://github.com/' || repos.full_name || '/issues/' || issues.number,
    'label', '#' || issues.number
  ) as issue,
  issues.title,
  users.login,
  users.id,
  issues.state,
  issues.locked,
  issues.assignee,
  issues.milestone,
  issues.comments,
  issues.created_at,
  issues.updated_at,
  issues.closed_at,
  issues.author_association,
  issues.pull_request,
  issues.repo,
  issues.type
from
  issues
  join repos on repos.id = issues.repo
  join users on issues.user = users.id
where
  issues.state = 'open'
  and issues.user not in (9599, 27856297)
  and not exists (
    select
      id
    from
      issue_comments
    where
      issue_comments.user = 9599
      and issues.id = issue_comments.issue
  )
order by
  issues.updated_at desc;
simonw commented 4 years ago

This sped that query up even more - down to 4ms.

create index issue_comments_issue on issue_comments(issue);
simonw commented 4 years ago

Easiest option: use db.index_foreign_keys(): https://sqlite-utils.readthedocs.io/en/stable/python-api.html#adding-indexes-for-all-foreign-keys