dogsheep / github-to-sqlite

Save data from GitHub to a SQLite database
https://github-to-sqlite.dogsheep.net/
Apache License 2.0
402 stars 43 forks source link

[Feature Request] Support Repo Name in Search 🥺 #38

Closed zzeleznick closed 4 years ago

zzeleznick commented 4 years ago

Description

Per your v2.2 release tweet I played with the demo, but the output did not match my expectations.

Expected Behavior

Expected a search query for "twitter" contained within the repo column to return non-zero results.

Actual Behavior

😭 0 rows where repo contains "twitter" sorted by starred_at descending

Best Explanation

Per the table schema (see appendix) repo is of type INTEGER which built from repo_id and does not expose the repo name in search.

Desired Behavior

Given that searching for "206156866" is less intuitive than "twitter", it would be great to support this via extending the search capabilities or by adding an additional column.

✅ 104 rows where repo contains "twitter" ❌ 104 rows where repo contains "206156866" sorted by starred_at descending

Appendix

CREATE TABLE [stars] (
   [user] INTEGER REFERENCES [users]([id]),
   [repo] INTEGER REFERENCES [repos]([id]),
   [starred_at] TEXT,
   PRIMARY KEY ([user], [repo])
);
CREATE INDEX [idx_stars_repo]
                ON [stars] ([repo]);
CREATE INDEX [idx_stars_user]
                ON [stars] ([user]);
simonw commented 4 years ago

This is one of the use-cases for the repos_starred view: it allows you to easily run this kid of query without having to construct the SQL by hand. Here's a demo:

https://github-to-sqlite.dogsheep.net/github/repos_starred?name__contains=twitter

My philosophy here is to keep the raw tables (like stars) as normalized as possible, then use SQL views which expose the data in a form that's easier to query.

zzeleznick commented 4 years ago

Thanks, @simonw!

I feel a little foolish in hindsight, but I'm on the same page now and am glad to have discovered first-hand a motivation for this repos_starred use case.

simonw commented 4 years ago

No this is really useful feedback! I'm so close to this project that I miss what's not obvious to people dropping in for the first time.

zzeleznick commented 4 years ago
  1. More than glad to share feedback from the sidelines as a starrer.
-- Motivation:
--  Datasette is a data hammer and I'm looking for nails
--  e.g. Find which repos a user has starred => trigger a TBD downstream action
select
  starred_at,
  starred_by,
  full_name as repo_name
from
  repos_starred
where
  starred_by = "zzeleznick"
order by
  starred_at desc
starred_at starred_by repo_name
2020-02-11T01:08:59Z zzeleznick dogsheep/twitter-to-sqlite
2020-01-11T21:57:34Z zzeleznick simonw/datasette
  1. In my day job, I use airflow, and that's the mental model I'm bringing to datasette.

  2. I see your project like twitter-to-sqlite akin to Operators in Airflow world.