Open cldellow opened 9 months ago
If anyone else is interested in this, I've started republishing the QRank dataset as an SQLite artifact weekly. It can be fetched from https://github.com/hikeratlas/qrank/releases/download/latest/qrank.db
Hope that's OK - if not, let me know and I'll happily take it down.
Hello, thank you for making qrank! It saves a lot of labour and pipeline fiddling.
Would it be possible to also publish qrank as a sqlite DB in addition to a CSV file?
The SQLite format has been stable since 2004. IMO it can serve a similar role as CSV, but with the benefit of having fast random-access lookups when the table is indexed.
The SQLite library is widely available on modern OSes, with bindings for all popular programming languages. Some environments, such as Python and Bun, bundle SQLite so that it can be immediately used without installing any additional packages.
If the DB is created with the full Wikidata ID (e.g., the string
Q12345
), a SQLite DB is approx 1.1GB - approx 3 times the size as the CSV.If created with the ID converted to a number (e.g., the number
12345
), a SQLite DB is approx 350MB -- about the same size as the CSV.Creating the DB is straight-forward, e.g. this bash script:
I understand if this is out of scope for the project -- but if you're open to serving an extra file, I suspect it'd eliminate one more step from many people's pipelines.