datasette / datasette-build

Build a directory full of files into a SQLite database
Apache License 2.0

Initial tool design #1

Open simonw opened 9 months ago

simonw commented 9 months ago

It's going to be a CLI tool that can build a directory full of files into a SQLite database.

simonw commented 9 months ago

The MVP of this would let you run:

datasette-build demo.db tables/

Against a folder containing:

tables/cats.csv
tables/dogs.json
tables/places.tsv

And get a SQLite database with those 3 tables.
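A minimal sketch of that MVP, using only Python's standard library (`sqlite3`, `csv`, `json`) rather than sqlite-utils. It assumes each `.json` file holds a top-level array of flat objects, names each table after the file stem, and leaves columns untyped, so CSV and TSV values are stored as text. The `build()` function and its helpers are hypothetical names, not the tool's actual code.

```python
import csv
import json
import sqlite3
from pathlib import Path

SUPPORTED = (".csv", ".tsv", ".json")


def read_rows(path: Path) -> list:
    """Load one file into a list of dicts, picking a parser by file extension."""
    suffix = path.suffix.lower()
    if suffix == ".json":
        # Assumption: the file holds a top-level JSON array of flat objects.
        with path.open(encoding="utf-8") as fp:
            return json.load(fp)
    delimiter = "\t" if suffix == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as fp:
        return list(csv.DictReader(fp, delimiter=delimiter))


def quote(identifier: str) -> str:
    """Quote a table or column name as a SQL identifier."""
    return '"' + identifier.replace('"', '""') + '"'


def build(db_path: str, directory: str) -> list:
    """Create one table per supported file in directory; return the table names."""
    conn = sqlite3.connect(db_path)
    created = []
    try:
        for path in sorted(Path(directory).iterdir()):
            if not path.is_file() or path.suffix.lower() not in SUPPORTED:
                continue  # ignore anything this sketch can't parse
            rows = read_rows(path)
            # Union of keys across all rows, in first-seen order.
            columns = list(dict.fromkeys(key for row in rows for key in row))
            if not columns:
                continue  # an empty file gives no columns to create
            table = quote(path.stem)
            column_list = ", ".join(quote(c) for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f"CREATE TABLE {table} ({column_list})")
            conn.executemany(
                f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
                [[row.get(c) for c in columns] for row in rows],
            )
            created.append(path.stem)
        conn.commit()
    finally:
        conn.close()
    return created
```

Calling `build("demo.db", "tables")` against the folder above would create `cats`, `dogs` and `places`. Re-running it against an existing database fails on `CREATE TABLE`, so a real tool would need to decide whether to replace or append.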

simonw commented 9 months ago

The first release of this will only support SQLite, but I'd like to keep the design general enough that supporting other databases in the future wouldn't require too much of a redesign.

I'll try to avoid SQLite-specific syntax and concepts where possible.
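One way to keep that door open, sketched here as an assumption rather than a plan from the issue, is to have the builder talk to a small interface and keep SQLite details in a single class. `Backend`, `SQLiteBackend` and `load_table()` are hypothetical names.

```python
import sqlite3
from typing import Any, Iterable, Protocol, Sequence


def quote(identifier: str) -> str:
    """Double-quoted identifiers are standard SQL, accepted by SQLite and PostgreSQL."""
    return '"' + identifier.replace('"', '""') + '"'


class Backend(Protocol):
    """Hypothetical interface the builder talks to instead of a specific database."""

    def create_table(self, name: str, columns: Sequence[str]) -> None: ...

    def insert_rows(
        self, name: str, columns: Sequence[str], rows: Iterable[Sequence[Any]]
    ) -> None: ...


class SQLiteBackend:
    """The only concrete backend a first release would need."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def create_table(self, name: str, columns: Sequence[str]) -> None:
        # TEXT is understood by both SQLite and PostgreSQL.
        column_defs = ", ".join(f"{quote(c)} TEXT" for c in columns)
        self.conn.execute(f"CREATE TABLE {quote(name)} ({column_defs})")

    def insert_rows(
        self, name: str, columns: Sequence[str], rows: Iterable[Sequence[Any]]
    ) -> None:
        # "?" is the sqlite3 module's placeholder style; other DB-API
        # drivers (e.g. psycopg) use %s, so this part stays backend-specific.
        column_list = ", ".join(quote(c) for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        self.conn.executemany(
            f"INSERT INTO {quote(name)} ({column_list}) VALUES ({placeholders})",
            rows,
        )
        self.conn.commit()


def load_table(backend: Backend, name: str, columns: Sequence[str], rows) -> None:
    """Builder logic that never touches SQLite directly."""
    backend.create_table(name, columns)
    backend.insert_rows(name, columns, rows)
```

Supporting another database would then mean writing one more class with the same two methods, leaving `load_table()` untouched.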

simonw commented 9 months ago

I can get a first version working using two sqlite-utils mechanisms: reading rows from a file (https://sqlite-utils.datasette.io/en/stable/python-api.html#reading-rows-from-a-file) and detecting column types using TypeTracker (https://sqlite-utils.datasette.io/en/stable/python-api.html#detecting-column-types-using-typetracker).
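To illustrate what column type detection involves, here is a standard-library-only sketch of the same idea. It is not sqlite-utils' `TypeTracker` implementation (that is documented at the second link above); `detect_types()` and its integer/float/text widening rules are assumptions made for this example.

```python
import csv
import io

# Wider types absorb narrower ones: a column holding "3" and "2.5" is a float column.
RANK = {"integer": 0, "float": 1, "text": 2}


def value_type(value: str) -> str:
    """Classify one string value as 'integer', 'float' or 'text'."""
    for kind, convert in (("integer", int), ("float", float)):
        try:
            convert(value)
            return kind
        except ValueError:
            pass
    return "text"


def detect_types(rows) -> dict:
    """Return the narrowest type that fits every non-blank value in each column."""
    seen = {}
    for row in rows:
        for column, value in row.items():
            current = seen.setdefault(column, None)
            if value is None or value == "":
                continue  # blanks are treated as missing, so they don't constrain the type
            kind = value_type(value)
            if current is None or RANK[kind] > RANK[current]:
                seen[column] = kind
    # Columns that were blank everywhere fall back to text.
    return {column: kind or "text" for column, kind in seen.items()}


sample = "name,age,weight\nBlue,3,4.5\nSnowy,5,\n"
print(detect_types(csv.DictReader(io.StringIO(sample))))
# {'name': 'text', 'age': 'integer', 'weight': 'float'}
```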

simonw commented 9 months ago

The initial idea was for this tool to work only against directories, but I wonder if it would be useful for this to work too:

datasette-build data.db *.csv *.json

That would effectively allow it to be called with specific paths to files (and to subdirectories), avoiding the need to arrange everything into the correct directory structure first.

This would also make it a better replacement for csvs-to-sqlite.
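On Unix-like shells, `*.csv *.json` is expanded by the shell before the tool runs, so the tool only ever sees a list of paths. A sketch of turning that list into tables, under the assumption that a file becomes a table named after its stem and a subdirectory becomes a single table named after the folder (the "single table called folder" reading discussed in this comment). `plan_tables()` is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED = {".csv", ".tsv", ".json"}


def plan_tables(paths: list) -> dict:
    """Map each table name to the files that will populate it.

    A file becomes a table named after its stem; a directory becomes a single
    table named after the directory, fed by every supported file inside it.
    """
    plan = {}
    for raw in paths:
        path = Path(raw)  # Path("folder/").name == "folder"
        if path.is_dir():
            files = sorted(p for p in path.iterdir() if p.suffix.lower() in SUPPORTED)
            plan.setdefault(path.name, []).extend(files)
        elif path.is_file() and path.suffix.lower() in SUPPORTED:
            plan.setdefault(path.stem, []).append(path)
        else:
            raise ValueError(f"Not a supported file or a directory: {raw}")
    return plan
```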

One catch: I think this would make the following ambiguous:

datasette-build data.db folder/

Is that saying "build a database from every file and subdirectory in this folder", or does it mean "build a database with a single table called folder"?

I can special-case it so that it only does the "build a DB from everything" behavior when you pass a single argument that is a folder. It's a little confusing, though, to have it behave differently with one argument as opposed to two or more.
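A minimal sketch of that special case, with a hypothetical `interpret()` helper. It also skips hidden entries such as `.DS_Store`, which is an assumption not discussed in the issue.

```python
from pathlib import Path


def interpret(paths: list) -> tuple:
    """Return ("all", entries) or ("specific", paths) for the arguments after the database."""
    if len(paths) == 1 and Path(paths[0]).is_dir():
        # Exactly one folder: every visible entry inside it becomes its own table.
        folder = Path(paths[0])
        entries = sorted(p for p in folder.iterdir() if not p.name.startswith("."))
        return "all", entries
    # Anything else: each argument is a table source in its own right.
    return "specific", [Path(p) for p in paths]
```

The sketch makes the inconsistency concrete: adding a second argument changes how the first one is read, from "expand this folder" to "make one table called `folder`".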

simonw commented 9 months ago

Potential solution:

datasette-build data.db folder/ --specific

The --specific option (it needs a better name) means "create a table for each file I pass you". If you pass more than one path, --specific is assumed.
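A sketch of how that flag could be wired up with the standard library's `argparse`. `--specific` is the placeholder name from this comment, and `parse_args()` is a hypothetical function; the key detail is that the flag is forced on whenever more than one path is given.

```python
import argparse


def parse_args(argv: list) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="datasette-build")
    parser.add_argument("database", help="Path to the SQLite database to create")
    parser.add_argument("paths", nargs="+", help="Files or folders to load")
    parser.add_argument(
        "--specific",
        action="store_true",
        help="Create one table per path instead of expanding a single folder",
    )
    args = parser.parse_args(argv)
    # More than one path can only mean "one table per path", so imply the flag.
    if len(args.paths) > 1:
        args.specific = True
    return args
```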

simonw commented 9 months ago

Alternatively, I could have multiple commands:

datasette-build all data.db folder/ # Builds all
datasette-build data.db folder/ # Builds just a `folder` table

Or:

datasette-build data.db folder/ # Builds all, supports just two required arguments
datasette-build tables data.db folder/ # Builds just a `folder` table, can have multiple arguments

Is this confusing? It might work OK. I prefer the second option.
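The second option can be sketched with `argparse` by checking for a leading `tables` word before choosing a parser; argparse subparsers don't combine cleanly with a default command that takes its own positional arguments. The `parse()` function and its return shape are hypothetical.

```python
import argparse


def parse(argv: list) -> dict:
    """Dispatch on an optional leading "tables" word."""
    if argv and argv[0] == "tables":
        # datasette-build tables data.db a.csv b.json ...
        parser = argparse.ArgumentParser(prog="datasette-build tables")
        parser.add_argument("database")
        parser.add_argument("paths", nargs="+")
        args = parser.parse_args(argv[1:])
        return {"mode": "tables", "database": args.database, "paths": args.paths}
    # datasette-build data.db folder/  (exactly two required arguments)
    parser = argparse.ArgumentParser(prog="datasette-build")
    parser.add_argument("database")
    parser.add_argument("folder")
    args = parser.parse_args(argv)
    return {"mode": "all", "database": args.database, "paths": [args.folder]}
```

One side effect of this design is that a database file literally named `tables` could no longer be passed as the first argument without being mistaken for the subcommand.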

simonw commented 9 months ago

Got ChatGPT to generate test data for me, though it made a lot of mistakes along the way: https://chat.openai.com/share/3ba38b7d-4ed8-42ec-a158-c4de4ea1e817