simonw opened this issue 9 months ago.
- A `datasette-build` CLI command or, if you have Datasette installed in the same environment, also a `datasette build` subcommand.
- Supports `.tsv` and `.csv` and `.json` out of the box, but further plugins can add support for things like `.parquet` or `.xls` or `.geojson` etc.
- Columns called `id` will be assumed to be a primary key.
- Columns like `users_id` will be assumed to be foreign keys to the accompanying `users` table.
- Optional `tablename.config.json` files that configure the primary key, related keys, whether any columns should be FTS indexed, etc.

The MVP of this would let you run:
```bash
datasette-build demo.db tables/
```
Against a folder containing:
```
tables/cats.csv
tables/dogs.json
tables/places.tsv
```
And get a SQLite database with those 3 tables.
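To make the MVP concrete, here is a minimal stdlib-only sketch (it does not use sqlite-utils and is not the tool's actual implementation) of building a folder of `.csv`, `.tsv` and `.json` files into SQLite. It applies the `id` primary key and `users_id` foreign key conventions described above. The helper names `read_rows()`, `column_sql()` and `build()` are hypothetical, and the sketch assumes each JSON file holds a list of flat objects.

```python
import csv
import json
import sqlite3
from pathlib import Path

# Formats handled by this sketch; plugins could add more in the real tool
SUPPORTED = {".csv", ".tsv", ".json"}


def read_rows(path: Path) -> list[dict]:
    """Load a .csv, .tsv or .json file as a list of dicts."""
    suffix = path.suffix.lower()
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        with path.open(newline="", encoding="utf-8") as fp:
            return list(csv.DictReader(fp, delimiter=delimiter))
    # Assumption: .json files contain a top-level list of flat objects
    return json.loads(path.read_text(encoding="utf-8"))


def column_sql(name: str, table_names: set[str]) -> str:
    """Naming conventions: id -> primary key, <table>_id -> foreign key to <table>."""
    if name == "id":
        return '"id" PRIMARY KEY'
    if name.endswith("_id") and name[:-3] in table_names:
        return f'"{name}" REFERENCES "{name[:-3]}"("id")'
    return f'"{name}"'


def build(db_path: str, folder: str) -> sqlite3.Connection:
    """Create one table per supported file in folder, named after the file stem."""
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
    table_names = {p.stem for p in files}
    conn = sqlite3.connect(db_path)
    for path in files:
        rows = read_rows(path)
        if not rows:
            continue  # nothing to infer columns from
        # Union of keys across all rows, in first-seen order
        columns = list(dict.fromkeys(key for row in rows for key in row))
        col_defs = ", ".join(column_sql(c, table_names) for c in columns)
        conn.execute(f'CREATE TABLE "{path.stem}" ({col_defs})')
        quoted = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{path.stem}" ({quoted}) VALUES ({placeholders})',
            [[row.get(c) for c in columns] for row in rows],
        )
    conn.commit()
    return conn
```

Calling `build("demo.db", "tables/")` would produce the three tables from the example folder. Column types are left undeclared here; type detection is a separate step.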
The first release of this will only support SQLite, but I'd like to keep the design general enough that supporting other databases in the future wouldn't require too much of a redesign.
I'll try to avoid SQLite-specific syntax and concepts where possible.
I can get a first version working using the sqlite-utils mechanisms for reading rows from a file (https://sqlite-utils.datasette.io/en/stable/python-api.html#reading-rows-from-a-file) and detecting column types using `TypeTracker` (https://sqlite-utils.datasette.io/en/stable/python-api.html#detecting-column-types-using-typetracker).
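CSV and TSV values arrive as strings, so column types have to be inferred. The linked sqlite-utils docs show wrapping rows in a `TypeTracker` during insert and then passing its detected types to `.transform()`. As a rough stdlib-only illustration of the same idea (not sqlite-utils' actual code), this treats a column as integer if every non-blank value parses with `int()`, else float if every value parses with `float()`, else text. The names `detect_type()`, `detect_types()` and `convert()` are hypothetical.

```python
import csv
import io


def detect_type(values: list) -> str:
    """Guess 'integer', 'float' or 'text' for one column of CSV strings."""
    present = [v for v in values if v not in ("", None)]  # blank cells don't vote
    if not present:
        return "text"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in present:
                cast(v)
        except ValueError:
            continue  # some value failed, try the next, looser type
        return name
    return "text"


def detect_types(rows: list[dict]) -> dict[str, str]:
    """Detect a type for every column seen in any row."""
    columns = list(dict.fromkeys(key for row in rows for key in row))
    return {c: detect_type([row.get(c) for row in rows]) for c in columns}


def convert(rows: list[dict], types: dict[str, str]) -> list[dict]:
    """Cast values to the detected types; blank cells become None."""
    casts = {"integer": int, "float": float, "text": str}
    return [
        {k: (casts[types[k]](v) if v not in ("", None) else None) for k, v in row.items()}
        for row in rows
    ]


rows = list(csv.DictReader(io.StringIO("id,name,weight\n1,Cleo,4.5\n2,Pancakes,\n")))
types = detect_types(rows)
print(types)  # {'id': 'integer', 'name': 'text', 'weight': 'float'}
```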
While the initial idea for this tool was for it to only work against directories, I wonder if it would be useful for this to work too:
```bash
datasette-build data.db *.csv *.json
```
Effectively allowing it to be called with specific paths to files (and to subdirectories) in order to avoid having to arrange everything into the correct directory structure first.
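After shell globbing, `*.csv *.json` arrives as a plain list of file paths. A minimal sketch of turning a mix of file and directory arguments into one table per file, assuming the table name is the file's stem and that a directory argument is simply expanded into the supported files inside it (one possible interpretation, not a settled design). `collect_tables()` is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED = {".csv", ".tsv", ".json"}


def collect_tables(paths: list[str]) -> dict[str, Path]:
    """Map table name -> source file, expanding any directories recursively."""
    tables: dict[str, Path] = {}
    for raw in paths:
        path = Path(raw)
        files = sorted(p for p in path.rglob("*") if p.is_file()) if path.is_dir() else [path]
        for file in files:
            if file.suffix.lower() not in SUPPORTED:
                continue  # e.g. a README sitting next to the data
            if file.stem in tables:
                # Two inputs would silently overwrite each other's table
                raise ValueError(
                    f"{tables[file.stem]} and {file} would both become table {file.stem!r}"
                )
            tables[file.stem] = file
    return tables
```

Raising on duplicate stems (say `cats.csv` and `cats.json`) is a deliberate choice in this sketch; merging them into one table would be an alternative.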
This would also make it a better replacement for `csvs-to-sqlite`.
One catch: I think this would make the following ambiguous:
```bash
datasette-build data.db folder/
```
Is that saying "build a database from every file and subdirectory in this folder", or does it mean "build a database with a single table called `folder`"?
I can special-case it so it does the "build a DB from everything" only if you pass it a single argument that is a folder. It's a little confusing, though, to have it behave differently with one argument as opposed to two or more.
Potential solution:
```bash
datasette-build data.db folder/ --specific
```
The `--specific` option (need a better name) means "create tables for each file I pass to you". If you pass more than one path, `--specific` is assumed.
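The `--specific` rule can be expressed in a few lines. This sketch uses the standard library's `argparse` purely for illustration. `--specific` is the placeholder name from the issue, while `parse_args()` and `plan()` are hypothetical helpers that only describe what would be built.

```python
import argparse
from pathlib import Path


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="datasette-build")
    parser.add_argument("database")
    parser.add_argument("paths", nargs="+")
    # Placeholder name: "create tables for each file I pass to you"
    parser.add_argument("--specific", action="store_true")
    args = parser.parse_args(argv)
    if len(args.paths) > 1:
        args.specific = True  # more than one path implies --specific
    return args


def plan(args: argparse.Namespace) -> str:
    """Describe what would be built, without touching the filesystem."""
    if args.specific:
        # Path("folder/").stem == "folder", so a directory becomes one table
        names = ", ".join(Path(p).stem for p in args.paths)
        return f"create tables: {names}"
    return f"build everything in {args.paths[0]}"
```

For example, `plan(parse_args(["data.db", "folder/"]))` describes a full build, while adding `--specific` or a second path switches to one table per argument.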
Alternatively, I could have multiple commands:
```bash
datasette-build all data.db folder/  # Builds all
datasette-build data.db folder/      # Builds just a `folder` table
```
Or:
```bash
datasette-build data.db folder/         # Builds all, supports just two required arguments
datasette-build tables data.db folder/  # Builds just a `folder` table, can have multiple arguments
```
Is this confusing? It might work OK. I prefer the second option.
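The second option mostly comes down to checking whether the first argument is the literal word `tables`. A minimal hand-rolled sketch, where `dispatch()` is a hypothetical helper returning `(mode, database, paths)`:

```python
def dispatch(argv: list[str]) -> tuple[str, str, list[str]]:
    """Interpret argv for the 'build all by default, `tables` subcommand' design."""
    if argv[:1] == ["tables"]:
        if len(argv) < 3:
            raise SystemExit("usage: datasette-build tables DATABASE PATH...")
        # Every path after the database becomes its own table
        return "tables", argv[1], argv[2:]
    if len(argv) != 2:
        raise SystemExit("usage: datasette-build DATABASE FOLDER")
    # Exactly two arguments: build everything in the folder
    return "all", argv[0], [argv[1]]
```

One trade-off: a database file literally named `tables` would be read as the subcommand, though in practice it would usually carry a `.db` extension.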
Got ChatGPT to generate test data for me, though it made a lot of mistakes along the way: https://chat.openai.com/share/3ba38b7d-4ed8-42ec-a158-c4de4ea1e817
It's going to be a CLI tool that can build a directory full of files into a SQLite database.