gabledata / recap

Work with your web service, database, and streaming schemas in a single format.
https://recap.build
MIT License
334 stars 24 forks source link

Support SQLite converter into Recap #418

Closed mjperrone closed 10 months ago

mjperrone commented 10 months ago

SQLite schematab docs:

Every SQLite database contains a single "schema table" that stores the schema for that database. The schema for a database is a description of all of the other tables, indexes, triggers, and views that are contained within the database. The schema table looks like this:

CREATE TABLE sqlite_schema(
  type text,
  name text,
  tbl_name text,
  rootpage integer,
  sql text
);

The sql field is the SQL DDL that would create the table.

To implement this, one would create a recap SQLite client to grab that DDL and then create a recap SQLite converter to turn it into recap types.

criccomini commented 10 months ago

Oh, gosh, I can't believe I didn't think of adding this! 🤣 Seems so obvious now that you opened the issue. 😝

mjperrone commented 10 months ago

Using the DDL to represent the schema is an interesting (lazy?) choice for SQLite. But thinking about it a bit more... if we implement that for SQLite, we might be able to reuse that code for pulling schema from other databases assuming the DDL dumps are compatible. Potentially replace postgresql and mysql client/converter code using that.

criccomini commented 10 months ago

Perhaps sqlglot can be of use?

criccomini commented 10 months ago

assuming the DDL dumps are compatible

I think this is the key part of your statement. I suspect information_schema will be more robust. Perhaps start with SQLite impl only, and see where that leads...

criccomini commented 10 months ago

Poked around a bit. Looks like there are a few options:

https://www.sqlitetutorial.net/sqlite-describe-table/

The pragma option looks pretty parsable.

criccomini commented 10 months ago

@mjperrone do y'all want to take this or should I? I don't mind doing it, but I don't want to start if y'all want to take it for learning experience or whatever. =-)

mjperrone commented 10 months ago

We aren't in a rush to implement this one, so I dont think we will get to it before you do if you're interested in it

criccomini commented 10 months ago

Nice! I've been hankering to write a bit of code, so I'll take a crack at this.

criccomini commented 10 months ago

Starting to poke at this now.

criccomini commented 10 months ago

@mjperrone been doing some digging on SQLite data types.

  1. SQLite supports two modes: STRICT or default. Tables created without STRICT use type affinities, which are recommendations that SQLite ignores.
  2. sqlite2 and sqlite3 differ
  3. In non-STRICT tables, everything in () is ignored (e.g. INTEGER(123) is just INTEGER to SQLite). The parenthesis are still stored in the pragma description, so it's accessible.
  4. For non-STRICT tables, SQLite uses some kind of wacky affinity matching rules.

Here's what I recommend:

  1. Make the converter work with STRICT tables only.
  2. Make the converter work with sqlite3.

This is a very straight-forward to implement. Is this OK with you?

Alternatively, we could add support for both STRICT and non-STRICT tables. This gets a lot messier. Dates, for example, are a mess in SQLite; it accepts both strings or ints in the column. 😢

WDYT? What is your use case here? Do you have any practical examples of SQLite schemas you're dealing with?

/cc @adrianisk

criccomini commented 10 months ago

Some more details on affinity matching:

https://medium.com/@SullivanArielle/type-affinity-5936cee17c35

Note that there are two things happening here:

  1. The type of the column
  2. Coercing values

We're only interested in column types (1). As such, after further reading, I think the converter can handle both affinity and STRICT types. I'll include a flag similar to the PG converter's enforce_array_dimensions flag to specify whether the converter should return STRICT RecapTypes (e.g. all integers get treated as IntType(bits=64)) or affinity types (e.g. a TINYINT would return a BoolType (or maybe IntType(bits=1, variable=False)).

I also plan on implementing scale/precision support and char octet length. They'll only be used when use_affinity_rules==True.

mjperrone commented 10 months ago

@criccomini I'll try to get some more details for you.

criccomini commented 10 months ago

@mjperrone I have a PR up here:

https://github.com/recap-build/recap/pull/424

It's still WIP for tests, but I want feedback. I'm going to add CLI tests for recap schema and recap ls and do some minor tweaks.

mjperrone commented 10 months ago

I may be able to give this a look on Monday