Open danielballan opened 1 week ago
Does SQLite actually have a JSON type or just functions for handling JSON strings/blobs? I see the latter in the documentation but not the former
https://sqlite.org/datatype3.html https://sqlite.org/json1.html
Unless I am overlooking something I think these would need to be stored as strings (?)
Right. Evidently it does have a distinct internal data structure (JSONB) which is uses for some functionality but the data would be stored as a string.
I think the main issue here is that the infrastructure for serializing an arbitrary list to JSON doesn't currently exist (in ADBC or nanoarrow). We probably don't need a JSON library to write simple things (e.g., lists of integers), but we do need to be sure that bit is well-tested (and that we want that scope to exist in ADBC).
Hmm, lists of integers or other simple things should be fine (though, JSON doesn't have integers!) but I'd worry that could scope creep into half a JSON writer. I wonder if we need to start considering some sort of plugin mechanism so we can optionally ship bits like this? (Alternatively, is there a JSON library light enough that we could embed it and not worry about the dependency?)
Actually, I wonder SQLite has the functionality we need in its C API. Does json_array
solve the serialization problem?
Alternatively, is there a JSON library light enough that we could embed it and not worry about the dependency?
I have never had problems with nlohmann_json in nanoarrow (https://github.com/nlohmann/json), although we might not need it provided we can use to_chars()
for locale-independent floating-point-to-string conversion (strings require escaping and binary require some special handling: https://github.com/apache/arrow-nanoarrow/blob/7c1fb36029a92ee039d2235be07c70d8726b2a7f/src/nanoarrow/testing/testing.cc#L54-L69 )
Does json_array solve the serialization problem?
I suppose we could generate a query with parameters like SELECT json_array(?, ?, ?, ?)
, bind the values, and grab the result? Unless json_array()
is in the C API?
Yeah, I think it'd have to be a query. I glanced around and nothing seems to be in the header.
Ah, got it. :+1:
What feature or improvement would you like to see?
In https://github.com/apache/arrow-adbc/issues/2066 @skarakuzu and I outlined a use case for streaming LIST data into PostgreSQL and SQLite via ADBC. Support for PostgreSQL was recently added. (Thanks, @paleolimbot!)
Would it be justifiable to add LIST support for SQLite as well? Our use case is embedded scientific data collection scenarios, where it is not feasible to deploy PostgreSQL but we'd like a compatible-as-possible data model. In #2066 had floated an idea of using a SQLite JSON or JSONB column for this, as SQLite is generally loose about types and lacks any array/list type.
We would be happy to try to contribute this if that is feasible.