apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

r/adbcdrivermanager: round-tripping zero-length factor returns wrong type #1400

Open nbenn opened 8 months ago

nbenn commented 8 months ago

Using SQLite, we currently have:

library(adbcdrivermanager)

db <- adbc_database_init(adbcsqlite::adbcsqlite(), uri = ":memory:")
con <- adbc_connection_init(db)

write_adbc(iris[0L, ], con, "iris0")
write_adbc(iris, con, "iris")

str(as.data.frame(read_adbc(con, "SELECT Species from iris")))
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Species: chr  "setosa" "setosa" "setosa" "setosa" ...
str(as.data.frame(read_adbc(con, "SELECT Species from iris0")))
#> 'data.frame':    0 obs. of  5 variables:
#>  $ Species: num

paleolimbot commented 8 months ago

I think that's happening because the SQLite driver gets zero values and doesn't know what type to infer. I find it a little strange that the type it infers by default is a SQLite integer as opposed to a SQLite null, but I get what happened.

At some point there was talk of using sqlite3_column_decltype() to parse out the declared field type if it exists (i.e., avoid guessing in the simple case where we already know the declared type), which would solve this particular example.
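
To illustrate the declared-type idea, here is a minimal sketch using Python's standard `sqlite3` module. It is not the ADBC driver's code: Python does not expose `sqlite3_column_decltype()` directly, so the sketch reads the declared types through `pragma_table_info` as a stand-in. The table schema and the `declared_types` helper are assumptions made for the example, and the declared types shown are not necessarily what the driver writes on ingest.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical schema for illustration only; the ADBC SQLite driver may
# declare different column types when it ingests a data frame.
con.execute('CREATE TABLE iris0 ("Sepal.Length" REAL, Species TEXT)')


def declared_types(con, table):
    """Return {column name: declared type} for a table (hypothetical helper)."""
    rows = con.execute(
        "SELECT name, type FROM pragma_table_info(?)", (table,)
    ).fetchall()
    return {name: decl for name, decl in rows}


types = declared_types(con, "iris0")
n_rows = con.execute("SELECT COUNT(*) FROM iris0").fetchone()[0]

# The table has no rows, yet the declared types are still available.
print(n_rows, types)
```

The point of the sketch is only that the declaration survives even when there are no values to inspect, so a reader that consults it does not have to guess for a zero-row result.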

lidavidm commented 8 months ago

https://github.com/apache/arrow-adbc/blob/ec68c9316f545a74b0942c7cd3ca7730a141ee3c/c/driver/sqlite/statement_reader.c#L740-L745

It always starts on INT64
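
A simplified model of that behaviour, as a sketch rather than the driver's actual logic: inference starts at an integer type and only widens when a value requires it, so a column with no rows never leaves the starting type. The type names, the widening order, and the `classify` and `infer_column_type` helpers are all assumptions made for this example.

```python
import sqlite3

# Widening order used by this sketch only (hypothetical and simplified).
WIDENING = ["int64", "double", "string"]


def classify(value):
    """Map one Python value returned by sqlite3 to a sketch type name."""
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    return "string"  # text and blobs are lumped together here


def infer_column_type(values):
    """Start at int64 and widen only when a non-null value demands it."""
    current = "int64"
    for value in values:
        if value is None:
            continue  # nulls carry no type information
        seen = classify(value)
        if WIDENING.index(seen) > WIDENING.index(current):
            current = seen
    return current


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE iris0 (Species TEXT)")
con.execute("CREATE TABLE iris (Species TEXT)")
con.execute("INSERT INTO iris VALUES ('setosa')")

empty = [row[0] for row in con.execute("SELECT Species FROM iris0")]
filled = [row[0] for row in con.execute("SELECT Species FROM iris")]

empty_type = infer_column_type(empty)    # no rows, so the starting type wins
filled_type = infer_column_type(filled)  # one text value widens to string
print(empty_type, filled_type)
```

Under this model the empty query reports `int64` while the populated one reports `string`, which mirrors the mismatch in the report above.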

krlmlr commented 6 months ago

Related: #1591?