duckdb / duckdb-r

The duckdb R package
https://r.duckdb.org/
Other
105 stars 21 forks source link

add `temporary` parameter to `duckdb_read_csv()` #142

Open ThomasSoeiro opened 2 months ago

ThomasSoeiro commented 2 months ago

It is currently not possible to write a temporary table using duckdb_read_csv() directly.

To fix this, can we expose the temporary parameter in duckdb_read_csv():

https://github.com/duckdb/duckdb-r/blob/b13d4050e3096c38a699719a2b5e89fc65106765/R/csv.R#L32-L33

and pass it to the call to dbWriteTable()?

https://github.com/duckdb/duckdb-r/blob/b13d4050e3096c38a699719a2b5e89fc65106765/R/csv.R#L70


For tables that fit in memory, we can use:

mytable <- read.csv2("mytable.csv")
dbWriteTable(con, "mytable", mytable, temporary = TRUE)

But we are stuck for tables that don't fit in memory.

krlmlr commented 2 months ago

Thanks.

ThomasSoeiro commented 2 months ago
Prepare files and db for testing ```r library(DBI) write.csv(warpbreaks, "warpbreaks.csv", row.names = FALSE) write.csv(cars, "cars.csv", row.names = FALSE) con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb") duckdb::duckdb_read_csv(con, "warpbreaks1", "warpbreaks.csv") dbListTables(con) # [1] "warpbreaks1" dbDisconnect(con, shutdown = TRUE) ```
  • Can you confirm that running dbExecute("COPY ... FROM ...") does what you expect it to for temporary tables? (...)

Yes, it does:

con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb", read_only = TRUE)
dbListTables(con)
# [1] "warpbreaks1"
dbExecute(con, "CREATE TEMPORARY TABLE warpbreaks2 (breaks NUMERIC, wool VARCHAR, tension VARCHAR);")
# [1] 0
dbExecute(con, "COPY warpbreaks2 FROM 'warpbreaks.csv' (HEADER);")
# [1] 54
dbListTables(con)
# [1] "warpbreaks1" "warpbreaks2"
dbDisconnect(con, shutdown = TRUE)

# check that the temporary table is gone
con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb")
dbListTables(con)
# [1] "warpbreaks1"
dbDisconnect(con, shutdown = TRUE)
  • (...) What happens with name clashes (same table exists as permanent and as temporary table)?

Not sure what is expected here. Here is a test:

con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb", read_only = TRUE)
dbListTables(con)
# [1] "warpbreaks1"
dbExecute(con, "CREATE TEMPORARY TABLE warpbreaks1 (speed NUMERIC, dist NUMERIC);")
# [1] 0
dbExecute(con, "COPY warpbreaks1 FROM 'cars.csv' (HEADER);")
# [1] 50
dbListTables(con)
# [1] "warpbreaks1" "warpbreaks1"
dbReadTable(con, "warpbreaks1") |> head()
#   speed dist
# 1     4    2
# 2     4   10
# 3     7    4
# 4     7   22
# 5     8   16
# 6     9   10
dbDisconnect(con, shutdown = TRUE)
  • For now, you could work around:

    • Import the CSV data into a temporary database that you connect to your main database with ATTACH
    • Use something like CREATE TEMPORARY TABLE ... AS read_csv_auto('...') (not tested)

It works, thanks! (documentation here)

con <- dbConnect(duckdb::duckdb(), dbdir = "test.duckdb", read_only = TRUE)
dbListTables(con)
# [1] "warpbreaks1"
dbExecute(con, "CREATE TEMPORARY TABLE warpbreaks3 AS FROM 'warpbreaks.csv';")
# [1] 54
dbListTables(con)
# [1] "warpbreaks1" "warpbreaks3"
dbDisconnect(con, shutdown = TRUE)