DyfanJones / noctua

Connect R to Athena using paws SDK (DBI Interface)
https://dyfanjones.github.io/noctua/

allow noctua to append to a static s3 directory #111

Closed DyfanJones closed 3 years ago

DyfanJones commented 3 years ago

Previously, noctua uploaded to a static S3 location named after the data.frame, i.e.

"s3://bucket/path/to/file/default/athena_tbl/athena_tbl.csv"

Because every upload of a given table targeted the same file, this made it difficult to append to Athena tables that weren't partitioned.

This branch enables users to append to existing tables:

library(DBI)

con <- dbConnect(noctua::athena())

#### manually uploading data in chunks ####
part1 <- chickwts[1:50, ]

# first chunk: create (or replace) the table
dbWriteTable(con, "chickwts", part1, file.type = "parquet", overwrite = TRUE, compress = TRUE)

dbGetQuery(con, "select count(*) from chickwts")
# 50

part2 <- chickwts[51:nrow(chickwts), ]

# second chunk: append to the existing table
dbWriteTable(con, "chickwts", part2, file.type = "parquet", append = TRUE, compress = TRUE)

dbGetQuery(con, "select count(*) from chickwts")
# 71

#### uploading data in max.batch of 50 ####
dbWriteTable(con, "chickwts", chickwts, file.type = "parquet", overwrite = TRUE, compress = TRUE, max.batch = 50)

NOTE: Both methods achieve the same goal; however, max.batch splits the data.frame automatically for the user.
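To make the batching idea concrete, here is a minimal base R sketch of splitting a data.frame into chunks of at most 50 rows. It only illustrates the concept and is not noctua's internal implementation; `split_batches` and its `max_batch` argument are hypothetical names made up for this example.

```r
# Hypothetical helper (not part of noctua): split a data.frame into
# consecutive chunks of at most `max_batch` rows.
split_batches <- function(df, max_batch) {
  if (nrow(df) == 0) return(list(df))
  # rows 1..max_batch get id 1, the next max_batch rows get id 2, and so on
  batch_id <- ceiling(seq_len(nrow(df)) / max_batch)
  unname(split(df, batch_id))
}

batches <- split_batches(chickwts, 50)
vapply(batches, nrow, integer(1))
# 50 21
```

Each element of `batches` could then be written as in the manual example above, with `overwrite = TRUE` for the first chunk and `append = TRUE` for the rest.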