DyfanJones / RAthena

Connect R to Athena using Boto3 SDK (DBI Interface)
https://dyfanjones.github.io/RAthena/

dbRemoveTable support removing s3 files thanks to @OssiLehtinen in #48 #49

Closed DyfanJones closed 4 years ago

DyfanJones commented 4 years ago

If delete_data is set to TRUE then this shouldn't be an issue, as the prompt is there to protect the user. I will just have to assess whether this has an impact on current functionality :)

OssiLehtinen commented 4 years ago

Should there be an option to set delete_data = TRUE (and no_confirm = TRUE) when an existing table is overwritten by writing a new one? That is, pass those flags to dbRemoveTable via Athena_write_table from dbWriteTable and copy_to.

OssiLehtinen commented 4 years ago

This line has one missing "/":

message(paste0("Info: The following S3 objects will be deleted:\n", paste0(paste0("s3://", s3_path$bucket, "/", all_keys), collapse="\n")))

DyfanJones commented 4 years ago

I believe this would be a simpler method for dbRemoveTable:

setMethod(
  "dbRemoveTable", c("AthenaConnection", "character"),
  function(conn, name, delete_data = FALSE, no_confirm = FALSE, ...) {
    if (!dbIsValid(conn)) {stop("Connection already closed.", call. = FALSE)}

    # Split "database.table" into its parts; otherwise use the connection's default database
    if (grepl("\\.", name)) {
      dbms.name <- gsub("\\..*", "", name)
      Table <- gsub(".*\\.", "", name)
    } else {
      dbms.name <- conn@info$dbms.name
      Table <- name
    }

    if (delete_data) {
      glue <- conn@ptr$client("glue")
      s3 <- conn@ptr$resource("s3")

      # Look up the table's S3 location in the Glue catalogue
      s3_path <- tryCatch(
        split_s3_uri(
          glue$get_table(DatabaseName = dbms.name,
                         Name = Table)$Table$StorageDescriptor$Location),
        error = function(e) py_error(e))

      message(paste0("Info: The S3 objects in prefix will be deleted:\n",
                     paste0("s3://", s3_path$bucket, "/", s3_path$key)))

      # Ask for confirmation unless the caller opted out with no_confirm = TRUE
      if (!no_confirm) {
        confirm <- readline(prompt = "Delete files (y/n)?: ")
        if (confirm != "y") {
          message("Info: Table deletion aborted.")
          return(NULL)
        }
      }

      # Remove objects in prefix of AWS Athena table
      s3$Bucket(s3_path$bucket)$objects$filter(Prefix = paste0(s3_path$key, "/"))$delete()
    }

    # Drop the table definition from Athena
    res <- dbExecute(conn, paste0("DROP TABLE ", dbms.name, ".", Table, ";"))
    dbClearResult(res)
    if (!delete_data) message("Info: Only Athena table has been removed.")
    invisible(TRUE)
  })
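
For anyone reading along: the method above relies on split_s3_uri, an internal helper whose implementation is not shown in this thread. Below is a minimal sketch of what such a helper might do, assuming it returns a list with bucket and key and that the key carries no trailing "/" (which is what the paste0(s3_path$key, "/") call above seems to expect). The name split_s3_uri_sketch is hypothetical and this is not the package's actual code.

```r
# Hypothetical stand-in for the internal split_s3_uri() helper.
# Assumption: input looks like "s3://bucket/some/prefix/" and the
# returned key has any trailing "/" stripped.
split_s3_uri_sketch <- function(uri) {
  stopifnot(is.character(uri), length(uri) == 1, grepl("^s3://", uri))
  path <- sub("^s3://", "", uri)          # drop the scheme
  bucket <- sub("/.*$", "", path)         # everything before the first "/"
  key <- sub("^[^/]*/?", "", path)        # everything after the bucket
  key <- sub("/+$", "", key)              # strip trailing slashes
  list(bucket = bucket, key = key)
}

s3_path <- split_s3_uri_sketch("s3://my-bucket/db/tbl/")
# Prefix that would be handed to the S3 filter in the method above
prefix <- paste0(s3_path$key, "/")
```

With a key normalised like this, appending "/" yields a prefix that matches only objects inside the table's folder and not sibling prefixes such as db/tbl_backup/.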

The only problem is that deleting S3 objects by prefix can only be done in boto3; I don't think paws offers this feature. However, the initially proposed solution can be easily integrated into noctua.
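
For an SDK without a delete-by-prefix call, one possible route is to list the keys under the prefix and delete them in batches. The sketch below only covers the batching step and is an assumption, not the solution proposed earlier in this thread. It relies on the S3 DeleteObjects limit of 1000 keys per request; build_delete_batches is a hypothetical name, and the payload shape (a list with an Objects element holding one list(Key = ...) per object) is assumed to be what a client such as paws would accept.

```r
# Hypothetical helper: group object keys into DeleteObjects-sized payloads.
# Assumption: each payload would be passed as the `Delete` argument of an
# S3 client's delete_objects() call, together with the bucket name.
build_delete_batches <- function(keys, batch_size = 1000L) {
  if (length(keys) == 0) return(list())
  # Assign each key to batch 1, 2, 3, ... of at most batch_size keys
  groups <- split(keys, ceiling(seq_along(keys) / batch_size))
  lapply(unname(groups), function(batch) {
    list(Objects = lapply(batch, function(k) list(Key = k)))
  })
}

# Made-up keys, purely for illustration
keys <- sprintf("db/tbl/part-%04d.parquet", 1:2500)
batches <- build_delete_batches(keys)
length(batches)  # 3 batches: 1000, 1000 and 500 keys
```

Each element of batches could then be sent in its own request, which still needs a loop (or lapply) over batches but avoids one request per object.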

OssiLehtinen commented 4 years ago

Looks great! A for loop always looks like a kludge :)

DyfanJones commented 4 years ago