Cidree / rpostgis

rpostgis: R Interface to a 'PostGIS' Database
http://cidree.github.io/rpostgis/
77 stars 14 forks source link

Feature request: Allow multiple columns in dbAddKey #10

Closed dpprdan closed 6 years ago

dpprdan commented 6 years ago

At the moment, dbAddKey seems to only be able to set a primary key on a single column, but not on multiple columns (e.g. longitudinal table with id and year). Would it be possible to add this feature?

basille commented 6 years ago

Seems like a good idea. I don't use multiple columns for key so much, so let's make sure I don't miss the point. Would it be enough to allow, for both colname and reference, either a single character string (unique column, for instance "colA") or a character vector (multiple columns, for instance, c("colA", "colB") that would combine columns colA and colB)? Are there other elements I should consider? Can I rely on PostgreSQL to check for the fact that it makes sense in the database?

dpprdan commented 6 years ago

I think this should suffice. This is how I am doing this at the moment.

suppressPackageStartupMessages(library(RPostgreSQL))
suppressPackageStartupMessages(library(dplyr)) 
suppressPackageStartupMessages(library(glue))

con <- 
  dbConnect(
    PostgreSQL(),
    user = "my_user_name",
    password = "my_password",
    host = "localhost",
    dbname = "postgres"
  )

colnames(iris) <- gsub("[.]", "_", tolower(colnames(iris)))
iris <- iris %>% mutate(id = row_number())

DBI::dbWriteTable(con, "iris", iris, row.names = FALSE)
#> [1] TRUE

# add primary key
tbl <- "iris"
pk_clmns <- c("id", "sepal_width")
(pk_sql <- 
  glue_sql(
    "ALTER TABLE {`tbl`} ADD PRIMARY KEY ({`pk_clmns`*})", 
    # "*" collapes and quotes values appropriately
    .con = con
  ))
#> <SQL> ALTER TABLE "iris" ADD PRIMARY KEY ("id", "sepal_width")
DBI::dbExecute(con, pk_sql) 
#> [1] 0

Note that glue_sql is only available in the github version of glue, not on CRAN yet.

Can I rely on PostgreSQL to check for the fact that it makes sense in the database?

Honestly, I don't know. All I can say is that I do get error messages with the above method if I misspell a column name or try to add a primary key when there is already one.

While we are at it, maybe it makes sense to add a multicolumn index to dbIndex as well? (PostgreSQL docs on Multicolumn Indexes and Combining Multicolumn Indexes).

# add index
idx_name <- "sepal_width_idx"
idx_clmns <- c("sepal_width", "sepal_length")
(idx_sql <- glue_sql("
                CREATE INDEX {`idx_name`} ON {`tbl`} ({`idx_clmns`*})
                ", .con = con))
#> <SQL> CREATE INDEX "sepal_width_idx" ON "iris" ("sepal_width", "sepal_length")
DBI::dbExecute(con, idx_sql)
#> [1] 0

DBI::dbDisconnect(con)
#> [1] TRUE
basille commented 6 years ago

Sorry for the rather long handling time. I have implemented both features (keys and indexes on multiples columns) in the multi-column-key-index branch. Would you mind checking it out, and tell me if this works for you? I'm closing this issue in the meanwhile. Please feel free to reopen it if you find bugs in the implementation (it's not been thoroughly tested).