WinVector / rquery

Data Wrangling and Query Generating Operators for R. Distributed under choice of GPL-2 or GPL-3 license.
https://winvector.github.io/rquery/
Other
109 stars 15 forks source link

Error trying to convert character column to date in Postgres DB #10

Closed NompuK closed 5 years ago

NompuK commented 5 years ago

I am getting an error when trying to convert a character column to date format with extend() in Postgres DB.

Consider a table like this one:

library(rquery)
library(wrapr)
library(RPostgres)

raw_connection <- dbConnect(drv = RPostgres::Postgres(),
                            host = ######,
                            dbname = #####,
                            user = #####,
                            password = ######)

dbopts <- rq_connection_tests(raw_connection)
db <- rquery_db_info(connection = raw_connection,
                     is_dbi = TRUE,
                     connection_options = dbopts)

df <- read.table(text="
 id   date
 'a'     '2019-01-12'
 'b'     '2019-02-21'
 'c'     '2019-02-11'
", header=TRUE, stringsAsFactors = F)

rq_copy_to(db, "testdate",
           df, overwrite = T)

db_testdate <- db_td(db, "testdate")

db_testdate %.>%
  extend(. , date := as.Date(date)) %.>%
  materialize(db, .)

But:

Error in result_create(conn@ptr, statement) : Failed to prepare query: FEHLER: Syntaxfehler bei »as« LINE 3: as.Date ( "date" ) AS "date"

I have tried several alternatives using extend_se()´ and quotationextend_nse()and.()` notation and so much else.

Still I'm not sure if it really is an error or if I'm just doing it wrong or if this functionality is not given/intended at all. I wasn't able to find any documentation on this problem, so I'm trying GitHub again.

Thx!

JohnMount commented 5 years ago

Unlike dplyr, rquery does not depend on a hybrid-eval and also does not have translations for as many R-functions. So one has to use the Postgres date conversion which is to_date(COLUMN, FORMAT).

library(rquery)
library(wrapr)
library(RPostgres)

raw_connection <- DBI::dbConnect(RPostgres::Postgres(),
                                  host = 'localhost',
                                  port = 5432,
                                  user = 'johnmount',
                                  password = '')

dbopts <- rq_connection_tests(raw_connection)
db <- rquery_db_info(connection = raw_connection,
                     is_dbi = TRUE,
                     connection_options = dbopts)

df <- wrapr::build_frame(
  "id"  , "date"       |
    "a" , "2019-01-12" |
    "b" , "2019-02-21" |
    "c" , "2019-02-11" )

db_testdate <- rq_copy_to(db, "testdate",
           df, overwrite = TRUE)

db_result <- db_testdate %.>%
  extend(. , date := to_date(date, "YYYY-MM-DD")) %.>%
  materialize(db, .)

DBI::dbReadTable(db$connection, db_result$table_name) %.>%
  str(.)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ id  : chr  "a" "b" "c"
#>  $ date: Date, format: "2019-01-12" "2019-02-21" ...

DBI::dbDisconnect(raw_connection)
NompuK commented 5 years ago

Very good! Thanks again.

One question:

Is there a list to be found, which R functions are translated?

JohnMount commented 5 years ago

I haven't worked out a list yet, as very few functions are translated (I think mean -> avg is the main one). The conversions are intended to be per-database and user supplied (hence exposing the mapping structures in the database adaptors). dbplyr has a lot of these translations, but rquery does not.

JohnMount commented 5 years ago

I am now maintaining a list of the translations here. Thanks for your questions and patience.

NompuK commented 5 years ago

That's perfect. And helps me understand what rquery does.

Allow me to ask one more questions in regard to this: How can I do a DISTINCT(column1, column2, column3)-operation?

When I put it in select_rows() it ends up in the WHERE-statement and creats an error of course. In select_columns is definitly wrong. And in project() as well as sql_node() I don't know how to assign the result of one function to several columns.

JohnMount commented 5 years ago

rquery is based more on nested SQL than on high-power SQL operators (and there is no rquery notation that assigns to multiple columns). To get the effect you want we would use a project as follows.

library("rqdatatable")
iris %.>% project(., groupby = c("Petal.Length", "Petal.Width", "Species"), one = 1)

The assignment is currently needed as rquery doesn't yet support not creating a new column in a project (I'll add an issue on that). Then one would join this table or use it some other way.