Stephen-McDaniel / rpostgresql

This repository is an export of the final version from the retired Google Code system (code.google.com/p/rpostgresql).
0 stars 0 forks source link

Faster Date Conversion #56

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,
When querying large data sets containing a Date column, the text to date 
conversion from as.Date() can sometimes be the largest speed bottleneck in the 
query. See: 
https://stat.ethz.ch/pipermail/r-devel/2013-August/067141.html

Consider using as.Date on unique values of the date only. For example, this 
patch to PostgreSQLSupport.R reduced a 130 second query that selection 4 
million rows from a table containing a Date column, to only 30 seconds:

diff PostgreSQLSupport.R PostgreSQLSupport.new
412a413,419
>     as.Date2 <- function(x) {            ## Significantly faster conversion 
with duplicated dates
>         if (anyDuplicated(x)) {
>             ux <- unique(x)
>             idx <- match(x, ux)
>             return(as.Date(ux)[idx])
>         } else as.Date(x)
>     }
433c440
<             rel[,i] <- as.Date(rel[,i])

---
>             rel[,i] <- as.Date2(rel[,i])

Alternatively, consider parsing the character string directly without making a 
system call, as in, for example the fasttime::fastPOSIXct function. See 
https://stat.ethz.ch/pipermail/r-devel/2013-August/067142.html

Another possibility, perhaps it would be more efficient to have Date data sent 
as an integer (days from epoch) rather than a character string.
--Robert

Original issue reported on code.google.com by rmcge...@gmail.com on 16 Aug 2013 at 8:39