Hello,
When querying large data sets containing a Date column, the text-to-date
conversion done by as.Date() can sometimes be the largest speed bottleneck in
the query. See:
https://stat.ethz.ch/pipermail/r-devel/2013-August/067141.html
Consider calling as.Date() only on the unique date values. For example, this
patch to PostgreSQLSupport.R reduced a 130-second query that selected 4
million rows from a table containing a Date column to only 30 seconds:
diff PostgreSQLSupport.R PostgreSQLSupport.new
412a413,419
> as.Date2 <- function(x) { ## Significantly faster conversion with duplicated dates
>     if (anyDuplicated(x)) {
>         ux <- unique(x)
>         idx <- match(x, ux)
>         return(as.Date(ux)[idx])
>     } else as.Date(x)
> }
433c440
< rel[,i] <- as.Date(rel[,i])
---
> rel[,i] <- as.Date2(rel[,i])
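To see the effect of the unique/match idea outside the driver, here is a minimal,
self-contained timing sketch with as.Date2 defined as in the patch. The test data
are made up, and the timings will of course differ from the 130s/30s figures
above, which come from the full query:

as.Date2 <- function(x) { ## same function as in the patch above
    if (anyDuplicated(x)) {
        ux <- unique(x)
        idx <- match(x, ux)
        return(as.Date(ux)[idx])
    } else as.Date(x)
}
x <- rep(format(Sys.Date() - 0:364, "%Y-%m-%d"), 10000) # 3,650,000 strings, 365 unique
system.time(as.Date(x))  # parses every element
system.time(as.Date2(x)) # parses only the 365 unique values, then match()es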
Alternatively, consider parsing the character strings directly, without making a
system call per element, as, for example, the fasttime::fastPOSIXct function
does. See
https://stat.ethz.ch/pipermail/r-devel/2013-August/067142.html
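A rough sketch along those lines (this assumes the fasttime package is installed;
fastPOSIXct parses the text in C, interpreting it as GMT, so date-only strings
come out at midnight GMT and can then be truncated to Date):

library(fasttime)
txt <- rep("2013-08-16", 1e6)               # example date strings
d <- as.Date(fastPOSIXct(txt, tz = "GMT"))  # C-level parse, then truncate to Date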
Another possibility: it might be more efficient to have Date data sent as an
integer (days since the epoch) rather than as a character string.
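If the values arrived as integer day counts, the R-side conversion would reduce
to attaching the Date class, with no string parsing at all, e.g. (the day counts
below are just illustrative):

days <- c(15933L, 15934L)            # hypothetical values: days since 1970-01-01
as.Date(days, origin = "1970-01-01") # "2013-08-16" "2013-08-17"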
--Robert
Original issue reported on code.google.com by rmcge...@gmail.com on 16 Aug 2013 at 8:39