bcgov / rcaaqs

An R package to facilitate the calculation of air quality metrics according to Canadian Ambient Air Quality Standards
Apache License 2.0
16 stars 5 forks source link

Omit specialized date functions in favour of lubridate functions #38

Closed steffilazerte closed 5 years ago

steffilazerte commented 6 years ago

For simplicity, it may be better to replace some of the specialized date functions with their lubridate alternative. For example,

get_year_from_date() could be replaced by lubridate::year() get_month_from_date() could be replaced by lubridate::month() get_day_from_date() could be replaced by lubridate::day()

ateucher commented 6 years ago

Yeah, I think that would be better, I'm not sure why I wrote those. It's possibly for performance on really large datasets - lubridate has (had?) some pretty slow functions...

steffilazerte commented 6 years ago

Hmm, that's a good point. To make sure, I ran some microbenchmarking, but it looks like they're comparable in terms of speed (now at least). For reference, the lakers data set as 34,624 rows. So not colossal, but not small either.

So I'll go ahead and make those changes.

library(microbenchmark)
library(lubridate)

lakers$date <- ymd(lakers$date)

microbenchmark("lubridate" = {year(lakers$date)},
               "rcaaqs" = {as.integer(as.POSIXlt(lakers$date)$year + 1900)},
               times = 1000)

# Unit: milliseconds
#       expr      min       lq     mean   median       uq      max neval cld
#  lubridate 2.796287 2.805971 3.523583 2.824979 2.892424 86.76146  1000   a
#     rcaaqs 2.840310 2.851017 3.403420 2.870700 2.953551 84.87180  1000   a

microbenchmark("lubridate" = {month(lakers$date)},
               "rcaaqs" = {as.integer(as.POSIXlt(lakers$date)$mon + 1)},
                times = 1000)

# Unit: milliseconds
#       expr      min       lq     mean   median       uq      max neval cld
#  lubridate 2.799603 2.813001 3.443109 2.834986 2.913673 84.79691  1000   a
#     rcaaqs 2.841006 2.851418 3.496467 2.872234 2.949152 85.42343  1000   a

 microbenchmark("lubridate" = {day(lakers$date)},
                "rcaaqs" = {as.integer(as.POSIXlt(lakers$date)$mday)},
                times = 1000)

# Unit: milliseconds
#       expr      min       lq     mean   median       uq      max neval cld
#  lubridate 2.740563 2.747982 3.171940 2.767116 2.825111 84.94585  1000   a
#     rcaaqs 2.737761 2.744776 3.358322 2.764812 2.839751 87.09555  1000   a
ateucher commented 6 years ago

Perfect, thanks for checking! In that case I say go for it.