steffilazerte closed this issue 5 years ago
Yeah, I think that would be better, I'm not sure why I wrote those. It's possibly for performance on really large datasets - lubridate has (had?) some pretty slow functions...
Hmm, that's a good point. To make sure, I ran some microbenchmarking, but it looks like they're comparable in terms of speed (now, at least). For reference, the lakers data set has 34,624 rows. So not colossal, but not small either.
So I'll go ahead and make those changes.
library(microbenchmark)
library(lubridate)

lakers$date <- ymd(lakers$date)

microbenchmark("lubridate" = {year(lakers$date)},
               "rcaaqs" = {as.integer(as.POSIXlt(lakers$date)$year + 1900)},
               times = 1000)
# Unit: milliseconds
#      expr      min       lq     mean   median       uq      max neval cld
# lubridate 2.796287 2.805971 3.523583 2.824979 2.892424 86.76146  1000   a
#    rcaaqs 2.840310 2.851017 3.403420 2.870700 2.953551 84.87180  1000   a

microbenchmark("lubridate" = {month(lakers$date)},
               "rcaaqs" = {as.integer(as.POSIXlt(lakers$date)$mon + 1)},
               times = 1000)
# Unit: milliseconds
#      expr      min       lq     mean   median       uq      max neval cld
# lubridate 2.799603 2.813001 3.443109 2.834986 2.913673 84.79691  1000   a
#    rcaaqs 2.841006 2.851418 3.496467 2.872234 2.949152 85.42343  1000   a

microbenchmark("lubridate" = {day(lakers$date)},
               "rcaaqs" = {as.integer(as.POSIXlt(lakers$date)$mday)},
               times = 1000)
# Unit: milliseconds
#      expr      min       lq     mean   median       uq      max neval cld
# lubridate 2.740563 2.747982 3.171940 2.767116 2.825111 84.94585  1000   a
#    rcaaqs 2.737761 2.744776 3.358322 2.764812 2.839751 87.09555  1000   a
Perfect, thanks for checking! In that case I say go for it.
For simplicity, it may be better to replace some of the specialized date functions with their lubridate alternatives. For example,

get_year_from_date() could be replaced by lubridate::year()
get_month_from_date() could be replaced by lubridate::month()
get_day_from_date() could be replaced by lubridate::day()
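A quick sketch of what the swap would look like, comparing the base-R extraction (the style benchmarked above) against the lubridate equivalents. Note that the rcaaqs helper names above are the real targets; the lines below are just an illustration of the two approaches, not the package's actual internals:

```r
library(lubridate)

d <- as.Date("2009-03-15")

# Current style (base R via POSIXlt, as in the benchmarks above):
as.integer(as.POSIXlt(d)$year + 1900)  # year
as.integer(as.POSIXlt(d)$mon + 1)      # month
as.integer(as.POSIXlt(d)$mday)         # day

# Proposed lubridate replacements:
year(d)
month(d)
day(d)
```

One small caveat: lubridate's accessors may return a double rather than an integer, so if any downstream code depends on integer output, wrapping the calls in as.integer() would preserve the old behaviour exactly.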