EdwinTh / padr

Padding of missing records in time series
https://edwinth.github.io/padr/

improve thicken performance for very large sets #27

Closed: EdwinTh closed this issue 6 years ago

EdwinTh commented 7 years ago

thicken can still be slow once the number of rows goes above roughly a million. Look into RcppParallel as a possible solution.

Also look into the difference between thickening Dates and thickening POSIX variables: the former is a lot slower, which is not what I would expect.
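One way to check the Date versus POSIX difference is a small timing comparison. The sketch below only illustrates how it could be measured: the column name ts, the sample size, and the "month" interval are assumptions made up for the example, and it claims nothing about the resulting timings. The padr call is skipped when the package is not installed.

```r
# Timing sketch: thicken() on a Date column versus a POSIXct column.
# n, the column name `ts`, and the "month" interval are illustrative assumptions.
set.seed(1)
n <- 1e6
day_offsets <- sort(sample(0:729, n, replace = TRUE))  # sorted day offsets over two years

df_date  <- data.frame(ts = as.Date("2015-01-01") + day_offsets)
df_posix <- data.frame(
  ts = as.POSIXct("2015-01-01", tz = "UTC") + 86400 * day_offsets
)

if (requireNamespace("padr", quietly = TRUE)) {
  # Same observations, same target interval; only the column class differs.
  t_date  <- system.time(padr::thicken(df_date,  interval = "month"))
  t_posix <- system.time(padr::thicken(df_posix, interval = "month"))
  print(c(Date = t_date[["elapsed"]], POSIXct = t_posix[["elapsed"]]))
}
```

Both data frames hold the same days, so any gap in elapsed time should come from how the two classes are handled rather than from the data.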

doug-friedman commented 7 years ago

If you're open to adding package imports, you might want to check out fasttime to speed up the process of conversion to POSIX (or use one of the fast date parsers in lubridate).

http://stackoverflow.com/questions/35247063/is-there-a-fast-parser-for-date
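As a rough illustration of the suggestion, the sketch below parses the same character timestamps with base R and, if it is installed, with fasttime::fastPOSIXct. The sample timestamps are made up, and note that fastPOSIXct always reads its input as GMT/UTC, so the comparison assumes UTC data.

```r
# Parse a character vector of timestamps with base R and, if installed, fasttime.
# The timestamps are invented for the example; fastPOSIXct() reads input as GMT/UTC.
x <- format(
  as.POSIXct("2017-01-01 00:00:00", tz = "UTC") + seq(0, by = 60, length.out = 1e5),
  "%Y-%m-%d %H:%M:%S"
)

t_base <- system.time(p_base <- as.POSIXct(x, tz = "UTC"))

if (requireNamespace("fasttime", quietly = TRUE)) {
  t_fast <- system.time(p_fast <- fasttime::fastPOSIXct(x, tz = "UTC"))
  # Both parsers should agree on the underlying instants.
  stopifnot(isTRUE(all.equal(as.numeric(p_base), as.numeric(p_fast))))
  print(c(base = t_base[["elapsed"]], fasttime = t_fast[["elapsed"]]))
}
```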

EdwinTh commented 7 years ago

Thanks for the tip. My resistance to adding package imports finally broke last week, so it is definitely an option. Will check it out.

doug-friedman commented 7 years ago

No worries! I don't mean to tempt you towards more package imports!

EdwinTh commented 7 years ago

Roadmap: replace the spanning and looping in C++ with POSIXlt object manipulation.

EdwinTh commented 7 years ago

Profiling showed that the bottleneck is not so much the looping in the C++ code as the call to get_interval. That is now a separate issue, since more depends on it. The call has been removed from thicken, because it was not strictly necessary there.

EdwinTh commented 6 years ago

get_interval is no longer used in thicken, since it was not strictly necessary there.