Closed by BBieri 2 years ago
Great, thank you Bernhard! And yes, you are right, we should find ways to speed up this process.
However, just FYI, how dates are actually resolved in messydates is not slow at all; the resolve methods are often very simple functions. What slows down the resolve process is that date ranges and sets, as well as other imprecise or ambiguous dates, are expanded individually before being resolved (the resolve functions call expand() beforehand). You can see this if you profile the resolve functions (with the profvis package, for example).
Hi Henrique, thanks for the quick reply and the clarifications! I've just assigned you this issue as well, since you have the most experience with the code, but I am still here in case you need any help. Let me know :)
Could expansion be accelerated by checking the vector for unspecified, negative, sets, etc.? This might require us to create some more is*() logical tests for these annotations.
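A minimal sketch of what such tests could look like, in base R. The function names are hypothetical and not part of the messydates API, and the patterns assume that ranges are written with "..", sets with "{...}", negative years with a leading "-", and unspecified components with "X":

```r
# Hypothetical predicates (NOT messydates functions). Assumed annotations:
# ".." = range, "{...}" = set, leading "-" = negative year, "X" = unspecified.
is_range_sketch       <- function(x) grepl("..", x, fixed = TRUE)
is_set_sketch         <- function(x) grepl("^\\{.*\\}$", x)
is_negative_sketch    <- function(x) startsWith(x, "-")
is_unspecified_sketch <- function(x) grepl("X", x, fixed = TRUE)

dates <- c("2001-01-01", "2001-01-01..2001-12-31",
           "{2001-01-01,2002-01-01}", "-0044-03-15", "2001-01-XX")

# One cheap vectorised pass flags the elements that need special handling;
# everything else is already a precise date.
flags <- is_range_sketch(dates) | is_set_sketch(dates) |
  is_negative_sketch(dates) | is_unspecified_sketch(dates)
flags
```

Because each test is a single vectorised regular-expression or prefix check, running all of them over a long vector should cost far less than expanding every element.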
Yes, thank you, that is indeed one option! I will look into that.
We can also add an extra logical argument to expand() so that it does not expand ranges or sets, since for those we can get the min and max dates, for example, without expanding the whole range or set.
Yes, that should also accelerate things.
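To illustrate why skipping the expansion helps for ranges, here is a small base R sketch. Both helpers are hypothetical stand-ins rather than messydates code, and they assume a range of the form "start..end" with complete ISO dates on both sides:

```r
# Hypothetical helpers (NOT messydates functions) for a "start..end" range.

# Expand first, then resolve: materialises every day in the range.
range_min_expanded <- function(x) {
  ends <- as.Date(strsplit(x, "..", fixed = TRUE)[[1]])
  days <- seq(ends[1], ends[2], by = "day")
  min(days)
}

# Resolve directly: only the two endpoints are ever parsed.
range_min_direct <- function(x) {
  ends <- as.Date(strsplit(x, "..", fixed = TRUE)[[1]])
  min(ends)
}

rng <- "1900-01-01..2000-12-31"
range_min_expanded(rng) == range_min_direct(rng)
```

Both versions return the same minimum, but the first one builds a vector with one element per day of the century, while the second one only touches two dates. The same reasoning applies to max.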
I have tried both adding an extra logical argument to expand() so that it does not expand ranges or sets, and adding an extra logical test for uncertain dates (is_uncertain()). Both appear to accelerate things a great deal by simply identifying when expansion is necessary (and when it is not).
However, since the logical tests are a bit more flexible and can be used for all resolve methods, this is how it is currently implemented.
For example:
# main branch (currently on CRAN)
# system.time(as.Date(manystates::leaders$ARCHIGOS$Beg, min))
#    user  system elapsed
# 144.502   0.698 145.284
# develop branch (with new logical tests is_uncertain())
system.time(as.Date(manystates::leaders$ARCHIGOS$Beg, min))
#    user  system elapsed
#   0.308   0.027   0.337
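The general pattern behind this speed-up can be sketched in base R as follows. This is only an illustration under simplifying assumptions: the helper names are hypothetical, the real is_uncertain() and expand() may work differently, and only "start..end" ranges are treated as needing expansion here:

```r
# Hypothetical stand-ins (NOT the messydates implementation).

# Cheap logical test: in this sketch only ranges need the slow path.
needs_expanding <- function(x) grepl("..", x, fixed = TRUE)

# Stand-in for expand-then-resolve: enumerate every day, then take the min.
slow_min <- function(x) {
  ends <- as.Date(strsplit(x, "..", fixed = TRUE)[[1]])
  format(min(seq(ends[1], ends[2], by = "day")))
}

resolve_min <- function(x) {
  out <- x                      # precise dates pass through untouched
  todo <- needs_expanding(x)    # identify where expansion is necessary
  out[todo] <- vapply(x[todo], slow_min, character(1), USE.NAMES = FALSE)
  as.Date(out)
}

resolve_min(c("2001-01-01", "2001-02-01..2001-02-28", "2003-05-17"))
```

When most elements of a vector are already precise dates, only the few flagged elements pay the cost of the slow path.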
Now, although this should already solve the issue, going through expand() made me realise that most of what the function does is not actually expanding dates, but formatting them to the point where they are "expandable". @jhollway, I am wondering whether we should front-load this formatting a bit more into as_messydate() instead? For example, as_messydate("2001") could already return "2001-01-01..2001-12-31". Please let me know what you think, thank you.
Also, thank you @BBieri for raising this issue, great catch!
Excellent @henriquesposito, this is a very considerable improvement!
I don't think we should already convert "2001" into "2001-01-01..2001-12-31", as this is considerably more verbose. More succinct representations of messydates should always be preferred.
Created on 2022-04-05 by the reprex package (v2.0.1)