Closed art-w closed 1 year ago
Hm, it does take surprisingly long to resolve, and indeed I think your explanation is correct.
Though I vaguely recall having some handling for cases like this, let me check...
I'll need to read your design of result_space
a bit more closely before I can give a proper response.
Very nice library! Thank you for your work :)
: D
W.r.t. your proposal:
In any case, may I suggest a different representation for Resolver.result_space than lists or even seq? The main issue is that they can't skip any work, so it always costs O(N + M) for intersections rather than something closer to O(min(N, M)).
There are a lot more skipping happening already, and successor query is accommodated in one way or another internally already. Namely internally it'll propagate the bound to the pattern resolver for it to restart the pattern matching at a different interval.
The result space is intentionally a very rough overapproximated bound in any case, and should not be dense enough to be the cause of slowdown, see the approximation done for intervals in overapproximate_result_space_bottom_up
:
| Intervals (_, s) -> (
match s () with
| Seq.Nil -> t
| Seq.Cons ((start, _), _) ->
Intervals ([ (start, default_result_space_end_exc) ], s))
And there are static and dynamic adjustment to the actual search space for union and intersection.
For static adjustment pass, mainly see:
optimize_search_space
, which uses the following two internally
overapproximate_result_space_bottom_up
restrict_result_space_top_down
For dynamic adjustments, some examples:
aux_inter
let rec aux_inter' ~start (timeres : t list) =
let interval_batches = resolve ~start search_using_tz timeres in
let batch_for_sampling = collect_batch interval_batches in
if List.exists CCOption.is_none batch_for_sampling then Seq.empty
else
let batch_for_sampling =
CCList.map
(CCOption.get_exn_or "Unexpected None in batch_for_sampling")
batch_for_sampling
in
match batch_for_sampling with
| [] -> Seq.empty
| _ ->
let open Timedesc.Span in
let rightmost_interval =
batch_for_sampling
|> List.sort_uniq (fun x y -> Time.Interval'.compare y x)
|> List.hd
in
let rightmost_start = fst rightmost_interval in
let end_exc = rightmost_start + inter_slice_size in
(* we shift the start of our scope to rightmost_start *)
let timeres =
slice_result_space_multi ~start:rightmost_start timeres
in
(* refresh the interval batches if the gap is too large *)
let interval_batches =
if
rightmost_start - start
>= dynamic_search_space_adjustment_trigger_size
then resolve ~start:rightmost_start search_using_tz timeres
else slice_batches ~start:rightmost_start interval_batches
in
let intervals_up_to_end_exc =
interval_batches
|> CCList.to_seq
|> Seq.map (Intervals.Slice.slice ~skip_check:true ~end_exc)
|> Intervals.Inter.inter_multi_seq ~skip_check:true
in
fun () ->
Seq.Cons (intervals_up_to_end_exc, aux_inter' ~start:end_exc timeres)
where we do the skip based on a bundle of timere objects
aux_union
let rec aux_union' (timeres : t Seq.t) (intervals : Interval'.t Seq.t) =
match intervals () with
| Seq.Nil -> Seq.empty
| Seq.Cons ((start, end_exc), rest) ->
let open Timedesc.Span in
let size = end_exc - start in
if size >= dynamic_search_space_adjustment_trigger_size then
let timeres = slice_result_space_multi_seq ~start:end_exc timeres in
let next_intervals =
resolve_and_merge timeres
|> OSeq.drop_while (fun x -> Time.Interval'.le x (start, end_exc))
in
fun () ->
Seq.Cons ((start, end_exc), aux_union' timeres next_intervals)
else fun () -> Seq.Cons ((start, end_exc), aux_union' timeres rest)
I added some debug statements and can confirm the result space bounds were propagated correctly, and minutes
seem to be very quick, so I am suspecting inefficiency in the pattern_resolver
.
I'll try to optimise it and see where that leads us.
Right now I suspect there are too many conversions from set to seq - I'll play around with pattern_resolver.ml
when I have time later.
Many thanks for raising the issue!
Hm...now I'm not very sure...
Aha...might have found the culprit...
Might be missing a call to slice_result_space_multi
in aux_inter'
Ooh right sorry, I missed the overapproximate
optimization! Indeed it calculates the right lowerbound (= now
), but then it searches from 2022/01/01 until today. Maybe it's because the pattern enumeration only uses the year of the lowerbound? (overall_search_start
is reset to year/01/01
below)
There's still a lot of code that I haven't read, so maybe it's not this! Thank you for looking into it so fast :)
Maybe it's because the pattern enumeration only uses the year of the lowerbound?
Oh yeah possibly! I'll need to remind myself of how all these work, it's been a year or more since I've last worked on them.
I think you're spot on.
Through dune utop src/
:
let x = Result.get_ok @@ Timere.(resolve
(after (Timedesc.make_exn ~year:2022 ~month:01 ~day:01 ~hour:01 ~minute:01 ~second:01 ()) &&& seconds [0]))
is significantly faster than
let x = Result.get_ok @@ Timere.(resolve
(after (Timedesc.make_exn ~year:2022 ~month:12 ~day:01 ~hour:01 ~minute:01 ~second:01 ()) &&& seconds [0]))
Inefficiency of pattern resolver has been addressed in #56
That was fast! Thanks a lot, it's much appreciated :)
I'm not impacted by the following, but after reading and thinking about the code, I believe there's an interesting edge case between the "search start time" optimization and lengthen
:
# Timere.(resolve_exn
(since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:0 ~second:0 ())
&&& lengthen (Timedesc.Span.make_small ~s:(30 * 60) ()) (minutes [0]))) ;;
[2022 Jan 01 00:00:00 +00:00:00, 2022 Jan 01 00:31:00 +00:00:00)
[2022 Jan 01 01:00:00 +00:00:00, 2022 Jan 01 01:31:00 +00:00:00)
[2022 Jan 01 02:00:00 +00:00:00, 2022 Jan 01 02:31:00 +00:00:00)
If the date is increased by a minute, then the first interval is lost even though it still overlaps with since
:
# Timere.(resolve_exn (* vvvvvvvvv was 0 *)
(since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:1 ~second:0 ())
&&& lengthen (Timedesc.Span.make_small ~s:(30 * 60) ()) (minutes [0]))) ;;
[2022 Jan 01 01:00:00 +00:00:00, 2022 Jan 01 01:31:00 +00:00:00)
[2022 Jan 01 02:00:00 +00:00:00, 2022 Jan 01 02:31:00 +00:00:00)
[2022 Jan 01 03:00:00 +00:00:00, 2022 Jan 01 03:31:00 +00:00:00)
My understanding is that we would expect the first interval to be truncated by a minute, as it's what we get from a similar expression:
# Timere.(resolve_exn
(since (Timedesc.make_exn ~year:2022 ~month:1 ~day:1 ~hour:1 ~minute:1 ~second:0 ())
&&& (pattern_intervals `Whole
(Points.make_exn ~lean_toward:`Earlier ~hour:0 ~minute:0 ~second:0 ())
(Points.make_exn ~lean_toward:`Earlier ~hour:0 ~minute:31 ~second:0 ())))) ;;
[2022 Jan 01 00:01:00 +00:00:00, 2022 Jan 01 00:31:00 +00:00:00) (* <--- *)
[2022 Jan 02 00:00:00 +00:00:00, 2022 Jan 02 00:31:00 +00:00:00)
[2022 Jan 03 00:00:00 +00:00:00, 2022 Jan 03 00:31:00 +00:00:00)
I believe the "search start time" for patterns has to subtract the intervals durations to catch these overlapping ranges (that starts before but end after)... Anyway I'm not impacted at all by this behavior, but I could not resist thinking about the cool algorithms involved in timere!
Thanks again for your quick support!
Yes indeed lengthen
has been a source of headache for the result/search space adjustment, also shift
.
(I usually leave discovery of these cases to fuzzer, but I lack the equipment for fuzzing right now sadly). But I think you found a case that wasn't caught by fuzzer previously in any case : D
Fixed now by #57 I believe
Yes it works! Thanks a ton, I'm a big fan of your reactivity and I'm very happy of how easy the cron scheduling turned out be thanks to timere! Have a nice weekend :)
Good to hear!
Right now somewhat waiting on timedesc 0.7.0 publish before submitting this version of timere, but if pointing to the git repo suffices for you then I'm not gonna try to hurry too much as I'm still busy with other stuff.
Yes the git pin is great! There's absolutely no need to rush, please take your time :)
@art-w Right now I am in the progress of packaging things up for timere 0.8.0, and found that as part of addressing the slowness of the pattern resolver during our conversation, the commit 84792815ccfa1f8ae4bd9c228e314491dc31456a introduced a bug that causes incorrect results when negative month day is used.
Hope that does not affect you.
Thanks for the warning! I don't think it has affected us, we use it to schedule tasks in the future on regular intervals like "every friday" :) (with no reason to end up with negative month day)
Out of curiosity: are you guys using it for task scheduling akin to cron?
Yes! We use it in current-bench
to allow running longer benchmarks every so often, rather than the default to run on every commit/PR where shorter benchmarks are more appropriate. It was really easy to setup in a generic way thanks to your libs :)
Oh neat! So it's actually used in practice : D Also very happy to see timere-parse seems usable enough for config files for the pipeline
(I'm guessing #69 is related to the same project then, in which case I'll just track timere 0.8.0 progress on that issue.)
Thanks a lot for the great work and support! :)
Very nice library! Thank you for your work :)
I wanted to use
Timere
to schedule cron-like tasks but I get stuck when I try to compute the next time the cron should run. For a task that should run every minute, I tried:My understanding is that
Timere
is only able to optimize the intersections ofPattern
, whichseconds
is butsince
is not. And so it resorts to computing a list of all the minutes from year 0, then doing an intersection viaSeq
.Knowing this, I can sort of cheat and do multiple
resolve
with variations onyears
,months
, etc to simulatesince now
in a way that pattern can optimize... but it's not really convenient.Did I miss an easier way of doing this?
In any case, may I suggest a different representation for
Resolver.result_space
than lists or even seq? The main issue is that they can't skip any work, so it always costsO(N + M)
for intersections rather than something closer toO(min(N, M))
.There is this lesser known operation on ordered collections called the successor query. It's very similar to
mem
/find
but it returns a more useful result when the query fails:The idea is that it should return the interval that contains the timestamp -- but if no such interval exists, then the very next interval that follows (or
None
at the end when the timestamp is too big.) So it's exactly the operation I'm trying to do to find when my cron should run next!Forgive my bad illustrations but in both situations below,
successor
would return the[t0, t1)
interval:(We can recover
mem
/find
by checking iftimestamp >= t0
)The
successor
operation can help a lot for computingintersection
, because it can skip forward and it also provides a uniform implicit representation ofTimere.t
. In pseudo-code:union
of two successors is their minimumSeq
out of a successor function by querying it atTimestamp.min_val
, getting back the first interval[t0, t1)
then iterating witht1
to search for the following oneintersection
is a tiny bit more complex, we have to search forward until both successors return overlapping intervals. Every failure to intersect moves us forward though.For my situation
seconds [0] &&& since now
, the intersection would go through the steps:successor (seconds [0]) Timestamp.min_val
yields a first candidate[0, 1)
successor (since now) 0
returns[now, inf)
which is incompatible, so we should retry with the new lower-boundsuccessor (seconds [0]) now
yields a[now + dt, now + dt + 1)
which is an overlapping interval with the previous[now, inf)
It's not perfect, as you can still build intervals that have no intersection and the successor is not going to help:
But in general it's a lot better when intersections do exist! (I used it in some terrible code that turned out to be fast enough thanks to the successor)
... And finally, the
successor
representation could also open the door to operations like "every 7s" that don't fit in a Pattern.