Closed: robinholzi closed this pull request 2 weeks ago.
Attention: Patch coverage is 64.28571% with 5 lines in your changes missing coverage. Please review.
Project coverage is 82.46%. Comparing base (cb0be37) to head (e1ee770).
| Files | Patch % | Lines |
|---|---|---|
| modyn/supervisor/internal/utils/time_tools.py | 0.00% | 3 Missing :warning: |
| modyn/supervisor/internal/triggers/timetrigger.py | 77.77% | 2 Missing :warning: |
I will wait for the integration tests to run through for this one and then merge.
LGTM. I guess in most cases one should set the start timestamp to 0 if they want a neat triggering boundary. But finding the correct setting here is up to the user.
For continuous datasets with periodic evaluation, not necessarily. There you don't really care where the schedule starts or what the exact bounds are; you only care about time proximity. But yes, for the current slicing approaches we need to set it!
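To make the "neat triggering boundary" point above concrete, here is a minimal sketch (not Modyn's actual implementation; the function name and signature are illustrative) of how the choice of start timestamp shifts the boundaries of a fixed-length time trigger:

```python
# Illustrative sketch (assumption, not Modyn code): trigger boundaries for a
# time trigger with a fixed interval length, starting at start_timestamp.
def trigger_boundaries(start_timestamp: int, every: int, until: int) -> list[int]:
    """Return all trigger boundary timestamps in [start_timestamp, until)."""
    return list(range(start_timestamp, until, every))

# With start_timestamp = 0, boundaries fall on neat multiples of the interval:
print(trigger_boundaries(0, 7, 22))  # [0, 7, 14, 21]

# Anchored at the first sample's timestamp instead, the same schedule shifts:
print(trigger_boundaries(3, 7, 22))  # [3, 10, 17]
```

This is why setting the start timestamp to 0 gives aligned boundaries, while anchoring on an arbitrary first sample shifts the whole schedule.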
@MaxiBoether, could you kick off a cglm pipeline after merging this? I need a pipeline log file with correct training intervals (not affected by the first sample).
Motivation
We use `real_last_timestamp` (start of the next training interval - 1, marking the end of the current training interval) only for plotting the boxes in the heatmap plot. For decisions w.r.t. currently active models this doesn't work: e.g., if the next year in a dataset has no data at the year start, our training interval would extend into that next year, and therefore its model won't be considered for the current evaluation interval.

Note
Independently of the bug we fix with the `start_timestamp` in `timetrigger`, this setting allows us to effectively do pre-training with the first trigger (e.g., in the continuous arxiv dataset we have sparse data from 1988 to ~2005). We could simply start the schedule in year 2005; the first trigger then trains on all previous data.
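The pre-training effect described above can be sketched as follows. This is a hypothetical illustration (function name, signature, and the use of years as toy timestamps are assumptions, not Modyn's implementation): the first training interval absorbs all sparse data before the schedule start.

```python
# Hypothetical sketch (not Modyn code): generate training intervals where the
# first interval is extended back to cover all data before the schedule start.
def training_intervals(schedule_start: int, every: int, end: int,
                       data_start: int) -> list[tuple[int, int]]:
    """Return (inclusive_start, exclusive_end) training intervals."""
    boundaries = list(range(schedule_start, end, every)) + [end]
    intervals = [(boundaries[i], boundaries[i + 1])
                 for i in range(len(boundaries) - 1)]
    # Pre-training: the first trigger trains on everything before the schedule.
    if intervals and data_start < intervals[0][0]:
        intervals[0] = (data_start, intervals[0][1])
    return intervals

# Years as toy timestamps: data from 1988, schedule from 2005, yearly triggers.
print(training_intervals(2005, 1, 2008, 1988))
# [(1988, 2006), (2006, 2007), (2007, 2008)]
```

With this convention, the sparse 1988-2005 data is swept up by the first trigger instead of producing many near-empty early triggers.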