Open beyarkay opened 11 months ago
What do you think of the following for the Historical Data Format (following a similar pattern to manually_specified.yaml):
historical_changes:
- stage: 4
start: 2023-08-31T14:00:00
finish: 2023-09-02T05:00:00
source: https://twitter.com/Eskom_SA/status/1697210092179935262
exclude: coct
historical: false
- stage: 2
start: 2023-09-02T05:00:00
finish: 2023-09-02T16:00:00
source: https://twitter.com/Eskom_SA/status/1697210092179935262
exclude: coct
historical: true
where historical is a boolean field which will be true for entries that come from historical data and false for new entries.
I'm actually wondering if there are any disadvantages to keeping the format identical, so future changes look the same as historical changes:
historical_changes:
- stage: 4
start: 2023-08-31T14:00:00
finish: 2023-09-02T05:00:00
source: https://twitter.com/Eskom_SA/status/1697210092179935262
exclude: coct
- stage: 2
start: 2023-09-02T05:00:00
finish: 2023-09-02T16:00:00
source: https://twitter.com/Eskom_SA/status/1697210092179935262
exclude: coct
Very happy to hear feedback/opinions on this, but my reasoning is that figuring out the loadshedding schedule for western-cape-stellenbosch for the upcoming week would be identical to figuring out the loadshedding schedule for western-cape-stellenbosch for the past week.
Here's a link to the struct that defines the loadshedding change. Removing the Rust-specific details, it looks like:
struct Change {
start: String,
finsh: String,
stage: unsigned 8-bit integer,
source: String,
include_regex: Option<String>,
exclude_regex: Option<String>,
include: Option<String>,
exclude: Option<String>,
}
include and exclude are really just syntactic sugar that gets converted into explicit regexes which an area name must match if it is affected by the relevant Change. include and exclude get converted to include_regex and exclude_regex by this function, which basically just converts shorthand like cape-town into regex like city-of-cape-town-area-\d{1,2}. The regex matching hasn't been as useful as I thought it would be (and I don't think I've ever actually used it in manually_specified.yaml), so I don't think it's worth your time trying to deal with it. Just assume include_regex and exclude_regex don't exist.
Agreed. I don't see a disadvantage in keeping the format the same.
To explain my thought process for dealing with two files that have overlapping times or conflicting stages, take an older file like this:
- stage: 3
start: 2023-09-04T10:00:00
finsh: 2023-09-04T22:00:00
source: https://twitter.com/CityofCT/status/1698744757000831345
include: coct
- stage: 5
start: 2023-09-04T22:00:00
finsh: 2023-09-05T05:00:00
source: https://twitter.com/CityofCT/status/1698744757000831345
include: coct
and a newer file:
- stage: 5
start: 2023-09-04T18:00:00
finsh: 2023-09-04T22:00:00
source: https://twitter.com/CityofCT/status/1698744757000831345
include: coct
- stage: 6
start: 2023-09-04T22:00:00
finsh: 2023-09-05T05:00:00
source: https://twitter.com/CityofCT/status/1698744757000831345
include: coct
Here the differences are that stage 3 in the first entry moves to stage 5 for part of its time frame (18:00 to 22:00), and the second entry changes from stage 5 to stage 6 for its whole time frame (22:00 to 05:00).
Then the output in the historical data would be:
- stage: 6
start: 2023-09-04T22:00:00
finsh: 2023-09-05T05:00:00
source: https://twitter.com/CityofCT/status/1698744757000831345
include: coct
- stage: 5
start: 2023-09-04T18:00:00
finsh: 2023-09-04T22:00:00
source: https://twitter.com/CityofCT/status/1698744757000831345
include: coct
- stage: 3
start: 2023-09-04T10:00:00
finsh: 2023-09-04T18:00:00
source: https://twitter.com/CityofCT/status/1698744757000831345
include: coct
where the rules for these changes can be summarised as "new entries replace older entries for overlapping times". Essentially, we delete the incorrect entry from the file completely...
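As a rough illustration of "new entries replace older entries for overlapping times", here is a minimal Python sketch. It is not the project's actual code: the `Change` dataclass, `subtract`, and `apply_newer` are hypothetical names, include/exclude are ignored (all entries are assumed to apply to the same area), and the newest-first ordering is only chosen to match the example output above.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Change:
    # Hypothetical, simplified stand-in for one entry in the YAML file.
    stage: int
    start: datetime
    finsh: datetime  # spelled as in the YAML examples
    source: str = ""


def subtract(old: Change, new: Change) -> List[Change]:
    """Return the parts of `old` that are not covered by `new`."""
    if new.finsh <= old.start or new.start >= old.finsh:
        return [old]  # no overlap, keep the old entry untouched
    pieces = []
    if old.start < new.start:
        pieces.append(replace(old, finsh=new.start))  # part before `new`
    if new.finsh < old.finsh:
        pieces.append(replace(old, start=new.finsh))  # part after `new`
    return pieces


def apply_newer(older: List[Change], newer: List[Change]) -> List[Change]:
    """Trim older entries wherever a newer entry overlaps them, then add the newer ones."""
    remaining = list(older)
    for new in newer:
        remaining = [piece for old in remaining for piece in subtract(old, new)]
    # Newest-first ordering, assumed here only to mirror the example above.
    return sorted(remaining + list(newer), key=lambda c: c.start, reverse=True)
```

Feeding in the two files from the example (parsing the timestamps with `datetime.fromisoformat`) should leave stage 3 trimmed to 10:00 to 18:00, with the newer stage 5 and stage 6 entries kept whole.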
Yes that looks correct to me. Although attempting to read these is making me remember why I tried to make a schedule visualiser a while back (it's trickier than it might seem at first glance). If I get a chance later on today, I'll write up some test cases (probably formatted as a multi-document yaml file) so that we can get the computer verifying these things for us.
I'll write out some high-level test examples below:
It'll be useful to have a little custom syntax: file1 is older than file2, the caret ^ indicates the current time, and a series of numbers like _ _ 4 4 0 2 2 2 indicates several stages over some unit of time:
- _ _: 2 units where we don't know what loadshedding stage it is,
- 4 4: 2 units of stage four,
- 0: followed by no loadshedding for one unit of time,
- 2 2 2: followed by stage two for 3 units of time.

With the above, we can define some mini-test examples like:
If there are conflicts in the future, the newer file should take precedence:
file1: 2 2 2 2
file2: 4 4 2 2
now: ^
result: 4 4 2 2
If there are conflicts in the past, the newer file should still take precedence (sometimes loadshedding will be bumped to stage 6 at 2am, but the announcement will only be made public at 7am, so we want to catch this edge case):
file1: 2 2 2 2
file2: 4 4 2 2
now: ^
result: 4 4 2 2
If the start/finsh boundaries don't align nicely across different files, then the result should properly figure out the new boundaries:
file1: 2 2 2 2 3 3 3 3
file2: _ _ 4 4 4 4 2 2
now: ^
result: 2 2 4 4 4 4 2 2
If an old file says "stage 6 for the rest of time" but a newer file updates to say "stage 3 for the next week" then there should only be stage 3 for the next week (it should not be followed by stage 6 for the rest of time)
file1: 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file2: _ _ _ _ 3 3 3 3 _ _ _ _ _ _ _
now: ^
result: 6 6 6 6 3 3 3 3 _ _ _ _ _ _ _
The above example also shows that there must be an option to specify "unknown loadshedding". Unfortunately this does happen sometimes and it's unavoidable.
Finally, here's one big example, just to stress test things a bit
file1: 1 2 2 2 2 _ _ _ _ _ _ _ _ _ _ _ _ _
file2: _ _ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file3: _ _ _ _ _ 3 2 3 2 _ _ _ _ _ _ _ _ _
file4: _ _ _ _ _ _ _ _ _ _ _ 1 1 1 1 1 1 1
file5: _ _ _ _ _ _ _ _ _ _ _ _ _ 0 0 1 1 _
file6: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2 2
now: ^
result: 1 2 6 6 6 3 2 3 2 _ _ 1 1 0 0 1 2 2
I'll try write these up as YAML files tonight, but this should give you a good idea. Please do bug me if it looks like I haven't been consistent with the rules.
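The examples above can be turned into something a computer can check. Below is a minimal Python sketch, under the assumption (inferred from the examples, not stated as a rule anywhere) that each newer file replaces everything from its first known unit onward, including replacing older stages with "unknown". The names `parse_timeline`, `merge_timelines`, and `format_timeline` are made up for illustration, and the position of `now` is ignored because the examples give the same result for past and future conflicts.

```python
from typing import List, Optional

# One entry per unit of time; None stands for "_" (unknown loadshedding).
Timeline = List[Optional[int]]


def parse_timeline(text: str) -> Timeline:
    """Parse the custom syntax, e.g. '_ _ 4 4 0' -> [None, None, 4, 4, 0]."""
    return [None if token == "_" else int(token) for token in text.split()]


def format_timeline(timeline: Timeline) -> str:
    """Inverse of parse_timeline."""
    return " ".join("_" if stage is None else str(stage) for stage in timeline)


def merge_timelines(files: List[Timeline]) -> Timeline:
    """Merge timelines given oldest first; newer files take precedence."""
    if not files:
        return []
    length = max(len(timeline) for timeline in files)
    result: Timeline = [None] * length
    for timeline in files:
        # Index of the first unit where this file actually says something.
        first_known = next(
            (i for i, stage in enumerate(timeline) if stage is not None), None
        )
        if first_known is None:
            continue  # file contains no information at all
        padded = timeline + [None] * (length - len(timeline))
        # Everything from the first known unit onward is replaced, even by
        # unknowns, so "stage 6 for the rest of time" does not come back.
        result[first_known:] = padded[first_known:]
    return result
```

For instance, `format_timeline(merge_timelines([parse_timeline("2 2 2 2 3 3 3 3"), parse_timeline("_ _ 4 4 4 4 2 2")]))` corresponds to the misaligned-boundaries example.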
I will have a look at this and get back to you, but one issue (maybe something I missed?) is that the manually_specified.yaml file is formatted differently in some commits. I can get you the commit hash if necessary. It's easy enough to skip the files that are misbehaving and mark the period between two successful reads as unknown.
For example:
# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
# - stage: <STAGE NUMBER HERE>
# start: <START TIME HERE>
# finsh: <FINISH TIME HERE>
# source: <URL TO INFORMATION SOURCE HERE>
# exclude: <coct if this schedule doesn't apply to cape town>
# include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
changes:
- stage: 4
start: 2023-08-31T14:00:00
finsh: 2023-09-02T05:00:00
source: https://twitter.com/Eskom_SA/status/1697210092179935262
exclude: coct
- stage: 2
start: 2023-09-02T05:00:00
finsh: 2023-09-02T16:00:00
source: https://twitter.com/Eskom_SA/status/1697210092179935262
exclude: coct
- stage: 4
start: 2023-08-31T10:00:00
finsh: 2023-08-31T17:00:00
source: https://twitter.com/CityofCT/status/1697259196931252229
include: coct
- stage: 2
start: 2023-08-31T17:00:00
finsh: 2023-08-31T22:00:00
source: https://twitter.com/CityofCT/status/1697259196931252229
include: coct
- stage: 4
start: 2023-08-31T22:00:00
finsh: 2023-09-01T05:00:00
source: https://twitter.com/CityofCT/status/1697259196931252229
include: coct
- stage: 2
start: 2023-09-01T05:00:00
finsh: 2023-09-01T22:00:00
source: https://twitter.com/CityofCT/status/1697259196931252229
include: coct
- stage: 4
start: 2023-09-01T22:00:00
finsh: 2023-09-03T05:00:00
source: https://twitter.com/CityofCT/status/1697259196931252229
include: coct
- stage: 2
start: 2023-09-03T05:00:00
finsh: 2023-09-03T17:00:00
source: https://twitter.com/CityofCT/status/1697259196931252229
include: coct
historical_changes: []
as compared to:
# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
# - stage: <STAGE NUMBER HERE>
# start: <START TIME HERE>
# finsh: <FINISH TIME HERE>
# source: <URL TO INFORMATION SOURCE HERE>
# exclude: <coct if this schedule doesn't apply to cape town>
# include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
- stage: 3
start: 2023-08-27T16:00:00
finsh: 2023-08-28T05:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 1
start: 2023-08-28T05:00:00
finsh: 2023-08-28T16:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 3
start: 2023-08-28T16:00:00
finsh: 2023-08-29T05:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 1
start: 2023-08-29T05:00:00
finsh: 2023-08-29T16:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 3
start: 2023-08-29T16:00:00
finsh: 2023-08-30T05:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 1
start: 2023-08-30T05:00:00
finsh: 2023-08-30T16:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 3
start: 2023-08-30T16:00:00
finsh: 2023-08-31T05:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 1
start: 2023-08-31T05:00:00
finsh: 2023-08-31T16:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 3
start: 2023-08-31T16:00:00
finsh: 2023-09-01T05:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 1
start: 2023-09-01T05:00:00
finsh: 2023-09-01T16:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 3
start: 2023-09-01T16:00:00
finsh: 2023-09-02T05:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 1
start: 2023-09-02T05:00:00
finsh: 2023-09-02T16:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 3
start: 2023-08-27T16:00:00
finsh: 2023-08-28T00:00:00
source: https://twitter.com/CityofCT/status/1695804610932273188
include: coct
historical_changes: []
@beyarkay have you had any time to generate some test yaml files? (I am also working on some).
I have started some rudimentary test cases for a data aggregation file I wrote here. I have generated the historical data using this file but I am afraid it will (may) be riddled with errors until some proper testing is done.
Hey, sorry for the delay.
Yes you're correct, the misbehaving file should be omitted (the one formatted like:
---
- stage: 3
start: 2023-08-27T16:00:00
finsh: 2023-08-28T05:00:00
source: https://twitter.com/Eskom_SA/status/1695790083796828468
exclude: coct
- stage: 1
start: 2023-08-28T05:00:00
...
I'm not sure what happened there). The correctly formatted file should have two keys: changes and historical_changes, each of which accepts a list of "change" objects (although historical_changes is deprecated and not used anymore).
Busy working on the test files now, should have them uploaded to your PR in a bit.
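A shape check along these lines could be used to skip the misbehaving revisions. This is only a sketch: it assumes each revision has already been parsed by some YAML library into plain Python objects, `extract_changes` and `REQUIRED_FIELDS` are hypothetical names, and the required fields are taken from the non-optional members of the struct quoted earlier in the thread.

```python
from typing import Any, List, Optional

# Assumed from the non-optional struct fields: start, finsh, stage, source.
REQUIRED_FIELDS = ("stage", "start", "finsh", "source")


def extract_changes(document: Any) -> Optional[List[dict]]:
    """Return the list under `changes` if the document is well formed, else None.

    `historical_changes` is deprecated, so it is not required here.
    """
    if not isinstance(document, dict):
        return None  # e.g. a bare top-level list, like the misbehaving file
    changes = document.get("changes")
    if not isinstance(changes, list):
        return None
    for change in changes:
        if not isinstance(change, dict):
            return None
        if any(field not in change for field in REQUIRED_FIELDS):
            return None
    return changes
```

A caller could treat a `None` result as "skip this revision" and mark the period between the two nearest successful reads as unknown.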
Keeping track of historical loadshedding is technically feasible at the moment, but it isn't easy to accomplish.
Basically all the information is in the git log for the file manually_specified.yaml, but extracting and compiling it would be a pain.
Probably the easiest way to make historical loadshedding data available would be to have a CI/CD script that runs every time the calendars get built. This script should calculate the historical loadshedding (either by updating the previously calculated data or by recalculating everything from scratch) and emit a file containing that information.
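For the "recalculate everything from scratch" option, pulling every committed version of the file out of the git log could look roughly like the Python sketch below. It is an illustration only: the function names are hypothetical, it assumes `git` is on the PATH and that manually_specified.yaml sits at the repository root, and it relies on the standard `git log --format` placeholders `%H` (commit hash) and `%cI` (strict ISO 8601 committer date) plus `git show <commit>:<path>`.

```python
import subprocess
from datetime import datetime
from typing import List, Tuple


def parse_log_output(text: str) -> List[Tuple[str, datetime]]:
    """Parse lines of '<commit hash> <ISO 8601 date>' into pairs, oldest first."""
    revisions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        commit, stamp = line.split(maxsplit=1)
        # Older Python versions do not accept a trailing 'Z' in fromisoformat.
        stamp = stamp.strip().replace("Z", "+00:00")
        revisions.append((commit, datetime.fromisoformat(stamp)))
    return sorted(revisions, key=lambda pair: pair[1])


def list_revisions(repo_dir: str, path: str = "manually_specified.yaml"):
    """List every commit that touched `path`, with its commit time."""
    output = subprocess.run(
        ["git", "log", "--format=%H %cI", "--", path],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return parse_log_output(output)


def file_at(repo_dir: str, commit: str, path: str = "manually_specified.yaml") -> str:
    """Return the contents of `path` as it was at `commit`."""
    return subprocess.run(
        ["git", "show", f"{commit}:{path}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
```

Each revision's text could then be parsed and merged oldest to newest, so that later commits take precedence over earlier ones.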
For parsing, it would be easiest if that file were formatted in the same way as manually_specified.yaml.
Keeping the format the same would mean the main codebase is equally able to calculate historical loadshedding and future loadshedding. However, it shouldn't be too much work to parse some different format, if that format provided some benefits.
Note that YAML is a superset of JSON, so the below snippet is valid YAML, while requiring fewer characters:
Keeping the format the same is not a hard requirement, but alternatives should be properly motivated.
Here's a high level checklist:
- eskom-calendar-dev repo. This is a private mirror of eskom-calendar, used to test CI/CD things. You can also set up your own, but getting the private GitHub keys set up (which allow GH actions to run faster) can be a pain.
- calendars/ directory, as that's the only directory that gets uploaded to GitHub releases.
- manually_specified.yaml and asserting that the updated changes get properly integrated.

If the above are done, then all should be good! @beyarkay will check things over and merge.