beyarkay / eskom-calendar

Get your loadshedding schedule in your calendar and never be left in the dark! Open-source, up-to-date, and developer friendly.
https://eskomcalendar.co.za
GNU General Public License v3.0
190 stars 35 forks

Keep track of historical load shedding #466

Open beyarkay opened 11 months ago

beyarkay commented 11 months ago

Keeping track of historical loadshedding is technically feasible at the moment, but it isn't easy to accomplish.

Basically all the information is in the git log for the file manually_specified.yaml, but extracting and compiling it would be a pain.

Probably the easiest way to make historical loadshedding data available would be to have a CI/CD script that runs every time the calendars get built. This script should calculate the historical loadshedding (either by updating the previously calculated data or by recalculating everything from scratch) and emit a file containing that information.
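
As a rough sketch of the first half of such a script, the snippet below lists every commit that touched manually_specified.yaml and fetches the file as it was at each commit. It assumes the file sits at the repository root and that git is on the PATH; the function names (parse_log, list_revisions, file_at) are made up for illustration and are not part of the project.

```python
import subprocess
from datetime import datetime

# Assumption: the schedule file lives at the repository root.
SCHEDULE_PATH = "manually_specified.yaml"


def parse_log(output):
    """Turn `git log --format='%H %cI'` output into (hash, datetime) pairs, oldest first."""
    revisions = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, timestamp = line.split(maxsplit=1)
        # Some git versions print "Z" for UTC, which older Pythons reject.
        timestamp = timestamp.strip().replace("Z", "+00:00")
        revisions.append((commit_hash, datetime.fromisoformat(timestamp)))
    # git log prints newest first; sort so older files are processed first.
    return sorted(revisions, key=lambda pair: pair[1])


def list_revisions(repo_dir="."):
    """Ask git for every commit that touched the schedule file."""
    output = subprocess.run(
        ["git", "log", "--format=%H %cI", "--", SCHEDULE_PATH],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return parse_log(output)


def file_at(commit_hash, repo_dir="."):
    """Return the schedule file's text as it was at the given commit."""
    return subprocess.run(
        ["git", "show", f"{commit_hash}:{SCHEDULE_PATH}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
```

Each revision's text would then be parsed as YAML and folded into the running history, oldest first.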

For parsing, it would be easiest if that file were formatted in the same way as manually_specified.yaml:

changes:
- stage: 4
  start: 2023-08-31T14:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 2
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
...

Keeping the format the same would mean the main codebase is equally able to calculate historical loadshedding and future loadshedding. However, it shouldn't be too much work to parse some different format, if that format provided some benefits.

Note that YAML is a superset of JSON, so JSON-style inline mappings like the ones below are valid YAML while being more compact:

changes:
- { stage: 4, start: 2023-08-31T14:00:00, finsh: 2023-09-02T05:00:00, source: https://twitter.com/Eskom_SA/status/1697210092179935262, exclude: coct }
- { stage: 2, start: 2023-09-02T05:00:00, finsh: 2023-09-02T16:00:00, source: https://twitter.com/Eskom_SA/status/1697210092179935262, exclude: coct }

Keeping the format the same is not a hard requirement, but alternatives should be properly motivated.

Here's a high level checklist:

If the above are done, then all should be good! @beyarkay will check things over and merge.

keeganwhite commented 10 months ago

What do you think of the following for the historical data format (following a similar pattern to manually_specified.yaml):

historical_changes:
  - stage: 4
    start: 2023-08-31T14:00:00
    finish: 2023-09-02T05:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct
    historical: false
  - stage: 2
    start: 2023-09-02T05:00:00
    finish: 2023-09-02T16:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct
    historical: true

where historical is a boolean field that is true for entries coming from historical data and false for new entries.

beyarkay commented 10 months ago

I'm actually wondering if there are any disadvantages to keeping the format identical, so future changes look the same as historical changes:

historical_changes:
  - stage: 4
    start: 2023-08-31T14:00:00
    finish: 2023-09-02T05:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct
  - stage: 2
    start: 2023-09-02T05:00:00
    finish: 2023-09-02T16:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct

Very happy to hear feedback/opinions on this, but my reasoning is:

Here's a link to the struct that defines the loadshedding change. Removing the rust-specific details, it looks like:

struct Change {
    start: String,
    finsh: String,
    stage: unsigned 8-bit integer,
    source: String,
    include_regex: Option<String>,
    exclude_regex: Option<String>,
    include: Option<String>,
    exclude: Option<String>,
}

include and exclude are really just syntactic sugar: they get converted into explicit regexes which an area name must match if it is affected by the relevant Change. The conversion to include_regex and exclude_regex is done by this function, which basically just turns shorthand like cape-town into a regex like city-of-cape-town-area-\d{1,2}. The regex matching hasn't been as useful as I thought it would be (and I don't think I've ever actually used it in manually_specified.yaml), so I don't think it's worth your time trying to deal with it. Just assume include_regex and exclude_regex don't exist.
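
To make the include/exclude idea concrete, here is a small sketch of how a shorthand could be expanded into a regex and checked against an area name. The SHORTHAND table, the function names, and the choice of a full match are all assumptions made for illustration; the real conversion lives in the project's Rust code and may behave differently.

```python
import re

# Hypothetical shorthand table; only the regex itself comes from the thread.
SHORTHAND = {
    "coct": r"city-of-cape-town-area-\d{1,2}",
}


def to_regex(shorthand):
    """Expand a shorthand like "coct" into an area-name regex, or None if absent."""
    if shorthand is None:
        return None
    # Assumption: unknown shorthands are matched literally.
    return SHORTHAND.get(shorthand, re.escape(shorthand))


def is_affected(area_name, include=None, exclude=None):
    """Return True if a change with these filters applies to `area_name`."""
    include_regex = to_regex(include)
    exclude_regex = to_regex(exclude)
    if include_regex is not None and not re.fullmatch(include_regex, area_name):
        return False  # the change only applies to areas matching `include`
    if exclude_regex is not None and re.fullmatch(exclude_regex, area_name):
        return False  # the change skips areas matching `exclude`
    return True


print(is_affected("city-of-cape-town-area-7", include="coct"))  # True
print(is_affected("city-of-cape-town-area-7", exclude="coct"))  # False
```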

keeganwhite commented 10 months ago

Agreed. I don't see a disadvantage in keeping the format the same.

To explain my thought process for dealing with two files that have overlapping times or conflicting stages like this:

  - stage: 3
    start: 2023-09-04T10:00:00
    finsh: 2023-09-04T22:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 5
    start: 2023-09-04T22:00:00
    finsh: 2023-09-05T05:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct

and a newer file:

  - stage: 5
    start: 2023-09-04T18:00:00
    finsh: 2023-09-04T22:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 6
    start: 2023-09-04T22:00:00
    finsh: 2023-09-05T05:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct

where the differences are that the stage 3 entry is bumped to stage 5 for part of its time range (18:00 to 22:00), and the stage 5 entry is bumped to stage 6 for its whole time range.

Then the output in the historical data would be

  - stage: 6
    start: 2023-09-04T22:00:00
    finsh: 2023-09-05T05:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 5
    start: 2023-09-04T18:00:00
    finsh: 2023-09-04T22:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 3
    start: 2023-09-04T10:00:00
    finsh: 2023-09-04T18:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct

where the rules for these changes can be summarised as "new entries replace older entries for overlapping times". Essentially, we delete the incorrect entry from the file completely...
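
A minimal sketch of that rule, assuming every entry applies to the same area (so source, include and exclude are left out) and that times are naive ISO timestamps as in the YAML above. The helper names change and overlay are hypothetical. It cuts each newer interval out of each older entry and keeps whatever is left over:

```python
from datetime import datetime


def change(stage, start, finsh):
    """Build a minimal change entry; `source`, `include` and `exclude` are omitted."""
    return {
        "stage": stage,
        "start": datetime.fromisoformat(start),
        "finsh": datetime.fromisoformat(finsh),
    }


def overlay(older, newer):
    """Trim `older` entries wherever a `newer` entry covers the same time."""
    merged = list(newer)
    for old in older:
        # Start with the whole old interval, then cut out each newer interval.
        pieces = [(old["start"], old["finsh"])]
        for new in newer:
            remaining = []
            for start, finsh in pieces:
                if start < new["start"]:
                    # Part of the old piece lies before the newer entry.
                    remaining.append((start, min(finsh, new["start"])))
                if finsh > new["finsh"]:
                    # Part of the old piece lies after the newer entry.
                    remaining.append((max(start, new["finsh"]), finsh))
            pieces = remaining
        merged.extend({**old, "start": s, "finsh": f} for s, f in pieces)
    return sorted(merged, key=lambda c: c["start"])


older = [
    change(3, "2023-09-04T10:00:00", "2023-09-04T22:00:00"),
    change(5, "2023-09-04T22:00:00", "2023-09-05T05:00:00"),
]
newer = [
    change(5, "2023-09-04T18:00:00", "2023-09-04T22:00:00"),
    change(6, "2023-09-04T22:00:00", "2023-09-05T05:00:00"),
]
for c in overlay(older, newer):
    print(c["stage"], c["start"].isoformat(), c["finsh"].isoformat())
# 3 2023-09-04T10:00:00 2023-09-04T18:00:00
# 5 2023-09-04T18:00:00 2023-09-04T22:00:00
# 6 2023-09-04T22:00:00 2023-09-05T05:00:00
```

The output is sorted by start time, whereas the listing above is newest first; the entries themselves are the same.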

beyarkay commented 10 months ago

Yes that looks correct to me. Although attempting to read these is making me remember why I tried to make a schedule visualiser a while back (it's trickier than it might seem at first glance). If I get a chance later on today, I'll write up some test cases (probably formatted as a multi-document yaml file) so that we can get the computer verifying these things for us.

I'll write out some high-level test examples below:

It'll be useful to have a little custom syntax: file1 is older than file2, the caret ^ indicates the current time, and a series of numbers like _ _ 4 4 0 2 2 2 indicates several stages over some unit of time (an underscore means the file says nothing about that slot).

With the above, we can define some mini-test examples like:

If there are conflicts in the future, the newer file should take precedence:

file1:    2 2 2 2
file2:    4 4 2 2
now:     ^
result:   4 4 2 2

If there are conflicts in the past, the newer file should still take precedence (sometimes loadshedding will be bumped to stage 6 at 2am, but the announcement will only be made public at 7am, so we want to catch this edge case):

file1:  2 2 2 2
file2:  4 4 2 2
now:       ^
result: 4 4 2 2

If the start/finsh boundaries don't align nicely across different files, then the result should work out the new boundaries correctly:

file1:  2 2 2 2 3 3 3 3
file2:  _ _ 4 4 4 4 2 2
now:             ^
result: 2 2 4 4 4 4 2 2

If an old file says "stage 6 for the rest of time" but a newer file updates to say "stage 3 for the next week" then there should only be stage 3 for the next week (it should not be followed by stage 6 for the rest of time)

file1:   6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file2:   _ _ _ _ 3 3 3 3 _ _ _ _ _ _ _
now:                ^
result:  6 6 6 6 3 3 3 3 _ _ _ _ _ _ _

The above example also shows that there must be an option to specify "unknown loadshedding". Unfortunately this does happen sometimes and it's unavoidable.
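
One possible way to carry "unknown" through the code is to use None for the stage. The sketch below collapses a timeline in the notation above into runs, reading "_" as unknown (that reading is inferred from the examples, and to_runs is a hypothetical helper):

```python
from itertools import groupby


def to_runs(timeline):
    """Collapse a timeline string into (stage, first_slot, end_slot) runs.

    `stage` is an int, or None where the timeline says "_" (unknown).
    `end_slot` is exclusive.
    """
    slots = timeline.split()
    runs = []
    position = 0
    for symbol, group in groupby(slots):
        length = len(list(group))
        stage = None if symbol == "_" else int(symbol)
        runs.append((stage, position, position + length))
        position += length
    return runs


print(to_runs("6 6 6 6 3 3 3 3 _ _ _"))
# [(6, 0, 4), (3, 4, 8), (None, 8, 11)]
```

Each run could then be written out as one change entry, with the None runs either omitted or marked explicitly as unknown.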

Finally, here's one big example, just to stress test things a bit

file1:  1 2 2 2 2 _ _ _ _ _ _ _ _ _ _ _ _ _
file2:  _ _ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file3:  _ _ _ _ _ 3 2 3 2 _ _ _ _ _ _ _ _ _
file4:  _ _ _ _ _ _ _ _ _ _ _ 1 1 1 1 1 1 1
file5:  _ _ _ _ _ _ _ _ _ _ _ _ _ 0 0 1 1 _
file6:  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2 2
now:                   ^
result: 1 2 6 6 6 3 2 3 2 _ _ 1 1 0 0 1 2 2
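
One reading of the rules that reproduces every example above: process files oldest first, and let each file take over from its first known slot onwards, unknowns included. This is an interpretation rather than a confirmed spec, and merge_timelines is a hypothetical name. The `now` marker is ignored because newer files take precedence in the past and the future alike.

```python
def merge_timelines(files):
    """Merge timeline strings (oldest first) into one result string."""
    result = []
    for line in files:
        slots = line.split()
        # Index of the first slot this file actually says something about.
        first_known = next((i for i, s in enumerate(slots) if s != "_"), None)
        if first_known is None:
            continue  # the file is entirely unknown, so it changes nothing
        # Pad with unknowns if earlier files were shorter than this offset.
        result.extend(["_"] * (first_known - len(result)))
        # Keep older history before the offset; the newer file owns the rest.
        result = result[:first_known] + slots[first_known:]
    return " ".join(result)


print(merge_timelines(["2 2 2 2 3 3 3 3", "_ _ 4 4 4 4 2 2"]))
# 2 2 4 4 4 4 2 2
```

The sketch assumes all files cover the same slots; timelines of different lengths would need an explicit decision about what a shorter, newer file means.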

I'll try to write these up as YAML files tonight, but this should give you a good idea. Please do bug me if it looks like I haven't been consistent with the rules.

keeganwhite commented 10 months ago

I will have a look at this and get back to you, but one issue (maybe something I missed?) is that the manually_specified.yaml file is formatted differently in some commits. I can get you the commit hash if necessary. It would be easy enough to skip the files that are misbehaving and mark the period between two successful reads as unknown.

For example:

# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
#  - stage: <STAGE NUMBER HERE>
#    start: <START TIME HERE>
#    finsh: <FINISH TIME HERE>
#    source: <URL TO INFORMATION SOURCE HERE>
#    exclude: <coct if this schedule doesn't apply to cape town>
#    include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
changes:
  start: 2023-08-31T14:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 2
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct

- stage: 4
  start: 2023-08-31T10:00:00
  finsh: 2023-08-31T17:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-08-31T17:00:00
  finsh: 2023-08-31T22:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 4
  start: 2023-08-31T22:00:00
  finsh: 2023-09-01T05:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-09-01T05:00:00
  finsh: 2023-09-01T22:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 4
  start: 2023-09-01T22:00:00
  finsh: 2023-09-03T05:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-09-03T05:00:00
  finsh: 2023-09-03T17:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
historical_changes: []

as compared to:

# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
#  - stage: <STAGE NUMBER HERE>
#    start: <START TIME HERE>
#    finsh: <FINISH TIME HERE>
#    source: <URL TO INFORMATION SOURCE HERE>
#    exclude: <coct if this schedule doesn't apply to cape town>
#    include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-28T05:00:00
  finsh: 2023-08-28T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-28T16:00:00
  finsh: 2023-08-29T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-29T05:00:00
  finsh: 2023-08-29T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-29T16:00:00
  finsh: 2023-08-30T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-30T05:00:00
  finsh: 2023-08-30T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-30T16:00:00
  finsh: 2023-08-31T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-31T05:00:00
  finsh: 2023-08-31T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-31T16:00:00
  finsh: 2023-09-01T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-09-01T05:00:00
  finsh: 2023-09-01T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-09-01T16:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct

- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T00:00:00
  source: https://twitter.com/CityofCT/status/1695804610932273188
  include: coct
historical_changes: []

keeganwhite commented 10 months ago

@beyarkay have you had any time to generate some test yaml files? (I am also working on some).

I have started some rudimentary test cases for a data aggregation file I wrote here. I have generated the historical data using this file but I am afraid it will (may) be riddled with errors until some proper testing is done.

beyarkay commented 10 months ago

Hey, sorry for the delay.

Yes you're correct, the misbehaving file should be omitted (the one formatted like:

---
- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-28T05:00:00
...

I'm not sure what happened there). The correctly formatted file should have two keys: changes and historical_changes, each of which accepts a list of "change" objects (although historical_changes is deprecated and not used anymore).
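
A shape check along these lines could decide which revisions to skip. It is only a sketch: it runs on an already-parsed document (plain dicts and lists), the name is_well_formed is made up, and the required keys are taken from the non-optional fields of the Change struct earlier in the thread. A revision that fails to parse as YAML at all would need to be skipped before reaching this check.

```python
# Non-optional fields of a change, per the struct quoted earlier in the thread.
REQUIRED_KEYS = {"stage", "start", "finsh", "source"}


def is_well_formed(document):
    """Return True if a parsed YAML document has the expected top-level shape."""
    if not isinstance(document, dict):
        # e.g. a revision whose top level is a bare list of changes
        return False
    changes = document.get("changes")
    if not isinstance(changes, list):
        return False
    # `historical_changes` is deprecated, so it is deliberately not required.
    return all(
        isinstance(entry, dict) and REQUIRED_KEYS <= entry.keys()
        for entry in changes
    )


good = {
    "changes": [
        {
            "stage": 2,
            "start": "2023-09-02T05:00:00",
            "finsh": "2023-09-02T16:00:00",
            "source": "https://twitter.com/Eskom_SA/status/1697210092179935262",
            "exclude": "coct",
        }
    ],
    "historical_changes": [],
}
print(is_well_formed(good))            # True
print(is_well_formed([{"stage": 3}]))  # False
```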

Busy working on the test files now, should have them uploaded to your PR in a bit.