beyarkay / eskom-calendar

Get your loadshedding schedule in your calendar and never be left in the dark! Open-source, up-to-date, and developer friendly.
https://eskomcalendar.co.za
GNU General Public License v3.0
190 stars 35 forks

Automating updates for manually_specified.yaml #1

Closed fragtion closed 2 years ago

fragtion commented 2 years ago

Looks like yours is the first open source project to implement Eskom's national stage schedule into the results. Congrats & really nice work. I also prefer the idea of representing loadshedding schedules as a visualized time series like a calendar, as opposed to just time tables, so kudos from me there too.

When it comes to updating manually_specified.yaml, I see you would need to accept a pull request?

I'm trying to explore options for how we can make this dataset available for all open source projects to use, in a way that is secure, easily accessible, and easily/quickly updatable by the public, without needing to be moderated by a single person who might not be available at the time a new schedule is released... but it would also need to be secure enough that it cannot be easily manipulated by vandalism or malicious intent. I was thinking something along the lines of a blockchain or wiki? Realistically this would be a distinct, purpose-built project, hosted as some kind of API on a private web server independently of GitHub.

A bit of an over-engineered approach? Then how about a Google Sheets document maintained by a group of volunteers with hierarchical ACL?

Hope this isn't going too off-topic but I'll let you decide on that one!

Thanks

beyarkay commented 2 years ago

Hey! Glad to see a like-minded person. Yeah, a definite drawback is how manually_specified.yaml is, well, manual. But I'm currently trying to get GH Actions to the point where builds are triggered that update releases automatically, and from there the ICS files are up-to-date when the user's calendar application goes to fetch the latest data.

My current goal is to accept PRs that modify manually_specified.yaml from whoever gets to it first after eskom pushes an announcement. I'm hoping I'll be able to look over the PR and hit accept without too much fuss until I get something more automated up and running. Obviously code/docs/testing PRs are also welcome, but those will need more scrutiny until I've got unit tests up and running.

Totally agree that I am a single point of failure, and am trying to edit myself out of the process. But honestly I don't think the human side will ever be able to be completely omitted. Eskom could theoretically release any sort of change through any sort of medium. But if we can get to 90% automation (by scraping tweets maybe?) then that'll be brilliant.

Ownership of the data and the question of who-can-edit is a common problem. I personally don't have the experience with blockchain or wikis to implement it, but I'll gladly offer what help I can to whoever wants to give it a go. I would consider it out of scope for this repo though.

I'm hoping this repo will be enough to allow anyone to edit it and submit their changes to be accepted if approved by a maintainer. Currently I'm the only maintainer, but I hope more people will make more edits such that I can give them permissions to accept PRs and make changes. I'm wary of using Google Sheets as the central data source since it wouldn't be contained in the repo, it wouldn't be under version control, and plain text files are miles easier to edit in an automated fashion. I also don't want to learn two authentication systems (GitHub and Google) as well as how to know when a GitHub user and Google Sheets editor are one and the same.

However I'd be absolutely keen for a script that automatically submits a PR when a google sheet is edited. And I will admit that Google Sheets has a nicer UI for non-tech folk than GitHub.

I'm 100% on board with whatever we can do to make the dataset open source and fully accessible.

Can I ask how you found out about the repo? Kinda curious because I want to get this out there to the devs of RSA so we can make it bigger and better.

fragtion commented 2 years ago

Have a peek at https://github.com/wernerhp/ha.integration.load_shedding/issues/20

We've pretty much resorted to scraping the stages schedule from City Power's site with beautifulsoup... So that's probably not the most reliable approach but for now it seems to be working okay (assuming City Power obviously doesn't go and re-structure too much on their schedule page any time soon)
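For anyone following along, here is a minimal sketch of the kind of table scraping described above. It uses the standard library's html.parser rather than BeautifulSoup so it runs without extra installs, and the HTML, the column order, and the names TableRows and parse_stage_table are all made up for illustration; the real City Power page layout is not shown in this thread. It raises an error when a row has an unexpected shape, which is one way to notice a page restructure instead of silently publishing wrong stages.

```python
from html.parser import HTMLParser

# Hypothetical markup and dates: a stand-in for a scraped stage-schedule page.
SAMPLE_HTML = """
<table id="stages">
  <tr><th>Start</th><th>End</th><th>Stage</th></tr>
  <tr><td>2022-07-01 16:00</td><td>2022-07-01 22:00</td><td>Stage 2</td></tr>
  <tr><td>2022-07-02 05:00</td><td>2022-07-02 22:00</td><td>Stage 4</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only hold <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_stage_table(html):
    """Return one dict per data row; fail loudly if the layout is unexpected."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    stages = []
    for row in parser.rows:
        if len(row) != 3:
            raise ValueError(f"unexpected row shape: {row!r}")
        start, end, stage = row
        # "Stage 2" -> 2 (assumes the stage number is the last word in the cell)
        stages.append({"start": start, "end": end, "stage": int(stage.split()[-1])})
    return stages


if __name__ == "__main__":
    for entry in parse_stage_table(SAMPLE_HTML):
        print(entry)
```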

How I found this repo? Literally just searching google and github for "eskom" and "load shedding" and sorting by stars and/or recently modified! So this project is definitely featured and accessible

While there's definitely a use case for this project in its current state, perhaps it could gain a bit more momentum if it included the option/capability of presenting the calendar schedules via a web-browser / on a hosted website, as opposed to generating static CSV/ICS (which could quickly become stale?). https://www.lexity.co.za/ sort of does this, but I don't think there's enough emphasis on the graphical aspect. Calendars do a pretty good job, but I think something like a simple horizontal "moving spectrum" representing the next 7 day week, with load shedding slots appearing on that spectrum as a colored block depending on stage, sort of like a guitar-hero concept but on the X axis rather than the Z axis, if that makes sense? That could be pretty cool and different.

I guess it all depends on your ultimate purpose/objective with this project. I do think the calendar/graphical aspect is what sets this project apart from other competing loadshedding APIs, and that this is where you would probably want to focus improvements. Depending on how much you want to keep in-house, it might make sense to offload the API side of the data acquisition to another library like Werner's, allowing you to focus more on front-end representation. Just some random thoughts but of course you're the final decision-maker about the project's future trajectory :)

Hopefully Eskom comes up with a more data-complete & developer-friendly API so that we don't have to jump through all these hypothetical hoops and hacks just to solve simple problems, but until then we can only continue to do our best improvising with what we have!

beyarkay commented 2 years ago

> Have a peek at https://github.com/wernerhp/ha.integration.load_shedding/issues/20

That's very cool. I'm renting so home automation isn't something I'm familiar with, but can definitely see the use cases.

> resorted to scraping the stages schedule

Pity. Yeah I don't think anything proper exists just yet. Although that's not to say something can't be built.

> Literally just searching google and github

😃 that's great. I didn't realize it would be so easily found.

> it could gain a bit more momentum if it presented the calendar schedules via a web-browser

Oh yeah 100%. It'll definitely cover the greater use-case via a website, but my personal use-case is having loadshedding in my calendar. I use my calendar for just about everything so having to go to a website to check (as opposed to having the content delivered to where I'm already going to be looking) is an extra step that I can't be bothered with.

My current goals are to get all the data into the program (including CPT/JHB and all the other non-eskom shed areas) and compiled into calendars. In the process of doing that, I'll likely create CSVs that contain the area-based schedules for each area or town. And after that I want to look at better ways of presenting the data (like via a website and an API).

But for now, I just want to concentrate on actually getting and updating the data in a reliable way. I don't have any servers I can use, so using GH for hosting and building is the most I can do on my own unless things dramatically change.

> which could quickly become stale?

Not sure what you mean by this? All the CSVs were generated via scraped PDFs on eskom's website (see here) so those won't go stale unless the format changes dramatically. The ICS files get generated from those CSVs paired with the updates to manually_specified.yaml, so they should only become stale if the CSVs are stale or if manually_specified.yaml isn't updated. Unfortunately I don't see a way around having to specify the current schedule manually.
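To make that dependency concrete, here is a rough sketch of pairing an area CSV with manually specified stage changes. It is not the repo's actual build code. The STAGE_CHANGES list is a hypothetical stand-in for manually_specified.yaml (its real layout isn't shown in this thread), the dates are invented, and it assumes a CSV row means "this area is shed in this slot whenever the active stage is at least the row's stage", with stage 0 meaning the slot is never shed. The column names, including the finsh_time spelling, are copied from the CSV output shown in this thread.

```python
import csv
import io
from datetime import datetime, time, timedelta

# A few rows in the same shape as the generated area CSVs.
AREA_CSV = """start_time,finsh_time,date_of_month,stage
00:00,02:30,5,1
00:00,02:30,6,2
00:00,02:30,7,3
"""

# Hypothetical stand-in for manually_specified.yaml: which national stage
# is active between two timestamps (invented dates).
STAGE_CHANGES = [
    {"stage": 2, "start": datetime(2022, 7, 5, 0, 0), "end": datetime(2022, 7, 7, 0, 0)},
]


def shedding_events(area_csv, changes):
    """Return (start, end, stage) tuples for slots hit by the active stage."""
    rows = list(csv.DictReader(io.StringIO(area_csv)))
    events = []
    for change in changes:
        day = change["start"].date()
        while day <= change["end"].date():
            for row in rows:
                if int(row["date_of_month"]) != day.day:
                    continue
                row_stage = int(row["stage"])
                # Assumption: stage 0 rows are never shed, and a row applies
                # when the active stage is at least the row's stage.
                if row_stage == 0 or row_stage > change["stage"]:
                    continue
                start = datetime.combine(day, time.fromisoformat(row["start_time"]))
                end = datetime.combine(day, time.fromisoformat(row["finsh_time"]))
                # Clip the slot to the window in which the stage is active.
                start = max(start, change["start"])
                end = min(end, change["end"])
                if start < end:
                    events.append((start, end, change["stage"]))
            day += timedelta(days=1)
    return events


if __name__ == "__main__":
    for start, end, stage in shedding_events(AREA_CSV, STAGE_CHANGES):
        print(f"Stage {stage}: {start} to {end}")
```

If STAGE_CHANGES is left alone while the real stage changes, the events above go stale even though the CSV is still correct, which is the only staleness the calendars should suffer from.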

> https://www.lexity.co.za/

This is really useful. I was dreading having to parse all the PDFs for Schedule Providers but maybe now I'll just scrape their website. I've emailed them to chat about it. Thanks! I agree though that there's not a huge emphasis on aesthetics.

> offload the API side of the data acquisition to another library like Werner's

Having a look at Werner's, I think I actually tried what he's doing before realizing the API didn't return accurate data. Or at least, for the few use cases I tried where I compared it to EskomSePush, it wasn't working. Also it doesn't support all those painful edge cases that lexity has data for, so I think I'll try to collect everything here unless another solution presents itself.

> Hopefully Eskom comes up with a more data-complete & developer-friendly API

I wouldn't hold out hope for any sort of API. If you were a semi-decent developer, would you want to go work for Eskom, or would you apply to a well-paying private company? I know where I'd go. But I'm hoping that the CSVs hosted by GitHub will be an okay stopgap between now and when I can get a proper website + API going. It's not super nice, but you can always curl the repo to get something:

$ curl https://raw.githubusercontent.com/beyarkay/eskom-calendar/main/generated/western-cape-stellenbosch.csv
start_time,finsh_time,date_of_month,stage
00:00,02:30,1,0
00:00,02:30,2,0
00:00,02:30,3,0
00:00,02:30,4,0
00:00,02:30,5,1
00:00,02:30,6,2
00:00,02:30,7,3
00:00,02:30,8,4
00:00,02:30,9,0

And that's better than a PDF
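If you'd rather consume that from a script than from curl, something like the sketch below should do. The column names (including the finsh_time spelling) are taken from the output above; the helper names fetch_csv and parse_slots are made up for this example, and the URL is the one from the curl command, which may of course move as the repo changes.

```python
import csv
import io
from urllib.request import urlopen

# Same URL as the curl example above; it may change as the repo evolves.
URL = (
    "https://raw.githubusercontent.com/beyarkay/eskom-calendar/"
    "main/generated/western-cape-stellenbosch.csv"
)

# A few rows copied from the curl output, so parsing can be tried offline.
SAMPLE = """start_time,finsh_time,date_of_month,stage
00:00,02:30,4,0
00:00,02:30,5,1
00:00,02:30,6,2
"""


def fetch_csv(url=URL):
    """Download the raw CSV text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_slots(text):
    """Turn the CSV text into a list of dicts with typed day and stage."""
    slots = []
    for row in csv.DictReader(io.StringIO(text)):
        slots.append(
            {
                "start": row["start_time"],
                "finish": row["finsh_time"],  # header spelling as in the CSV
                "day": int(row["date_of_month"]),
                "stage": int(row["stage"]),
            }
        )
    return slots


if __name__ == "__main__":
    # Swap SAMPLE for fetch_csv() to read the live file instead.
    for slot in parse_slots(SAMPLE):
        print(slot)
```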

beyarkay commented 2 years ago

Closing, but thanks for the interesting information!