8none1 / octopus_powerups

Programmatic access to Power Up time data
http://www.whizzy.org/octopus_powerups/
GNU General Public License v3.0

Email scraper #1

Closed PianSom closed 2 months ago

PianSom commented 2 months ago

First off - thanks for sharing. I've been thinking for ages about trying to do something similar.

Is your email scraper in a state where it could be shared? Even if it's for a different platform/setup it's always easier to build on someone else's work.

gcoan commented 2 months ago

> First off - thanks for sharing. I've been thinking for ages about trying to do something similar.
>
> Is your email scraper in a state where it could be shared? Even if it's for a different platform/setup it's always easier to build on someone else's work.

I came here to make a comment of exactly the same thing, could the email scraper be shared so others in other regions can set it up?

Does it run under HA or under another host operating system?

8none1 commented 2 months ago

I've just uploaded the code here: https://github.com/8none1/octopus_powerups/blob/main/gapps_scripts/powerups_email_finder.gs

The code runs inside Google Apps Script and depends on having read-only access to a Gmail account. I recommend creating a separate Gmail account to run it in, then using a filter on your personal email account to find the Power Ups emails and forward them to that new Gmail account.

The workflow is:

  1. Publish the .gs code to the web via Google Apps Script so that the code runs whenever you call a "secret" URL (the entry point is doGet()). Give it read-only rights to your new Gmail account. (A rough sketch of this entry point is shown after this list.)
  2. In your original email account, when a Power Ups email from Octopus arrives, a filter matches it (e.g. from: hello@octopus, subject: power-ups) and forwards it to your separate Gmail account.
  3. Whenever you GET the secret URL (where the Apps Script is published), the code runs and returns a JSON object with zero to three entries in an array. This URL is the source for Home Assistant; it looks like a normal URL that serves a JSON object.
  4. If you want to publish the JSON object for other people to use, I would recommend adding another layer of indirection: run a cron job somewhere every 15 minutes or so which GETs the secret URL and stashes the result somewhere with a bit more bandwidth. That way, every time someone fetches the data, the Apps Script is not being run; instead the pre-downloaded file (from the cron job) is served. In this case, do not publish the secret URL; publish the URL where the JSON response has been stashed. In my case I'm backing it off to GitHub because I figure they can handle it.
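
For reference, here is a minimal sketch of what such a doGet() entry point can look like. The real logic lives in powerups_email_finder.gs in this repo; the Gmail search query and the time-extraction regex below are placeholders for illustration, not the actual email format.

```javascript
// Minimal Google Apps Script sketch of the web app entry point.
// Assumes the account only receives forwarded power-ups emails (or that the
// search query below is tightened to match your forwarding filter).
function doGet() {
  var powerups = [];

  // Hypothetical search query - adjust to whatever your filter forwards.
  var threads = GmailApp.search('subject:power-ups newer_than:7d');

  threads.forEach(function (thread) {
    thread.getMessages().forEach(function (message) {
      var body = message.getPlainBody();

      // Placeholder extraction: grab anything that looks like "HH:MM - HH:MM".
      // The real script parses the actual Octopus email wording.
      var match = body.match(/(\d{2}:\d{2})\s*-\s*(\d{2}:\d{2})/);
      if (match) {
        powerups.push({
          received: message.getDate().toISOString(),
          start: match[1],
          end: match[2]
        });
      }
    });
  });

  // Serve the array as JSON so the published web app URL returns it directly.
  return ContentService
    .createTextOutput(JSON.stringify(powerups))
    .setMimeType(ContentService.MimeType.JSON);
}
```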

I did consider having GH Actions run every 15 minutes to fetch the JSON object from the GApps script, but I think that would have leaked the secret URL and I didn't want that, so instead I'm running a cron job on a Pi in my house (the same Pi running HA).
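
As an illustration of that fetch-and-stash step, here is a rough Node.js sketch (assuming Node 18+ for the built-in fetch). The environment variable and output filename are placeholders; the actual setup is just a cron job on the Pi that fetches the secret URL and pushes the result on to GitHub.

```javascript
// Hypothetical fetch-and-stash script, run periodically from cron.
// Keeps the secret URL out of the published artefact: only the saved JSON
// file gets pushed somewhere public.
const fs = require('node:fs/promises');

const SECRET_URL = process.env.POWERUPS_SECRET_URL; // the published Apps Script URL
const OUTPUT_FILE = 'powerups.json';                 // file you then publish/push

async function main() {
  const response = await fetch(SECRET_URL);
  if (!response.ok) {
    throw new Error(`Fetch failed: ${response.status}`);
  }
  const body = await response.text();
  await fs.writeFile(OUTPUT_FILE, body);
  console.log(`Wrote ${body.length} bytes to ${OUTPUT_FILE}`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

A cron entry such as `*/15 * * * *` gives roughly the 15-minute cadence mentioned above.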

The risks that I can imagine are:

8none1 commented 2 months ago

I would welcome improvements to the GS code :smile:

Also, I noticed that the header checking doesn't actually do anything. I guess I never hooked it up, probably because it wasn't reliable.

8none1 commented 2 months ago

And in case you didn't already see it, there is a bit more information here: https://www.whizzy.org/2024-01-24-powerups-api/

8none1 commented 2 months ago

Marking as resolved. I added a note to the README to link to this issue for people who would like more info on the scraper.