Open auscompgeek opened 11 years ago
@keepcalm444 could you please share with us how you grab BoM data?
Just for reference, this is how another bot behaves when you ask it for weather:
<auscompgeek> `bom sydney olympic park nsw
<notabot> Current Weather Conditions supplied by http://www.bom.gov.au for Sydney Olympic Park, NSW (As of 12:04pm):
<notabot> Temp: 18.1'C (64.58'F), App: 14.9'C, Humidity: 64%, Wind: Moderate breeze, Dir: SE, Sp: 19 Gu: 26 km/h, Min: 15.2'C at 01:10am Max: 20.3'C at 10:48am
Here's a gist (it's a PHP script, found in action here).
stationList is only for NSW though. If it's needed I can generate a few for other states.
Here's a Whirlpool thread about the BoM API.
Here's a Gist for generating the list of stations (and the actual lists themselves in INI format (yay)) for all the states, because the BoM has some screwed up stuff going on.
Does it come with a JSON API? Or do we have to do some type of HTML scraping? @keepcalm444, can you comment? From what @auscompgeek posted, it looks like you're scraping with regex. I point you all to: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags.
A fair chunk of the scraping is done via regex for speed reasons (at least legacy speed reasons, I'm a bit out of touch with the codebase right now). In the Python 2 version, all scraping originally ran straight through BeautifulSoup, but on low-power machines like the RasPi it was causing lag of up to a second. Regex was significantly faster.
Fair 'nuff. I would like to see some legitimate speed tests (probably with some form of memory profiling too) to see whether or not this is still true with the current codebase (and with py3k). Also, we should do some cProfile or perf profiling before coming to a decision like that.
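A speed test along those lines could start from something like the sketch below. It is only a sketch under stated assumptions: the sample markup and the `headers="temp"` attribute are made up for illustration, and the stdlib `html.parser` stands in for BeautifulSoup so the script has no third-party dependencies. Swapping in the bot's real scraping functions and real BoM pages would be needed before drawing any conclusion.

```python
import re
import timeit
from html.parser import HTMLParser

# Hypothetical sample markup; the real BoM pages are laid out differently.
SAMPLE_HTML = "<table>" + "".join(
    f'<tr><td headers="station">Station {i}</td>'
    f'<td headers="temp">{15 + i % 10}.{i % 10}</td></tr>'
    for i in range(200)
) + "</table>"

TEMP_RE = re.compile(r'<td headers="temp">([^<]+)</td>')


def scrape_with_regex(html):
    """Return the text of every temperature cell using one compiled regex."""
    return TEMP_RE.findall(html)


class TempCellParser(HTMLParser):
    """Collect the text of <td headers="temp"> cells (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.temps = []
        self._in_temp_cell = False

    def handle_starttag(self, tag, attrs):
        self._in_temp_cell = tag == "td" and dict(attrs).get("headers") == "temp"

    def handle_endtag(self, tag):
        self._in_temp_cell = False

    def handle_data(self, data):
        if self._in_temp_cell:
            self.temps.append(data)


def scrape_with_parser(html):
    """Return the same cells as scrape_with_regex, via a real HTML parser."""
    parser = TempCellParser()
    parser.feed(html)
    parser.close()
    return parser.temps


if __name__ == "__main__":
    # Time both approaches on identical input; numbers depend on the machine.
    for func in (scrape_with_regex, scrape_with_parser):
        seconds = timeit.timeit(lambda: func(SAMPLE_HTML), number=50)
        print(f"{func.__name__}: {seconds:.4f}s for 50 runs")
```

The same two functions can be handed to `cProfile` or a memory profiler to see where the parser version spends its time.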
Yeah I really have no idea. Maybe after exams :3
@cyphar The BoM has a JSON API (sorta) for weather data, however, you need the IDs of each weather station to do this. I'm not sure whether there's an API to grab weather station IDs.
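For reference, consuming that feed might look roughly like the sketch below. The URL pattern, the `observations` / `data` layout, and the field names (`air_temp`, `apparent_t`, `rel_hum`, `wind_dir`, `wind_spd_kmh`, `gust_kmh`) are assumptions that should be checked against a live feed, and the function names are hypothetical. The site may also reject requests that lack a browser-like User-Agent header.

```python
import json
from urllib.request import urlopen

# Assumed URL pattern for a station's observation feed; verify before relying on it.
FEED_URL = "http://www.bom.gov.au/fwo/{product}/{product}.{wmo}.json"


def feed_url(product, wmo):
    """Build the feed URL from a product ID and a station (WMO) ID."""
    return FEED_URL.format(product=product, wmo=wmo)


def fetch(product, wmo):
    """Download and decode one station's feed (performs a network request)."""
    with urlopen(feed_url(product, wmo), timeout=10) as resp:
        return json.load(resp)


def latest_observation(payload):
    """Return the first reading, assuming the feed lists the newest one first."""
    return payload["observations"]["data"][0]


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def format_observation(obs):
    """Format a reading in roughly the style of the other bot's reply."""
    return (
        f"Temp: {obs['air_temp']}'C ({c_to_f(obs['air_temp']):.2f}'F), "
        f"App: {obs['apparent_t']}'C, Humidity: {obs['rel_hum']}%, "
        f"Dir: {obs['wind_dir']}, Sp: {obs['wind_spd_kmh']} "
        f"Gu: {obs['gust_kmh']} km/h"
    )
```

With a station ID in hand, `format_observation(latest_observation(fetch(product, wmo)))` would produce the one-line summary; the station ID still has to come from a separate list.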
Yes, I used regex to scrape HTML to generate the ID list. See the 2nd top-voted answer to the Stack Overflow thread you linked to. And yeah, what ackwell said.
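As a rough illustration of that approach (not the actual Gist, which is a separate script), the sketch below pulls station links out of a page with a regex and writes them as an INI section. The link shape in `STATION_LINK`, the sample values, and the function name are all assumptions made for the example.

```python
import configparser
import io
import re

# Assumed shape of a station link on a state observations page; illustrative only.
STATION_LINK = re.compile(
    r'<a href="/products/(ID\w+)/\1\.(\d+)\.shtml">([^<]+)</a>'
)


def stations_to_ini(html, section):
    """Map station names to 'product.wmo' IDs and render them as INI text."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep the capitalisation of station names
    config[section] = {
        name.strip(): f"{product}.{wmo}"
        for product, wmo, name in STATION_LINK.findall(html)
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Made-up markup and IDs, purely to show the output format.
SAMPLE_PAGE = (
    '<th><a href="/products/IDN60801/IDN60801.12345.shtml">Example Town</a></th>'
    '<th><a href="/products/IDN60801/IDN60801.12346.shtml">Example Airport</a></th>'
)

if __name__ == "__main__":
    print(stations_to_ini(SAMPLE_PAGE, "NSW"))
```

A regex is workable here because only one narrow, regular link pattern is being matched, rather than arbitrary nested HTML.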
Suggestions:
Note: The Weather Channel refers to Weather Underground for an API.
Yahoo! also has RSS feeds. Note: Yahoo! uses data from The Weather Channel, so we can forget that.
Here's a whole list of weather APIs (filtered to JSON format).