nasonfish opened this issue 8 years ago
I think we should switch away from HTML scraping if possible. It looks like there are a few APIs available; this was the top Google hit, and it seems like it could work: https://github.com/tapasweni-pathak/Horoscope-API
That's a web frontend for https://testpypi.python.org/pypi/horoscope, which just scrapes Ganeshaspeaks... so, not really better.
(Replying to Andy Edwards's comment of Fri, Oct 16, 2015 at 1:24 AM.)
If there is no free API, maybe the horoscope plugin should be dropped, since maintaining an HTML-scraping plugin can be pretty burdensome.
For what it's worth, here's an updated version of the plugin. Where to go from here is debatable, though: should we keep supporting this site at all?
```python
# Plugin by Infinity - <https://github.com/infinitylabs/UguuBot>
import requests
from bs4 import BeautifulSoup

from cloudbot import hook
from cloudbot.util import formatting


@hook.on_start()
def init(db):
    db.execute("create table if not exists horoscope(nick primary key, sign)")
    db.commit()


@hook.command(autohelp=False)
def horoscope(text, db, bot, notice, nick):
    """<sign> - get your horoscope"""
    headers = {'User-Agent': bot.user_agent}

    # check if the user asked us not to save their sign
    text = text.strip()
    dontsave = text.endswith(" dontsave")
    if dontsave:
        sign = text[:-9].strip().lower()
    else:
        sign = text.lower()

    # no sign given: fall back to the sign saved for this nick
    if not sign:
        row = db.execute("select sign from horoscope where nick=lower(:nick)",
                         {'nick': nick}).fetchone()
        if not row:
            notice("horoscope <sign> -- Get your horoscope")
            return
        sign = row[0]

    url = "http://my.horoscope.com/astrology/free-daily-horoscope-{}.html".format(sign)

    try:
        request = requests.get(url, headers=headers, timeout=10)
        request.raise_for_status()
    except requests.exceptions.RequestException as e:
        return "Could not get horoscope: {}.".format(e)

    soup = BeautifulSoup(request.text, "html.parser")

    # these selectors are tied to the site's current markup and break when it changes
    title = soup.find('h1', {'class': 'f40'})
    body = soup.find('div', {'class': 'block-horoscope-text'})
    if title is None or body is None:
        return "Could not get the horoscope for {}.".format(sign)

    result = "\x02{}\x02 {}".format(title.text.strip(), body.text.strip())
    result = formatting.strip_html(result)

    # remember the sign unless the user opted out or relied on the saved one
    if text and not dontsave:
        db.execute("insert or replace into horoscope(nick, sign) values (:nick, :sign)",
                   {'nick': nick.lower(), 'sign': sign})
        db.commit()

    return result
```
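The sign-memory part of the plugin (the `dontsave` suffix plus the `insert or replace` on a `nick primary key` table) is independent of the scraping, so it can be tried on its own. Below is a minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for the database handle CloudBot passes in; the helper names `parse_request`, `remember_sign` and `saved_sign` are hypothetical and not part of CloudBot.

```python
import sqlite3


def parse_request(text):
    """Split '<sign> [dontsave]' into (sign, should_save), mirroring the plugin's logic."""
    text = text.strip()
    if text.endswith(" dontsave"):
        return text[:-len(" dontsave")].strip().lower(), False
    # an empty request means "use my saved sign", so there is nothing to save
    return text.lower(), bool(text)


def remember_sign(db, nick, sign):
    # nick is the primary key, so 'insert or replace' overwrites any earlier sign
    db.execute("insert or replace into horoscope(nick, sign) values (:nick, :sign)",
               {"nick": nick.lower(), "sign": sign})
    db.commit()


def saved_sign(db, nick):
    # nicks are stored lowercased, so compare against lower(:nick) on the way back
    row = db.execute("select sign from horoscope where nick = lower(:nick)",
                     {"nick": nick}).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")
db.execute("create table if not exists horoscope(nick primary key, sign)")
```

Lowercasing the nick on write and calling `lower(:nick)` on read keeps the lookup case-insensitive, so `Alice` and `ALICE` share one saved sign.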
We scrape HTML from a site, and that site changed its markup so that the elements we were selecting no longer carry the classes horoscope.py looks for. As a result, the plugin can't find the horoscope for any sign and returns an error no matter what.
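One way to soften this failure mode (it does not remove the maintenance burden) is to try a short list of `(tag, class)` selectors in order and report clearly when none of them match. Here is a minimal sketch using the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages. The `ClassTextFinder`/`find_text` names are hypothetical, and the fallback class `sign-title` is a made-up example of what a redesigned page might use, not the site's real markup.

```python
from html.parser import HTMLParser


class ClassTextFinder(HTMLParser):
    """Collect the text of the first <tag> element whose class list contains css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0        # > 0 while inside the matched element
        self.matched = False  # set once the target element has been opened
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # track nested elements with the same tag so we stop at the right close
            if tag == self.tag:
                self.depth += 1
            return
        if self.matched:
            return  # only the first match is collected
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.matched = True

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def find_text(html, selectors):
    """Return whitespace-normalised text for the first selector that matches, else None."""
    for tag, css_class in selectors:
        finder = ClassTextFinder(tag, css_class)
        finder.feed(html)
        finder.close()
        if finder.matched:
            return " ".join("".join(finder.parts).split())
    return None


# old class first, then a hypothetical class the redesigned page might use
TITLE_SELECTORS = [("h1", "f40"), ("h1", "sign-title")]
```

When `find_text` returns `None`, the plugin can return its "Could not get the horoscope" message instead of raising, which makes a markup change show up as a clear error rather than a crash.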