UltrosBot / Ultros

Connecting communities, one squid at a time! Ultros is a multi-protocol chat bot written in Python, designed with both the user and developer in mind
http://ultros.io
Artistic License 2.0

URL title fetcher - git.io and other short URLs #9

Closed rakiru closed 10 years ago

rakiru commented 11 years ago

git.io URLs return 400 errors for some unknown reason. Possibly missing headers or something.

EDIT: This also goes for db.tt URLs. Something weird is going on here.

gdude2002 commented 11 years ago

Git.io being what it is, there's probably some API we can use.

rakiru commented 11 years ago

True. I'm curious what the issue with the current method is though.

gdude2002 commented 11 years ago

Yeah. It does seem pretty odd, but I haven't noticed it with any other sites.

gdude2002 commented 10 years ago

This seems to be due to urllib's inability to follow some 30x redirects. We should probably look at the requests module or work around this ourselves.

rakiru commented 10 years ago

I think moving to requests would be a decent idea, rather than having to build upon urllib. I can see other plugins needing it anyway, perhaps even for special URL handlers, and with the great package manager system, dependencies like that aren't an issue anyway.

gdude2002 commented 10 years ago

It's not /that/ great yet. :P

Yeah, maybe we could use Requests. However, the current URLs plugin is working and is already extensible, so maybe take a look at it first.

rakiru commented 10 years ago

"yet" being the key word there. ;P

Well, if I remember correctly, it involves subclassing RedirectHandler (or something similar) and calling urllib2.install_opener() or similar. Using requests would be much simpler/cleaner, and should be trivial to do.
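For illustration, here is a rough sketch of the redirect-handler approach described above. It is written against Python 3's `urllib.request` (the successor to `urllib2`), and `RecordingRedirectHandler` and `make_opener` are hypothetical names, not anything from the Ultros code base. It only shows the subclass-and-install mechanism; it does not claim to fix the git.io problem.

```python
from urllib.request import HTTPRedirectHandler, Request, build_opener


class RecordingRedirectHandler(HTTPRedirectHandler):
    """Hypothetical handler: follows redirects as usual, but remembers each hop."""

    def __init__(self):
        self.seen = []  # list of (status_code, new_url) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the hop, then defer to the stock behaviour, which returns a
        # new Request for redirects it is willing to follow and raises
        # HTTPError for the rest.
        self.seen.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def make_opener():
    """Build an opener that uses the recording handler in place of the default."""
    handler = RecordingRedirectHandler()
    return build_opener(handler), handler
```

Passing the resulting opener to `urllib.request.install_opener()` would make it the default for later `urlopen()` calls; calling `opener.open(url)` directly keeps the change local.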

gdude2002 commented 10 years ago

Well, the main problem seems to be that it throws errors on non-200 status codes. Urllib itself doesn't do that.

On Thu, Nov 21, 2013 at 4:20 PM, Sean Gordon notifications@github.com wrote:

> "yet" being the key word there. ;P
>
> Well, if I remember correctly, it involves subclassing RedirectHandler (or something similar) and calling urllib2.install_opener() or similar. Using requests would be much simpler/cleaner, and should be trivial to do.

rakiru commented 10 years ago

You mean with other 2XX messages, or with non 2XX messages?

gdude2002 commented 10 years ago

Non-2XX messages, I think.

rakiru commented 10 years ago

Well, yeah, those are errors. Just catch the exceptions and deal with them. What's urllib do with a 404/403, etc?
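A minimal sketch of the "catch the exceptions and deal with them" idea, again assuming Python 3's `urllib.request` rather than the `urllib2` of the time. The function name `fetch` and its `opener` parameter are made up for this example; the parameter exists only so the function can be exercised without touching the network.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen, timeout=10):
    """Return (status, body); body is None when the request did not succeed."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status, response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 403 or 404.
        return e.code, None
    except URLError:
        # No usable HTTP response at all (DNS failure, refused connection, ...).
        return None, None
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first. A title fetcher could then decide per status code whether to stay silent or report the failure.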

gdude2002 commented 10 years ago

Nothing, I think. Could check: http://httpstat.us

EDIT: http://puu.sh/5pab8.png

gdude2002 commented 10 years ago

Fixed by https://github.com/UltrosBot/Ultros/commit/de1904e7f51208c9f68bce5ec37f95fb0090c6ea

rakiru commented 10 years ago

https://github.com/UltrosBot/Ultros/commit/4085e4c5c48336f4659f01996096c3a6974bb9d3 I'm not sure if this was the only problem, but the URLs from Notifico were surrounded by formatting chars, which were still on the end of the URL we were attempting to download.
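As a sketch of the kind of cleanup involved (not the code from the linked commit), stripping formatting characters before fetching might look like the following. It assumes the characters in question are the usual mIRC-style control codes; `strip_formatting` is a hypothetical helper name.

```python
import re

# Assumed set of mIRC-style control codes: colour (with optional fg[,bg]
# digits), bold, reset, reverse, italic and underline.
_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def strip_formatting(text):
    """Remove IRC formatting codes so that a URL can be extracted cleanly."""
    return _FORMATTING.sub("", text)
```

Running this over the message before URL matching keeps a trailing reset or colour code from ending up as part of the requested URL.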