dentarg / pynik

:tiger: Internet Relay Chat bot

Improve URL detection #25

Closed dentarg closed 8 years ago

dentarg commented 8 years ago
22:38:19 <..... dentarg> https://easyengine.io/tutorials/linux/ubuntu-postfix-gmail-smtp/  
22:38:20 <.... rufwebot> Configure Postfix to Use Gmail SMTP on Ubuntu - EasyEngine
22:38:25 <..... dentarg> https://easyengine.io/tutorials/linux/ubuntu-postfix-gmail-smtp/)  
22:38:26 <.... rufwebot> Page not found - EasyEngine

Testing GitHub

As a picture: (screenshot, 2016-02-22 at 23:47:40)

dentarg commented 8 years ago

The current regexp: https://github.com/dentarg/pynik/blob/02872625fce8248c2bb892dd6c129433ab40b909/plugins/title_reader.py#L65

dentarg commented 8 years ago

Interesting reads

http://www.regexguru.com/2008/11/detecting-urls-in-a-block-of-text/
http://blog.codinghorror.com/the-problem-with-urls/
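
A minimal sketch of one way to handle the trailing `)` case from the log above: match a loose URL candidate first, then trim trailing punctuation and any unbalanced closing parenthesis. The pattern and the names `URL_CANDIDATE`, `TRAILING_PUNCTUATION` and `find_urls` are hypothetical, made up for illustration; this is not the regexp in `title_reader.py`.

```python
import re

# Hypothetical pattern (an assumption, not the one used in title_reader.py):
# a run of non-whitespace characters starting with http:// or https://.
URL_CANDIDATE = re.compile(r"https?://\S+")

# Characters that, at the very end of a match, are more likely sentence
# punctuation than part of the URL.
TRAILING_PUNCTUATION = ".,;:!?'\""


def find_urls(text):
    """Return the URLs found in text, with trailing punctuation trimmed."""
    urls = []
    for match in URL_CANDIDATE.finditer(text):
        url = match.group(0)
        while url:
            last = url[-1]
            if last in TRAILING_PUNCTUATION:
                url = url[:-1]
            elif last == ")" and url.count(")") > url.count("("):
                # Unbalanced closing parenthesis: treat it as part of the
                # surrounding text, not the URL.
                url = url[:-1]
            else:
                break
        urls.append(url)
    return urls


if __name__ == "__main__":
    print(find_urls("(https://easyengine.io/tutorials/linux/ubuntu-postfix-gmail-smtp/)"))
    print(find_urls("testing http://www.google.com/foo/bar?"))
```

With this sketch, the parenthesized EasyEngine link comes back without the closing `)`, while a URL with balanced parentheses, such as a Wikipedia article title ending in `_(programming_language)`, is left intact. Trimming a trailing `?` is a trade-off: it fixes the common "URL at the end of a question" case but would also strip an intentionally empty query string.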

dentarg commented 8 years ago

https://github.com/mozilla/bleach

>>> import bleach
>>> from bs4 import BeautifulSoup
>>> html = bleach.linkify('foo <a href="http://www.räksmörgås.se/">macka</a> bar')
>>> link = BeautifulSoup(html, "html5lib").find("a")
>>> print(link.attrs['href'])
http://www.räksmörgås.se/
>>> print(link.get_text())
macka
dentarg commented 8 years ago
01:53:24 <..... dentarg> (https://easyengine.io/tutorials/linux/ubuntu-postfix-gmail-smtp/)
01:53:27 <.... rufwebot> Page not found - EasyEngine

:/

but

$ dotenv python offline_tester.py
> (https://easyengine.io/tutorials/linux/ubuntu-postfix-gmail-smtp/)
 telling #testchannel: Configure Postfix to Use Gmail SMTP on Ubuntu - EasyEngine
dentarg commented 8 years ago

Somehow the server did not have the updated code.

dentarg commented 8 years ago

testing http://www.google.com/foo/bar?