fedora-infra / pkgwat.api

Python API for querying the Fedora Packages webapp
http://pkgwat.rtfd.org

Handle erroneous html_parser exception on py2.6. #16

Closed: ralphbean closed this 10 years ago

ralphbean commented 10 years ago

This should fix #15 and consequently fedora-infra/fedora-tagger#107.

ralphbean commented 10 years ago

(The HTML parser raises the exception we now catch when it runs into an email address wrapped in <> delimiters.)

pypingou commented 10 years ago

It looks fine but seems to be doing two different things. Won't that create two different behaviors?

ralphbean commented 10 years ago

> It looks fine but seems to be doing two different things, won't that create two different behavior?

It will. If parsing succeeds without an exception, it will strip any HTML tags from the text. If it raises an exception, the HTML tags will still be included (but I'd rather have that than a traceback).

pypingou commented 10 years ago

:+1: then :) but I wonder whether a regex-based approach or similar would have been more robust here

ralphbean commented 10 years ago

> I wonder if a regex based approach or similar would not have been more robust here

I shun regex + xml unthinkingly ever since I found this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 :ghost:
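For comparison, a naive regex-based stripper might look like the sketch below. The pattern and the name `strip_tags_regex` are assumptions made for illustration; nobody in the thread proposed this exact code. It never raises, but it cannot tell a real tag from an email address in <> delimiters, so it silently drops the address.

```python
import re

# Hypothetical pattern: anything between "<" and the next ">" counts as a tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags_regex(text):
    """Remove every <...> span from text, whether or not it is real HTML."""
    return TAG_RE.sub("", text)


print(repr(strip_tags_regex("<b>bold</b>")))                  # 'bold'
print(repr(strip_tags_regex("Jane Doe <jane@example.com>")))  # 'Jane Doe '
```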