Parsely / schemato

Modularly extensible semantic metadata validator
http://schema.to
Apache License 2.0

HTMLParseError was deprecated in Py3 #19

Open ghost opened 8 years ago

ghost commented 8 years ago

There's a line in schemato/schemas/parselypage.py that imports HTMLParseError, which makes schemato unimportable in Python 3:

In [1]: import schemato
INFO:rdflib:RDFLib Version: 4.2.1
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-3210cda68f48> in <module>()
----> 1 import schemato

/usr/local/lib/python3.5/dist-packages/schemato/__init__.py in <module>()
      1 from __future__ import absolute_import
      2 
----> 3 from .schemato import Schemato
      4 
      5 

/usr/local/lib/python3.5/dist-packages/schemato/schemato.py in <module>()
      8 
      9 from .compound_graph import CompoundGraph
---> 10 from .schemas.parselypage import ParselyPageValidator
     11 
     12 

/usr/local/lib/python3.5/dist-packages/schemato/schemas/parselypage.py in <module>()
      5 from six import StringIO, iteritems
      6 from six.moves.urllib.request import urlopen
----> 7 from six.moves.html_parser import HTMLParser, HTMLParseError
      8 
      9 from ..errors import _error

ImportError: cannot import name 'HTMLParseError'

This occurs because, while six.moves makes html_parser available, the Python 3 module it maps to (html.parser) no longer provides HTMLParseError.

The exception appears to be raised from only one function, so a custom exception would probably suffice.

It's a small change, but if you're short on time I'm happy to offer a PR.

ghost commented 8 years ago

My suggested change, which appears to work for me, is to use six to check the Python version: on Python 2, import the old error; on Python 3, define a new exception. Client code should catch schemato.schemas.parselypage.HTMLParseError in either case rather than the html_parser error, but if it catches the latter on Python 2, it's no big deal. On Python 3 nobody can have written dependent code, because the import is already broken.
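For illustration, here is a minimal sketch of that version check. It is not the actual patch: it uses sys.version_info directly so it runs without six installed (six.PY2 performs the same comparison), and the docstring wording is an assumption.

```python
import sys

# Same test that six.PY2 performs; spelled out so the sketch has no dependencies.
PY2 = sys.version_info[0] == 2

if PY2:
    # Python 2: the stdlib still ships the original exception.
    from HTMLParser import HTMLParseError
else:
    # Python 3: define a stand-in under the same name (assumed design,
    # following the suggestion in this comment).
    class HTMLParseError(Exception):
        """Raised when page markup cannot be parsed."""
```

Either way, callers import HTMLParseError from the one module and do not need to care which branch defined it.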

dan-blanchard commented 8 years ago

Looks like this was only removed in the 3.5 release when they merged the patch here. You can see other people have worked around this by defining a custom class like in stephenmcd/mezzanine#1414. A PR would be much appreciated.
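Since the name still exists on Python 3 releases before 3.5, an alternative to a version check is to try the import and fall back to a local class. The sketch below shows that pattern using plain stdlib imports. It is similar in spirit to the custom-class workaround mentioned above, not copied from it, and require_tag is a hypothetical helper added only to show the exception in use.

```python
try:
    # Python 3.4 and earlier still expose the name here.
    from html.parser import HTMLParseError
except ImportError:
    try:
        # Python 2 location of the same exception.
        from HTMLParser import HTMLParseError
    except ImportError:
        # Python 3.5+: the name is gone, so define a local stand-in.
        class HTMLParseError(Exception):
            """Local replacement for the removed HTMLParseError."""


def require_tag(html, tag):
    """Hypothetical helper: raise HTMLParseError if `tag` never opens."""
    if "<" + tag not in html:
        raise HTMLParseError("missing <%s> element" % tag)
    return True
```

Feature detection like this avoids hard-coding which interpreter versions have the class, at the cost of a slightly longer import block.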