Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

subclasses of ParserBase must override error() #317

Open ccurvey opened 4 years ago

ccurvey commented 4 years ago

It seems that the HTML2Text class is a subclass of _markupbase.ParserBase (via html.parser.HTMLParser), which requires subclasses to implement their own error() method. When I parse some HTML that triggers a parser error (I'll have to instrument my code to recover an example), I get the following stack trace:

File "/opt/webapp/foo/utilities.py" in html_to_text
  74.     text = converter.handle(text)

File "/usr/local/lib/python3.6/dist-packages/html2text/__init__.py" in handle
  129.         self.feed(data)

File "/usr/local/lib/python3.6/dist-packages/html2text/__init__.py" in feed
  126.         super().feed(data)

File "/usr/lib/python3.6/html/parser.py" in feed
  111.         self.goahead(0)

File "/usr/lib/python3.6/html/parser.py" in goahead
  179.                     k = self.parse_html_declaration(i)

File "/usr/lib/python3.6/html/parser.py" in parse_html_declaration
  264.             return self.parse_marked_section(i)

File "/usr/lib/python3.6/_markupbase.py" in parse_marked_section
  149.         sectName, j = self._scan_name( i+3, i )

File "/usr/lib/python3.6/_markupbase.py" in _scan_name
  391.                        % rawdata[declstartpos:declstartpos+20])

File "/usr/lib/python3.6/_markupbase.py" in error
  34.             "subclasses of ParserBase must override error()")

Exception Type: NotImplementedError
Exception Value: subclasses of ParserBase must override error()

$ html2text --version
2019.9.26

I will have to instrument my code to find an example. Coming soon!

$ python --version
Python 3.6.10

gdvalderrama commented 3 years ago

Looks like this is still happening.

Currently using:

Python 3.7.11
html2text 2020.1.16
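
Until this is fixed, one possible workaround is to override error() in a subclass so the failure becomes an exception the caller can catch. The sketch below is not the library's fix. It uses the standard library's HTMLParser as a stand-in for HTML2Text, and the names TextCollector, MarkupError and collect_text are hypothetical. It also assumes that newer Python versions no longer call error() and raise AssertionError from _markupbase instead, so it catches both.

```python
from html.parser import HTMLParser


class MarkupError(ValueError):
    """Hypothetical exception type, used only in this sketch."""


class TextCollector(HTMLParser):
    """Stand-in for a parser such as HTML2Text; it only gathers text nodes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def error(self, message):
        # On the Python versions in this report, _markupbase calls error()
        # when it meets a malformed declaration such as "<![foo]>".
        raise MarkupError(message)


def collect_text(html):
    """Return the text parsed so far, even if the markup is malformed."""
    parser = TextCollector()
    try:
        parser.feed(html)
        parser.close()
    except (MarkupError, AssertionError):
        # Assumption: newer Pythons raise AssertionError here instead of
        # calling error(). Either way, keep what was parsed before the failure.
        pass
    return "".join(parser.chunks)
```

The same error() override could presumably be placed on a subclass of html2text.HTML2Text, with converter.handle(text) wrapped in a similar try/except, though I have not verified that against the library.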