Bystroushaak / pyDHTMLParser

Lightweight HTML/XML parser for quick and dirty web scraping.
MIT License

Problem with _raw_split #27

Closed by bedna-KU 4 years ago

bedna-KU commented 4 years ago
import urllib3
import dhtmlparser as d

http = urllib3.PoolManager ()
r = http.request ('GET', 'https://linuxos.sk/profil/13656/prispevky/komentare/')

if r.status == 200:
    dom = d.parseString (r.data)

Traceback (most recent call last):
  File "main.py", line 8, in <module>
    dom = d.parseString (r.data)
  File "/home/mario/.local/lib/python3.8/site-packages/dhtmlparser/__init__.py", line 267, in parseString
    HTMLElement(x) for x in _raw_split(txt)
  File "/home/mario/.local/lib/python3.8/site-packages/dhtmlparser/__init__.py", line 94, in _raw_split
    content += c
TypeError: can only concatenate str (not "int") to str

Bystroushaak commented 4 years ago

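r.data returns bytes, but the parser works on strings. Iterating over bytes yields integers, which is why content += c fails in _raw_split. You have to decode the data from UTF-8 first: r.data.decode("utf-8").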
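
To illustrate the bytes-versus-strings point, here is a minimal sketch that needs neither the network nor the library. The raw value is a made-up stand-in for r.data, and join_chars is a hypothetical helper that only imitates the character-by-character accumulation seen in the traceback (content += c); it is not the library's actual _raw_split.

```python
# Made-up stand-in for r.data: UTF-8 encoded bytes, as an HTTP body would be.
raw = b"<div>Pr\xc3\xadspevky</div>"


def join_chars(txt):
    """Hypothetical helper: accumulate input one item at a time into a str,
    similar in spirit to the `content += c` line in the traceback."""
    content = ""
    for c in txt:
        content += c  # fails when c is an int, i.e. when txt is bytes
    return content


# Iterating over bytes yields ints, so concatenating onto a str raises TypeError.
try:
    join_chars(raw)
    error = None
except TypeError as exc:
    error = str(exc)

# Decoding first gives a str, whose items are one-character strings.
text = raw.decode("utf-8")
result = join_chars(text)

print(error)
print(result)
```

Applied to the script from the report, the same fix would read d.parseString(r.data.decode("utf-8")), assuming the page really is served as UTF-8.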

bedna-KU commented 4 years ago

OK, thanks.