cobrateam / splinter

splinter - python test framework for web applications
http://splinter.readthedocs.org/en/stable/index.html
BSD 3-Clause "New" or "Revised" License

There should be an option to use html5lib instead of lxml.html in DjangoClient (chokes on some html5 input) #441

Open frankier opened 9 years ago

frankier commented 9 years ago

It looks like libxml2's HTML parsing doesn't produce a proper HTML5 DOM and sometimes chokes on valid HTML5, even when run in tolerant mode, which can result in errors like "XMLSyntaxError: ... Tag footer invalid". The solution is probably to allow the use of html5lib instead. One hitch is that the methods from HTMLMixin would no longer exist on the parsed elements, so Splinter's dependence on them should be removed.
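A rough sketch of what such an option could look like, for illustration only. The function name `parse_document` and the `parser` argument are hypothetical and are not part of Splinter's API; the only library calls used are `lxml.html.fromstring` and `html5lib.parse`. Both libraries are third-party and are imported lazily, so only the chosen backend needs to be installed.

```python
def parse_document(markup, parser="lxml"):
    """Parse an HTML string with the selected backend.

    `parse_document` and `parser` are hypothetical names used for this
    sketch; they do not exist in Splinter.
    """
    if parser == "lxml":
        import lxml.html  # third-party, backed by libxml2

        # Elements returned here carry the lxml.html helper methods.
        return lxml.html.fromstring(markup)

    if parser == "html5lib":
        import html5lib  # third-party, follows the HTML5 parsing algorithm

        # treebuilder="lxml" builds plain lxml.etree nodes, so XPath still
        # works, but the lxml.html helper methods are not available.
        return html5lib.parse(
            markup, treebuilder="lxml", namespaceHTMLElements=False
        )

    raise ValueError("unknown parser: %r" % (parser,))
```

Calling code that wants to work with either backend would have to stick to what plain `lxml.etree` elements provide, such as `xpath()`, which is the hitch described above.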

andrewsmedina commented 9 years ago

+1 to use html5lib

adamlwgriffiths commented 8 years ago

I've found lxml2's HTML parser to be unable to handle any real-world HTML. However, I found that html5lib has a habit of closing parent tags early, causing the children to become siblings. I personally found the built-in Python parser superior to html5lib.
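For reference, a minimal sketch of the built-in route, assuming "the built-in Python parser" means the standard library's `html.parser` module (the comment does not say which one). `TagCollector` and `collect_tags` are illustrative names. Note that `html.parser` is an event-based tokenizer: it reports tags exactly as written and never builds or repairs a tree, so any nesting has to be tracked by the caller.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record each start tag with the nesting depth at which it was seen."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        # Naive depth tracking: assumes every start tag has a matching end
        # tag, so void elements such as <br> would need special handling.
        self.depth -= 1


def collect_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# HTML5 elements such as <footer> are reported like any other tag.
print(collect_tags("<body><footer><p>hi</p></footer></body>"))
```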