4teamwork / ftw.testbrowser

A testing browser for Plone.
http://ftwtestbrowser.readthedocs.io/

Respect HTTP response encoding #88

Closed jone closed 6 years ago

jone commented 6 years ago

This pull request should improve the encoding situation in testbrowser a bit.

The problem is that we use lxml to parse HTML. XML must have an encoding node, therefore lxml expects that node and reads the encoding from there. But HTML does not necessarily have an encoding node, especially when only a partial is sent rather than a full HTML document.

lxml.html sometimes messes up the encoding when no node declares the encoding. But it is hard to tell it the correct encoding, which we usually know from the content-type HTTP response header.
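To make the failure mode concrete, here is a minimal stdlib-only sketch of what happens when bytes are decoded with the wrong encoding. It illustrates the general effect only and does not reproduce lxml's exact fallback behaviour; the sample string is made up for illustration.

```python
# Illustrative only: a UTF-8 encoded body decoded with the wrong charset.
text = "Zürich"                      # hypothetical sample content
body = text.encode("utf-8")          # bytes as they travel in the response

wrong = body.decode("latin-1")       # decoder guessed the wrong encoding
right = body.decode("utf-8")         # decoder was told the real encoding

print(repr(wrong))  # 'ZÃ¼rich'
print(repr(right))  # 'Zürich'
```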

Changes:

1. Let the parser used for lxml.html know the encoding from the content-type header. This makes the situation better when the document is encoded in utf-8. However, it does not really work with encodings such as ISO-8859-15/latin-9 (default Zope encoding).

2. Therefore the testbrowser now prints a warning whenever a response is not utf-8. The warning can be disabled with an environment variable when it is too annoying.
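The first change can be sketched roughly as follows. This is not the code from the pull request: the helper names `charset_from_content_type` and `parse_html` are hypothetical, and the sketch only assumes that `lxml.html.HTMLParser` accepts an `encoding` argument and that `lxml.html.fromstring` accepts a `parser` argument.

```python
from email.message import Message


def charset_from_content_type(content_type, default=None):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper: uses the stdlib header parser, which returns the
    charset lowercased, or `default` when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def parse_html(body, content_type):
    """Parse response bytes, telling lxml the encoding from the header.

    Hypothetical helper; lxml is imported lazily so that the header
    parsing above can be used without lxml installed.
    """
    import lxml.html

    charset = charset_from_content_type(content_type)
    # With encoding=None the parser falls back to its own detection.
    parser = lxml.html.HTMLParser(encoding=charset)
    return lxml.html.fromstring(body, parser=parser)


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

A call such as `parse_html(response_body, "text/html; charset=utf-8")` would then parse a partial without relying on an encoding declaration inside the markup.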

lukasgraf commented 6 years ago

I don't think lxml is to blame here.

With this change

class TestPartialView(BrowserView):

    def __call__(self):
        self.request.response.setHeader('X-Theme-Disabled', 'True')

the tests from the first commit (aa98b43) pass for me for plone-5.1. The response is already messed up (it contains incorrectly encoded data); it's Diazo that's messing up the partial template.

jone commented 6 years ago

@lukasgraf I've updated the pull request and removed the warning. The only change is now that the testbrowser passes the encoding from the content-type response header to the parser.