gawel / pyquery

A jquery-like library for python
http://pyquery.rtfd.org/

.html() fails to escape initial html entities #205

Closed jcushman closed 3 years ago

jcushman commented 4 years ago

The html() method returns incorrect results in some cases because it fails to escape HTML entities prior to the first tag in the inner HTML:

>>> from pyquery import PyQuery
>>> PyQuery("<foo>&lt;script&gt;// uh oh&lt;/script&gt;bar<boo/></foo>").html()
'<script>// uh oh</script>bar<boo/>'

This has potential security implications for downstream users who process user-controlled content that has already been sanitized: the escaped entities come back as live markup.

The fix would be to HTML-encode tag.text in the html() method.
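
A minimal sketch of that idea, using only the standard library rather than pyquery's actual code. The helper name inner_html is hypothetical, and the sketch assumes that inner HTML is built by concatenating the element's leading text with its serialized children:

```python
import xml.etree.ElementTree as ET
from html import escape


def inner_html(element):
    """Hypothetical helper: return the inner markup of `element`."""
    # element.text is the decoded text before the first child, so
    # "&lt;" has already become "<". Re-escape it before concatenating
    # it into markup; quote=False leaves quotes alone in text content.
    leading = escape(element.text or "", quote=False)
    # The serializer re-escapes each child's own text and tail.
    children = "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )
    return leading + children


root = ET.fromstring(
    "<foo>&lt;script&gt;// uh oh&lt;/script&gt;bar<boo/></foo>"
)
result = inner_html(root)

# Text inside children and after them (the tail) goes through the serializer.
nested = inner_html(ET.fromstring("<foo>a<b>x &amp; y</b>t &lt; u</foo>"))
```

In this sketch only the leading text needs explicit escaping, because it is the one piece added to the output as a raw string; everything from the first child onward is produced by the serializer.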

gawel commented 4 years ago

Will this be enough if you have children? (Also, feel free to provide a PR :) )