gawel / pyquery

A jquery-like library for python
http://pyquery.rtfd.org/

Incorrect HTML for empty <script> tags #90

Open asouchang opened 9 years ago

asouchang commented 9 years ago

The following Python code generates incorrect HTML for empty script tags:

print PyQuery('<script></script>')

The expected output is exactly the input string. However, the actual output is:

<script/>

Testing environment: Mac OS X 10.10; Python 2.7.6 (shipped with the OS); PyQuery 1.2.9 (installed from pip).

Lhfcws commented 9 years ago

I've got the same problem.

I tested the latest version on GitHub. It still fails.

Actually, I think it is due to lxml.

I tried

    PyQuery("<div></div>")    # output: <div/>

From a user's point of view, this may look like a bug.

A workaround is to insert some harmless JS into the script tag so that it is no longer empty and is serialized correctly, like

    <script> var __pyquery = 0;</script>
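
The XML-versus-HTML serialization difference behind this can be reproduced without PyQuery. The sketch below is only an analogy: it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (an assumption made so the snippet runs anywhere), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Parse a fragment that contains an empty <script> element.
root = ET.fromstring('<div><script></script></div>')

# Default (XML) serialization collapses empty elements into self-closing tags.
as_xml = ET.tostring(root, encoding='unicode')

# HTML serialization keeps the explicit closing tag.
as_html = ET.tostring(root, encoding='unicode', method='html')

print(as_xml)
print(as_html)
```

The self-closing form is valid XML, but browsers do not treat it as a closed script element in HTML, which is why the output looks wrong to users.
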

twz915 commented 9 years ago

doc = PyQuery('<script></script>')
print doc.outer_html()

works well

mrnfrancesco commented 9 years ago

Even if this bug/conversation seems dead:

PyQuery('<script></script>').html() should not be equal to <script></script>; it should be (and is) an empty string, because html() returns only the markup of the element's children.

If you write something like PyQuery('<div><script></script></div>').html(), you would expect <script></script> as the result, but PyQuery gives you <script/>.

This is not a bug; the behavior is documented in help(PyQuery.html):

...
>>> d = PyQuery('<div><span></span></div>')
>>> print(d.html())
<span/>
>>> print(d.html(method='html'))
<span></span>
...