gawel / pyquery

A jquery-like library for python
http://pyquery.rtfd.org/
Other
2.3k stars 182 forks

Error in title extraction when the title includes symbols such as "<" or ">" #140

Closed. lixiuna0908 closed this issue 8 years ago

lixiuna0908 commented 8 years ago

When I parse a page whose title includes "<" or ">", such as https://www.exploit-db.com/exploits/37765/, the extracted title is wrong. The value I get is: "'Zend Framework window._wpemojiSettings = {"baseUrl":"https:\/\/s.w.org\/images\/core\/emoji\/72x72\/","ext":".png","source"......" The expected title is "Zend Framework <= 2.4.2 - XML eXternal Entity Injection XXE on PHP FPM"

Do you have any ideas about this issue? Thank you.

gawel commented 8 years ago

Can't help without a traceback.

lixiuna0908 commented 8 years ago

When I execute: pyq = PyQuery("<title>Zend Framework <= 2.4.2 - XML eXternal Entity Injection XXE on PHP FPM</title>") the result is: <html><head><title>Zend Framework </title></head></html>

However, the real title is: <title>Zend Framework <= 2.4.2 - XML eXternal Entity Injection XXE on PHP FPM</title> Do you have any suggestions for getting the real title? Thank you very much.

lixiuna0908 commented 8 years ago

Also, the "<" symbol in the title string is not encoded as &lt; in this case. What can we do to get the real title? Thank you.

gawel commented 8 years ago

No idea. That may be a known problem in lxml (pretty sure the bug is not in pyquery).
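One possible workaround, sketched here as an assumption rather than a confirmed fix, is to escape any bare "<" before handing the markup to the parser, so that the title text reaches it as &lt;= instead of something that looks like the start of a tag. The helper name escape_stray_lt and its regular expression are hypothetical and are not part of pyquery or lxml; the sample string is the one from the report above.

```python
import re

# Hypothetical helper (not part of pyquery or lxml): match every "<" that is
# NOT followed by a letter, "/", "!" or "?", i.e. one that cannot start a tag,
# closing tag, comment/doctype, or processing instruction.
_STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_stray_lt(markup: str) -> str:
    """Replace bare "<" characters with "&lt;" and leave real tags alone."""
    return _STRAY_LT.sub("&lt;", markup)


raw = "<title>Zend Framework <= 2.4.2 - XML eXternal Entity Injection XXE on PHP FPM</title>"
cleaned = escape_stray_lt(raw)
print(cleaned)
```

The cleaned string can then be passed to PyQuery in place of the raw markup. Note that this sketch also rewrites a "<" used as a comparison operator inside inline scripts, which only matters if the script contents are read afterwards, and it has not been checked against the full exploit-db page.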