gawel / pyquery

A jQuery-like library for Python
http://pyquery.rtfd.org/

"doc = pq(driver.page_source)" cannot find elements #192

Status: Open. iamhmx opened this issue 6 years ago.

iamhmx commented 6 years ago

When I use Selenium to get "page_source" and then look up elements with pyquery, nothing is found; but when I use "doc = pq(url='https://xxxxx')" directly, it works well. Code below. Part one:

from pyquery import PyQuery as pq

doc = pq(url='https://search.jd.com/Search?keyword=%E7%A9%BA%E6%B0%94%E5%87%80%E5%8C%96%E5%99%A8&enc=utf-8&suggest=1.def.0.V18&wq=kongqijingh&pvid=60c4120a5787482e8337c64c2fd4184d')
for item in doc('.gl-i-wrap').items():
    price = item('.p-price strong i').text()
    print('price:', price)

This works well. Part two:

# self.driver is the Selenium WebDriver (this snippet runs inside a class method)
html = self.driver.page_source
doc = pq(html)
for item in doc('.gl-i-wrap').items():
    price = item('.p-price strong i').text()
    print('price:', price)

This does not work!
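Since part one and part two differ only in where the HTML string comes from, one way to compare them is to look at the opening <html> tag of each string, which is what the next comment suggests doing by hand. A minimal sketch of that check using only the standard library: it extracts the first <html ...> start tag and reports whether it declares a default xmlns namespace. The helper name describe_root_tag and the sample string are hypothetical; in practice you would pass driver.page_source.

```python
import re

# First <html ...> start tag, case-insensitive (HTML tag names are).
_HTML_START = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
# A default namespace declaration such as xmlns="..." (not xmlns:prefix=...).
_DEFAULT_XMLNS = re.compile(r"\sxmlns\s*=", re.IGNORECASE)


def describe_root_tag(source):
    """Return (start_tag, has_default_xmlns) for the first <html> tag.

    start_tag is None when the string contains no <html> start tag.
    """
    match = _HTML_START.search(source)
    if match is None:
        return None, False
    tag = match.group(0)
    return tag, _DEFAULT_XMLNS.search(tag) is not None


# Hypothetical stand-in for driver.page_source.
sample = '<html xmlns="http://www.w3.org/1999/xhtml"><body></body></html>'
print(describe_root_tag(sample))
# → ('<html xmlns="http://www.w3.org/1999/xhtml">', True)
```

Running the same check on the string fetched in part one and on page_source from part two shows at a glance whether only the Selenium copy carries the namespace declaration.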

Saren-Arterius commented 6 years ago

This issue affects me too. Try printing the first 200 characters of page_source, then remove the xmlns attribute from the <html> tag. In my case, I had to do this for CSS selectors to work while scraping Facebook's WAP site.

html = b.page_source.replace('<html xmlns="http://www.w3.org/1999/xhtml">', '<html>')
doc = pq(html)
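The str.replace above only fires when the <html> tag is exactly '<html xmlns="http://www.w3.org/1999/xhtml">'; if the tag carries other attributes (for example lang) or uses single quotes, it silently does nothing. Assuming the namespace declaration is indeed the cause, as this comment suggests, a slightly more tolerant sketch removes just the default xmlns="..." attribute from the first <html> start tag with a regular expression and leaves prefixed declarations such as xmlns:fb alone. The function name strip_html_xmlns is hypothetical.

```python
import re

# First <html ...> start tag; only this tag is rewritten.
_HTML_START = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
# Default namespace attribute with a double- or single-quoted value.
# xmlns:prefix="..." is not matched because ':' follows "xmlns".
_XMLNS_ATTR = re.compile(r"""\s+xmlns\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_html_xmlns(source):
    """Remove a default xmlns declaration from the first <html> start tag."""
    def _clean(match):
        # Drop the xmlns="..." attribute, keep every other attribute.
        return _XMLNS_ATTR.sub("", match.group(0))

    return _HTML_START.sub(_clean, source, count=1)


print(strip_html_xmlns('<html lang="en" xmlns="http://www.w3.org/1999/xhtml"><body></body></html>'))
# → <html lang="en"><body></body></html>
```

With b as the WebDriver from the snippet above, usage would be doc = pq(strip_html_xmlns(b.page_source)). As an alternative that avoids string surgery, the PyQuery constructor also accepts parser='html' (by default pyquery tries an XML parse first, which keeps the namespace on well-formed XHTML), and PyQuery objects have a remove_namespaces() method; either may be enough on its own, though neither is exercised in this sketch.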