gawel / pyquery

A jquery-like library for python
http://pyquery.rtfd.org/

:first does not work in some situations #116

Open eromoe opened 8 years ago

eromoe commented 8 years ago
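from lxml import html
from pyquery import PyQuery as pq
import requests

url = 'http://bbs.enorth.com.cn/thread-5402974-1-1.html'
r = requests.get(url)

# Parse the page once with lxml and wrap the same tree with pyquery
h = html.fromstring(r.text)
doc = pq(h)

# pyquery with :first, expected to print a single name
print(doc('.pls.favatar .xw1:first').text())

# plain lxml cssselect, taking the first match by index
print(h.cssselect('.pls.favatar .xw1')[0].text_content())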

output:

sammerliu 兴凯湖水 高蛤米 dabkde ssk001_2
sammerliu

pyquery ignores :first and returns every .xw1 element, while cssselect with [0] returns only the first one.

kissgyorgy commented 8 years ago

Hmm, I wanted to open a new issue, but mine might be the same:

>>> print "LINK HTML", repr(d('.portlet:first a').html())
>>> print "LINK TEXT", repr(d('.portlet:first a').text())
>>> print "FIND HTML", repr(d('.portlet:first').find('a').html())
>>> print "FIND TEXT", repr(d('.portlet:first').find('a').text())
LINK HTML '\n                  \n                  First Division\n                  '
LINK TEXT 'First Division Second Division'
FIND HTML '\n                  \n                  First Division\n                  '
FIND TEXT 'First Division Second Division'