code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0
11.4k stars 4.18k forks

XPath support issue: position() ---> //table[@id='list_table']/tbody/tr[position()>1] #995

Open shizhier opened 3 years ago

shizhier commented 3 years ago

html.xpath("//table[@id='list_table']/tbody/tr[position()>1]");

org.jsoup.select.Selector$SelectorParseException: Could not parse query 'tr[position()>1]': unexpected token at 'position()>1'
    at us.codecraft.xsoup.xevaluator.XPathParser.byFunction(XPathParser.java:260)
    at us.codecraft.xsoup.xevaluator.XPathParser.consumePredicates(XPathParser.java:231)
    at us.codecraft.xsoup.xevaluator.XPathParser.findElements(XPathParser.java:163)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:76)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:408)
    at us.codecraft.xsoup.xevaluator.XPathParser.combinator(XPathParser.java:110)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:74)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:408)
    at us.codecraft.xsoup.Xsoup.compile(Xsoup.java:25)
    at us.codecraft.webmagic.selector.XpathSelector.<init>(XpathSelector.java:21)
    at us.codecraft.webmagic.selector.Selectors.xpath(Selectors.java:32)
    at us.codecraft.webmagic.selector.HtmlNode.xpath(HtmlNode.java:42)
    at com.datage.dms.spider.spider.SichuanConstructionIndustryDataSharePlatform.process(SichuanConstructionIndustryDataSharePlatform.java:42)

yuweiming2016 commented 3 years ago

XPath support isn't very good; the author wrote the implementation himself. Use regex or CSS selectors instead.
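As a rough illustration of the regex route, here is a minimal sketch in plain Java that collects the rows of a table and drops the first one, which is what tr[position()>1] would have selected. RowFilter and rowsAfterFirst are hypothetical names, not part of webmagic or Xsoup. The sketch assumes the table's HTML is already available as a string and that rows contain no nested tables. Regex matching on HTML is fragile, so treat this as a stopgap rather than a general solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of webmagic or Xsoup.
final class RowFilter {

    // Matches one <tr>...</tr> element. DOTALL lets '.' span line breaks.
    // Assumption: rows do not contain nested tables.
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*>.*?</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every row except the first, mirroring tr[position()>1].
    static List<String> rowsAfterFirst(String tableHtml) {
        List<String> rows = new ArrayList<>();
        Matcher matcher = ROW.matcher(tableHtml);
        while (matcher.find()) {
            rows.add(matcher.group());
        }
        if (rows.size() <= 1) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.subList(1, rows.size()));
    }

    public static void main(String[] args) {
        // Made-up sample markup; only the table id comes from the issue.
        String html = "<table id='list_table'><tbody>"
                + "<tr><th>Name</th></tr>"
                + "<tr><td>A</td></tr>"
                + "<tr><td>B</td></tr>"
                + "</tbody></table>";
        for (String row : rowsAfterFirst(html)) {
            System.out.println(row);
        }
    }
}
```

Skipping the header row in Java code, rather than inside the query, keeps the selector itself within the subset the parser accepts.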

sutra commented 3 years ago

Try this: https://github.com/code4craft/webmagic/issues/984#issuecomment-760695694

shizhier commented 3 years ago

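webmagic is a very good open-source project. It is built on the Java ecosystem, and it would be great if it keeps developing over the long term. It will grow step by step.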

On Friday, February 5, 2021 at 3:03 PM, Sutra Zhou notifications@github.com wrote:

Try this: #984 (comment) https://github.com/code4craft/webmagic/issues/984#issuecomment-760695694
