code4craft / webmagic

A scalable web crawler framework for Java.
http://webmagic.io/
Apache License 2.0

XPath syntax does not support ../ for parent elements #753

Open lzm1988 opened 6 years ago

lzm1988 commented 6 years ago
page.getHtml().xpath("//span[@class='zg_selected']/../../ul/li").nodes()

org.jsoup.select.Selector$SelectorParseException: Could not parse query '..': unexpected token at '..'
    at us.codecraft.xsoup.xevaluator.XPathParser.findElements(XPathParser.java:166)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:76)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:408)
    at us.codecraft.xsoup.xevaluator.XPathParser.combinator(XPathParser.java:110)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:74)
    at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:408)
    at us.codecraft.xsoup.Xsoup.compile(Xsoup.java:25)
    at us.codecraft.webmagic.selector.XpathSelector.<init>(XpathSelector.java:21)
    at us.codecraft.webmagic.selector.Selectors.xpath(Selectors.java:32)
    at us.codecraft.webmagic.selector.HtmlNode.xpath(HtmlNode.java:42)
    at com.example.webmagicdemo.amazon.AmazonPageProcessor.process(AmazonPageProcessor.java:29)
    at us.codecraft.webmagic.Spider.onDownloadSuccess(Spider.java:414)
    at us.codecraft.webmagic.Spider.processRequest(Spider.java:406)
    at us.codecraft.webmagic.Spider.access$000(Spider.java:61)
    at us.codecraft.webmagic.Spider$1.run(Spider.java:320)
    at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
ChengkaiYang2022 commented 5 years ago

I'm running into the same problem. How did you solve it?
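
One possible workaround, offered as a sketch rather than a confirmed fix: the stack trace shows the Xsoup parser rejecting the `..` step, but the expression itself is ordinary XPath 1.0, so a different engine can evaluate it. The example below uses only the JDK's built-in `javax.xml.xpath` on a small well-formed snippet. The class name `ParentAxisDemo`, the `select` helper, and the sample markup are hypothetical and do not come from webmagic. Real-world HTML is usually not well-formed XML, so it would first have to be converted to a W3C DOM (jsoup's `W3CDom` helper may be usable for that, but that step is an assumption and is not shown here).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParentAxisDemo {

    // Evaluates an XPath expression with the JDK engine and
    // returns the text content of every matched node.
    static List<String> select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, well-formed stand-in for the page fragment in the report.
        String xml = "<div><h3><span class='zg_selected'>Books</span></h3>"
                + "<ul><li>Fiction</li><li>History</li></ul></div>";
        // Same expression as in the report: two parent steps, then ul/li.
        System.out.println(select(xml, "//span[@class='zg_selected']/../../ul/li"));
        // prints [Fiction, History]
    }
}
```

If staying inside webmagic is required, another option is to select the `span` first and walk up to the ancestor in Java code instead of in the XPath string.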

mmm8955405 commented 1 year ago

This wrapper is garbage.

hooyantsing commented 1 year ago

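> This wrapper is garbage.

If you think the project has flaws, please submit a PR to improve it instead of raging impotently here. It just looks petty.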