Open code4craft opened 10 years ago
Write spider by config file or scripts.
    <spider>
      <site>
        <charset>utf-8</charset>
        <user-agent></user-agent>
        <cookies>
          <cookie domain="" path="" name="" value=""></cookie>
        </cookies>
        <heads>
          <head name="" value=""/>
        </heads>
      </site>
      <startUrls>
        <url></url>
      </startUrls>
      <extraction targetUrl="" helpUrl="">
        <field name="title">
          <extractor type="xpath" value="//div[@class='title']"/>
        </field>
        <field name="content">
          <extractor type="xpath" value="//div[@class='content']"/>
        </field>
      </extraction>
    </spider>
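To make the XML proposal above more concrete, here is a minimal sketch of how the `<field>` / `<extractor>` entries might be read into a name-to-XPath map using only the JDK's DOM parser. The class name `SpiderConfigReader` and the method `readFieldXPaths` are hypothetical and not part of WebMagic; the sketch assumes each `<field>` carries at most one `<extractor type="xpath">` child, as in the sample config.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a WebMagic class: reads the proposed XML config.
public class SpiderConfigReader {

    /** Returns field name -> XPath expression, in document order. */
    public static Map<String, String> readFieldXPaths(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> fields = new LinkedHashMap<>();
        NodeList fieldNodes = doc.getElementsByTagName("field");
        for (int i = 0; i < fieldNodes.getLength(); i++) {
            Element field = (Element) fieldNodes.item(i);
            NodeList extractors = field.getElementsByTagName("extractor");
            if (extractors.getLength() == 0) {
                continue; // a field without an extractor is skipped
            }
            Element extractor = (Element) extractors.item(0);
            // Only XPath extractors are handled in this sketch.
            if ("xpath".equals(extractor.getAttribute("type"))) {
                fields.put(field.getAttribute("name"), extractor.getAttribute("value"));
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<spider><extraction targetUrl=\"\" helpUrl=\"\">"
                + "<field name=\"title\"><extractor type=\"xpath\" value=\"//div[@class='title']\"/></field>"
                + "<field name=\"content\"><extractor type=\"xpath\" value=\"//div[@class='content']\"/></field>"
                + "</extraction></spider>";
        System.out.println(readFieldXPaths(xml));
    }
}
```

The resulting map could then be handed to whatever component applies the XPath expressions to a downloaded page; that wiring is left open here.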
    var name = xpath("//h1[@class='entry-title public']/strong/a/text()")
    var readme = xpath("//div[@id='readme']/tidyText()")
    var star = xpath("//ul[@class='pagehead-actions']/li[1]//a[@class='social-count js-social-count']/text()")
    name = xpath "//h1[@class='entry-title public']/strong/a/text()"
    readme = xpath "//div[@id='readme']/tidyText()"
    star = xpath "//ul[@class='pagehead-actions']/li[1]//a[@class='social-count js-social-count']/text()"
    fork = xpath "//ul[@class='pagehead-actions']/li[2]//a[@class='social-count']/text()"
Just write PageProcessor and load it dynamically…
Could Groovy or Scala be considered for this?
Groovy would be the better choice.
Write spider by config file or scripts.
Choices:
1. XML
2. JSON
3. YAML
4. JavaScript
5. JRuby
6. Java
Just write PageProcessor and load it dynamically…
7. Groovy
8. Scala
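For the Java option in the list above, one way "load it dynamically" could look is plain reflection: the spider is given a class name at runtime and instantiates it. The sketch below is only an illustration under stated assumptions. `SimpleProcessor` is a hypothetical stand-in interface, not WebMagic's actual `PageProcessor`, and `UpperCaseProcessor` is a toy implementation that would normally live in a separately compiled class or jar on the classpath.

```java
import java.util.Locale;

// Hypothetical sketch of loading a processor implementation by class name.
public class DynamicProcessorLoader {

    /** Stand-in for a page processor; the real WebMagic interface has a different shape. */
    public interface SimpleProcessor {
        String process(String html);
    }

    /** Toy implementation used only to demonstrate loading by name. */
    public static class UpperCaseProcessor implements SimpleProcessor {
        @Override
        public String process(String html) {
            return html.toUpperCase(Locale.ROOT);
        }
    }

    /** Loads the named class, checks its type, and creates an instance via its no-arg constructor. */
    public static SimpleProcessor load(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        if (!SimpleProcessor.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " does not implement SimpleProcessor");
        }
        return (SimpleProcessor) clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The class name could come from a config file or the command line.
        String name = args.length > 0 ? args[0] : UpperCaseProcessor.class.getName();
        SimpleProcessor processor = load(name);
        System.out.println(processor.process("<h1>title</h1>"));
    }
}
```

Loading classes from a jar that is not already on the classpath would additionally need a `URLClassLoader` pointed at that jar; the type check and instantiation would stay the same.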