WHB1 / WHB1.github.io.old


Portia, an open-source visual web scraping tool (crawler) #24

Open WHB1 opened 6 years ago

WHB1 commented 6 years ago

http://v.youku.com/v_show/id_XNjkzNjkwODE2.html https://github.com/scrapinghub/portia

WHB1 commented 6 years ago

XPath

Links
http://www.w3school.com.cn/xpath/index.asp

http://www.w3school.com.cn/xmldom/dom_xpathresult.asp

https://github.com/search?l=JavaScript&o=desc&q=js+xpath&s=stars&type=Repositories&utf8=%E2%9C%93

Understanding XPath
Detailed usage of document.evaluate: using XPath to find specific node objects
http://www.blogjava.net/baoyaer/articles/187448.html

Getting an element's XPath in JS
http://blog.csdn.net/u010085423/article/details/54628799

document.addEventListener("click",function(e){console.log(e.target.id)})
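The click listener above only logs the target's id, which is empty for most elements. As a rough sketch (not taken from the linked article), the function below builds a positional XPath by walking up `parentNode` and counting same-named preceding siblings. The name `getElementXPath` is hypothetical, and the sketch assumes a plain HTML document without namespaced elements and ids that contain no double quotes.

```javascript
// Hypothetical helper: build an XPath string for a DOM element.
function getElementXPath(el) {
  // An element with an id can be addressed directly.
  if (el.id) {
    return '//*[@id="' + el.id + '"]';
  }
  var segments = [];
  // Walk up the tree while we are still on element nodes (nodeType 1).
  for (var node = el; node && node.nodeType === 1; node = node.parentNode) {
    var index = 1;
    // XPath positions are 1-based and count only siblings with the same name.
    for (var sib = node.previousSibling; sib; sib = sib.previousSibling) {
      if (sib.nodeType === 1 && sib.nodeName === node.nodeName) {
        index++;
      }
    }
    segments.unshift(node.nodeName.toLowerCase() + '[' + index + ']');
  }
  return '/' + segments.join('/');
}

// In a browser, log the XPath of whatever gets clicked.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (e) {
    console.log(getElementXPath(e.target));
  });
}
```

The id shortcut keeps paths short, but it is only reliable when ids are unique on the page; dropping that branch gives a purely positional path instead.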

A ready-made implementation (useful as a reference)
http://xuriyunhai.iteye.com/blog/1169505

WHB1 commented 6 years ago

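Getting elements by XPath in JS

function _x(STR_XPATH) {
    // Evaluate the XPath against the whole document. With ANY_TYPE, a
    // node-set expression yields an unordered node iterator.
    var xresult = document.evaluate(STR_XPATH, document, null, XPathResult.ANY_TYPE, null);
    var xnodes = [];
    var xres;
    // Collect every matching node into a plain array.
    while ((xres = xresult.iterateNext())) {
        xnodes.push(xres);
    }
    return xnodes;
}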
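One caveat with the `_x` helper above: `iterateNext()` throws if the expression evaluates to a number, string, or boolean, and an iterator result is invalidated when the document changes during iteration. A possible variant, sketched here under the assumption that only node-set expressions are passed in, requests an ordered snapshot instead. The name `xpathSnapshot` and its optional `contextNode` and `doc` parameters are hypothetical additions for illustration.

```javascript
// Same numeric value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in browsers.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical variant of _x: returns matching nodes in document order.
function xpathSnapshot(xpath, contextNode, doc) {
  doc = doc || document;          // default to the page's document
  contextNode = contextNode || doc; // default to searching the whole document
  var result = doc.evaluate(xpath, contextNode, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  // A snapshot is a static list, so later DOM changes do not invalidate it.
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser console, `xpathSnapshot('//a[@href]')` would return the page's links as an array, and passing an element as the second argument with a relative expression such as `.//a` limits the search to that subtree.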

WHB1 commented 6 years ago

Further reading

Using Splash to handle page JavaScript in Scrapy crawlers
http://ae.yyuap.com/pages/viewpage.action?pageId=919763

Scrapy crawler learning series, part 4: getting started with Portia
http://blog.csdn.net/zhanglao33/article/details/77678806