Can feapder open and parse a local HTML file?

Boris-code / feapder

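🚀🚀🚀feapder is an easy to use, powerful Python crawler framework. It ships with four built-in spider types (AirSpider, Spider, TaskSpider, and BatchSpider) for different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and large-scale data deduplication. The feaplat crawler management system adds convenient deployment and scheduling.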

http://feapder.com

Can feapder open and parse a local HTML file? #164

Closed iptag closed 1 year ago

iptag commented 2 years ago

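The parse function is very well done and makes it easy for beginners to analyze page data and locate nodes. However, every run has to send a request to the site and then parse the response it returns. Some sites have anti-crawling measures and pop up a verification challenge from time to time, which has to be handled manually. In practice, a beginner needs to fetch the response (that is, the site's HTML) over and over while writing code, so I would like to save the site's HTML locally and have feapder process that file. Neither the documentation nor the examples I found online mention this, so how should I go about it? Thanks!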
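The workflow described above (save the page once, then iterate on the parsing code offline) can be sketched with the Python standard library alone. This is only an illustration under stated assumptions, not feapder's API: `TitleAndLinkParser`, `parse_saved_page`, and `SAMPLE_HTML` are hypothetical names, and the sample page stands in for a response saved earlier.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class TitleAndLinkParser(HTMLParser):
    """Collects the <title> text and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is kept.
        if self._in_title:
            self.title += data


def parse_saved_page(path):
    """Read a saved HTML file from disk and parse it without any network request."""
    parser = TitleAndLinkParser()
    parser.feed(Path(path).read_text(encoding="utf-8"))
    return parser.title.strip(), parser.links


# Hypothetical sample page standing in for a response saved earlier.
SAMPLE_HTML = (
    "<html><head><title>Demo</title></head>"
    '<body><a href="/a">A</a><a href="/b">B</a></body></html>'
)

with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "page.html"
    saved.write_text(SAMPLE_HTML, encoding="utf-8")
    title, links = parse_saved_page(saved)
```

Because the file is read from disk, the parsing code can be rerun as often as needed without triggering the site's anti-crawling checks.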

Boris-code commented 2 years ago

feapder shell --help

iptag commented 2 years ago

feapder shell --help

Thanks for the pointer. However, the shell -u (--url) command seems to have a problem; every run reports an error:

C:\Users\xxx>feapder shell -u https://www.baidu.com
2022-08-31 09:44:48.158 | DEBUG    | feapder.network.request:get_response:line:316 |
                -------------- request for ----------------
                url  = h
                method = None
                body = {'proxies': None, 'timeout': 22, 'stream': True, 'verify': False, 'headers': {'User-Agent': 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}}

I looked into the cause. It seems the argument passed to post is wrong: a list may be expected, but a string is actually passed in. https://blog.csdn.net/qxqxqzzz/article/details/90241303 I am using Python 3.8.10.

Boris-code commented 2 years ago

iptag commented 2 years ago

Windows 10, Python 3.8.10, feapder 1.79
oracle-arm, Ubuntu 22.04.1, Python 3.10, feapder 1.79

I tested in both environments, and both show the error above.

iptag commented 2 years ago

After changing args.url[0] to args.url on line 204 of feapder\commands\shell.py, it works correctly.
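The `url  = h` line in the log above is consistent with this fix: indexing a string with `[0]` returns only its first character. Here is a minimal sketch of that behavior, assuming the `-u` option is declared without `nargs` so that argparse stores a plain string. The actual argument definition in feapder's shell.py is not shown in this thread, so the parser below is a hypothetical stand-in.

```python
import argparse

parser = argparse.ArgumentParser(prog="shell-demo")
# Assumption: no nargs is given, so args.url is a str rather than a list.
parser.add_argument("-u", "--url", help="URL to request")

args = parser.parse_args(["-u", "https://www.baidu.com"])

buggy_url = args.url[0]  # indexing a str yields its first character
fixed_url = args.url     # the complete URL string

print(repr(buggy_url))
print(repr(fixed_url))
```

With a declaration such as `nargs=1`, argparse would store a one-element list instead, and `[0]` would return the full URL. A change in how the option is declared could therefore explain why the index was there.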

Boris-code commented 1 year ago

Fixed.