TurboWay / spiderman

A general-purpose distributed crawler framework based on scrapy-redis
MIT License
591 stars, 128 forks

Question about using Splash #43

Closed: donneyluck closed this issue 1 year ago

donneyluck commented 1 year ago

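Background: my Steam spider used to get stuck on the age-verification page.

What I tried: after a lot of searching, I found that rendering these infrequently requested pages with Splash and running a Lua script there solves the problem.

[image]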
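For readers who cannot see the screenshot, here is a rough sketch of what "running a Lua script in Splash" can look like when posting to Splash's `/execute` HTTP endpoint. This is not the script from the screenshot. The Splash address, the `build_execute_request` helper, and the Lua body are assumptions made for illustration, and the age-check step itself is left as a comment because it depends on the page.

```python
import json
import urllib.request

# Assumed Splash address; adjust to wherever your Splash instance listens.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"

# Minimal Lua script: load the page, wait, return the rendered HTML.
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  -- Site-specific steps (for example, submitting an age-check form)
  -- would go here; they are omitted because they depend on the page.
  return {html = splash:html()}
end
"""


def build_execute_request(page_url: str, wait: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) a POST request for Splash's /execute endpoint.

    Hypothetical helper: extra keys in the JSON body ("url", "wait")
    are exposed to the Lua script as args.url and args.wait.
    """
    payload = {"lua_source": LUA_SOURCE, "url": page_url, "wait": wait}
    return urllib.request.Request(
        SPLASH_EXECUTE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a Splash instance running, `urllib.request.urlopen(build_execute_request("https://example.com/"))` should return a JSON body whose `html` field holds the rendered page. Inside a Scrapy project the same script would more typically be passed through scrapy-splash rather than called directly.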

But when I tried moving that code into the framework, it had no effect and I got this warning:

2023-02-17 17:27:27 [py.warnings] WARNING: /home/donney/.local/lib/python3.10/site-packages/scrapy_redis/dupefilter.py:115: ScrapyDeprecationWarning: Call to deprecated function scrapy.utils.request.request_fingerprint().

If you are using this function in a Scrapy component, and you are OK with users of your component changing the fingerprinting algorithm through settings, use crawler.request_fingerprinter.fingerprint() instead in your Scrapy component (you can get the crawler object from the 'from_crawler' class method).

Otherwise, consider using the scrapy.utils.request.fingerprint() function instead.

Either way, the resulting fingerprints will be returned as bytes, not as a string, and they will also be different from those generated by 'request_fingerprint()'. Before you switch, make sure that you understand the consequences of this (e.g. cache invalidation) and are OK with them; otherwise, consider implementing your own function which returns the same fingerprints as the deprecated 'request_fingerprint()' function.

return request_fingerprint(request)
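A deprecation warning like this one normally does not change behavior on its own, so it may be unrelated to the Splash code having no effect. If the noise gets in the way while debugging, one option is to filter just this message with the standard `warnings` module. This is only a sketch: `silence_fingerprint_warning` is a hypothetical helper, and it hides the notice rather than updating the `request_fingerprint()` call in scrapy_redis.

```python
import warnings

# Leading text of the warning, copied from the log above.
# warnings matches this regex against the start of the message.
FINGERPRINT_WARNING_PREFIX = (
    r"Call to deprecated function "
    r"scrapy\.utils\.request\.request_fingerprint\(\)"
)


def silence_fingerprint_warning() -> None:
    """Ignore only the request_fingerprint() deprecation notice.

    Other warnings, including other deprecation notices, still show up.
    """
    warnings.filterwarnings("ignore", message=FINGERPRINT_WARNING_PREFIX)
```

Calling `silence_fingerprint_warning()` early, for example at the top of the project's settings module, should be enough, as long as nothing later resets the warning filters.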