CWHer / PixivCrawler

Pixiv Utils implemented in Python, including Pixiv Crawler and Mosaic Puzzles. Supports rankings, personal bookmarks, artist works, and keyword search with personalized filtering, and provides high-performance multi-threaded parallel downloads. 🤗
GNU General Public License v3.0
214 stars 28 forks

Fix the encoding error raised by the tag collector #10

Closed: OPlincn closed this 3 months ago

OPlincn commented 3 months ago

If `WITH_TAG` is enabled, collecting tags fails with the following errors:
`encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00`
`encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00`
`I/O error : encoder error`

[Screenshot]

After a lot of digging, I found that the problem is this code in `pixiv_crawler/collector/selectors.py`:

`PyQuery(response.text).find("#meta-preload-data").attr("content")`

For some reason, when parsing certain tags, PyQuery produces encoding errors such as `encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00`.

If PyQuery is replaced with BeautifulSoup for parsing the HTML, the problem goes away.
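For reference, here is a minimal sketch of what a BeautifulSoup-based replacement for the PyQuery line could look like. It is not the code from this PR: `extract_preload_data` is a hypothetical helper name, and the JSON in the usage example is toy data, not Pixiv's real preload structure. One assumption worth flagging: `BeautifulSoup(markup)` without a parser argument picks lxml when it is installed, so this sketch pins the pure-Python `"html.parser"` backend explicitly to keep lxml/libxml2 out of the path. The PR does not confirm that this is what matters.

```python
import json

from bs4 import BeautifulSoup


def extract_preload_data(html: str) -> dict:
    """Return the JSON object stored in <meta id="meta-preload-data" content="...">.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # Pin the stdlib-based "html.parser" backend so lxml/libxml2 is not used.
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("meta", id="meta-preload-data")
    if meta is None or not meta.has_attr("content"):
        raise ValueError("meta-preload-data not found in page")
    # The attribute value is a JSON string; HTML entities are already unescaped.
    return json.loads(meta["content"])


if __name__ == "__main__":
    # Toy page with non-ASCII tags (the real Pixiv JSON layout may differ).
    page = (
        "<html><head>"
        "<meta id=\"meta-preload-data\" content='{\"illust\": {\"1\": {\"tags\": [\"風景\", \"🌸\"]}}}'>"
        "</head></html>"
    )
    data = extract_preload_data(page)
    print(data["illust"]["1"]["tags"])  # ['風景', '🌸']
```

Returning the parsed `dict` rather than the raw attribute string keeps the JSON decoding in one place. Callers then only deal with Python objects.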

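I also noticed that PyQuery is only used in this one place, so I removed it from `requirements.txt` and added bs4 instead. Several people in the issues have hit the same problem, so switching to a different parsing library should fix it for good (?). Here is the result after the change: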

[Screenshot of the result]

My test environment: Python 3.10.14, macOS 14.5 (Sonoma), M1 (Apple Silicon).

CWHer commented 3 months ago

Thanks for the PR, I'll take a look in the next couple of days 👍

OPlincn commented 3 months ago

Sounds good, thanks for your time (^▽^)