Closed JackLiu7810 closed 1 year ago
How to reproduce?
MacBook Pro 2021, Python 3.11.1. I run it from PyCharm (I'm a beginner and don't know how to use the terminal QWQ), with a Clash proxy.
(Note: my bookmarks only contain about twenty artworks, a bit over 150 MB in total, so the 200 MB capacity limit should not be reached.)
from config import DOWNLOAD_CONFIG
from crawlers.bookmark_crawler import BookmarkCrawler
from crawlers.keyword_crawler import KeywordCrawler
from crawlers.ranking_crawler import RankingCrawler
from crawlers.users_crawler import UserCrawler
from utils import checkDir
if __name__ == "__main__":
    checkDir(DOWNLOAD_CONFIG["STORE_PATH"])

    # case 1: (need cookie !!!)
    # download artworks from rankings
    # the only parameter is flow capacity, default is 1024 MB
    # app = RankingCrawler(capacity=200)
    # app.run()

    # case 2: (need cookie !!!)
    # download artworks from USER_ID's public bookmarks
    # 1st parameter is max download number, default is 200
    # 2nd parameter is flow capacity, default is 1024 MB
    app = BookmarkCrawler(n_images=30, capacity=200)
    app.run()

    # case 3: (need cookie for R18 images !!!)
    # download artworks from a single artist
    # 2nd parameter is flow capacity, default is 1024 MB
    # app = UserCrawler(artist_id="49552835", capacity=1024)
    # app.run()

    # case 4: (need premium & cookie !!!)
    # download search results for a keyword (sorted by popularity if order=True)
    # supports advanced search, e.g. "(Lucy OR 边缘行者) AND (5000users OR 10000users)"
    # refer to https://www.pixiv.help/hc/en-us/articles/235646387-I-would-like-to-know-how-to-search-for-content-on-pixiv-
    # note: sorting by popularity requires a premium account;
    # configure USER_CONFIG correctly, then edit this main program
    # 1st parameter is keyword
    # 2nd parameter is order (default False: order by date; True: order by popularity)
    # 3rd parameter is mode (one of ["safe", "r18", "all"], default "safe")
    # 4th parameter is max download number (n_images)
    # 5th parameter is flow capacity
    # app = KeywordCrawler(keyword="(Lucy OR 边缘行者) AND (5000users OR 10000users)",
    #                      order=False, mode=["safe", "r18", "all"][-1], n_images=20, capacity=200)
    # app.run()
import datetime
# NOTE: MODE_CONFIG only applies to ranking crawler
MODE_CONFIG = {
    # start date
    "START_DATE": datetime.date(2023, 2, 4),
    # date range: [start, start + range - 1]
    "RANGE": 1,
    # which ranking list
    "RANKING_MODES": [
        "daily", "weekly", "monthly",
        "male", "female",
        "daily_r18", "weekly_r18",
        "male_r18", "female_r18"
    ],
    "MODE": "daily",  # choose from the above
    # illustration, manga, or both
    "CONTENT_MODES": [
        "all",  # download both illustrations & mangas
        "illust", "manga"
    ],
    "CONTENT_MODE": "illust",  # choose from the above
    # download top x in each ranking
    # x is suggested to be a multiple of 50
    "N_ARTWORK": 50
}
OUTPUT_CONFIG = {
    # verbose / simplified output
    "VERBOSE": False,
    "PRINT_ERROR": False
}
NETWORK_CONFIG = {
    # proxy setting
    # you should customize your proxy setting accordingly
    # default is for clash
    "PROXY": {"https": "127.0.0.1:7890"},
    # common request header
    "HEADER": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
    }
}
USER_CONFIG = {
    # user id
    # access your pixiv user profile to find this
    # e.g. https://www.pixiv.net/users/xxxx
    "USER_ID": "65143612",
    "COOKIE": ____
}
DOWNLOAD_CONFIG = {
    # image save path
    # NOTE: DO NOT miss the trailing "/"
    "STORE_PATH": ____,
    # abort request / download
    # after 10 unsuccessful attempts
    "N_TIMES": 10,
    # need tag ?
    "WITH_TAG": True,
    # waiting time (s) after failure
    "FAIL_DELAY": 1,
    # max parallel thread number
    "N_THREAD": 12,
    # waiting time (s) after thread start
    "THREAD_DELAY": 1,
}
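For reference, a hypothetical filled-in USER_CONFIG / DOWNLOAD_CONFIG might look like the sketch below. The user id, cookie string, and store path are made-up placeholders for illustration only; substitute your own values.

```python
# Hypothetical example values -- replace with your own.
USER_CONFIG = {
    "USER_ID": "65143612",           # the number in https://www.pixiv.net/users/<id>
    "COOKIE": "PHPSESSID=xxxxxxxx",  # placeholder; paste your real cookie string
}

DOWNLOAD_CONFIG = {
    "STORE_PATH": "images/",  # note the trailing "/"
    "N_TIMES": 10,            # abort after 10 unsuccessful attempts
    "WITH_TAG": True,         # also collect tags
    "FAIL_DELAY": 1,          # seconds to wait after a failure
    "N_THREAD": 12,           # max parallel download threads
    "THREAD_DELAY": 1,        # seconds to wait after each thread start
}
```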
____/PixivCrawler-master/pixiv_crawler/main.py
[INFO]: ===== requesting bookmark count =====
[INFO]: select 22/22 artworks
[INFO]: ===== request bookmark count complete =====
[INFO]: ===== start collecting 65143612's bookmarks =====
collecting ids: 100%|██████████| 1/1 [00:02<00:00, 2.97s/it]
collecting tags: 0%| | 0/22 [00:00<?, ?it/s][INFO]: ===== collect bookmark complete =====
[INFO]: downloadable artworks: 22
[INFO]: ===== tag collector start =====
collecting tags: 14%|█▎ | 3/22 [00:05<00:28, 1.50s/it]encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
collecting tags: 23%|██▎ | 5/22 [00:10<00:34, 2.00s/it]
(the same three "encoding error" / "I/O error" lines repeat for the remaining artworks)
collecting tags: 100%|██████████| 22/22 [00:40<00:00, 1.83s/it]
collecting urls: 0%| | 0/22 [00:00<?, ?it/s][INFO]: ===== tag collector complete =====
[INFO]: ===== collector start =====
collecting urls: 100%|██████████| 22/22 [00:08<00:00, 2.48it/s]
downloading: 0%| | 0/28 [00:00<?, ?it/s][INFO]: ===== collector complete =====
[INFO]: total images: 28
[INFO]: ===== downloader start =====
downloading / flow 138.10MB: 100%|██████████| 28/28 [06:13<00:00, 13.35s/it]
[INFO]: ===== downloader complete =====
Process finished with exit code 0
Sorry, I may have used newer versions of the dependencies. I'll downgrade them tomorrow and try again.
I can reproduce this. The error seems to happen while collecting tags; on my side it doesn't affect the image downloads :thinking:
The quickest workaround is to skip downloading tags; I'll keep looking into what's going on.
DOWNLOAD_CONFIG = {
    # need tag ?
    "WITH_TAG": False,
}
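Incidentally, the byte sequences in the error messages decode cleanly as UTF-32LE characters, which hints (though it does not confirm) that libxml2 was handed text in an encoding it did not expect. A quick check:

```python
# Decode the byte sequences reported by libxml2 as UTF-32 little-endian.
err_bytes = [bytes([0x21, 0x00, 0x00, 0x00]), bytes([0x44, 0x00, 0x00, 0x00])]
decoded = [b.decode("utf-32-le") for b in err_bytes]
print(decoded)  # ['!', 'D']
```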
I'll downgrade the dependencies tomorrow and try again.
I tried it; the problem should be that pyquery was too old :dizzy_face:.
I've updated requirements.txt, so re-running pip install -r requirements.txt should fix it.