CWHer / PixivCrawler

Pixiv utilities implemented in Python, including a Pixiv crawler and mosaic puzzles. Supports rankings, personal bookmarks, artist works, and keyword search with personalized filtering, and provides high-performance multi-threaded parallel downloads. 🤗
GNU General Public License v3.0

Images download successfully, but an encoding error appears #6

Closed. JackLiu7810 closed this issue 1 year ago

JackLiu7810 commented 1 year ago
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
CWHer commented 1 year ago

How can I reproduce this?

JackLiu7810 commented 1 year ago

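Environment

MacBook Pro 2021, Python 3.11.1. I run the program from PyCharm (I'm a beginner and don't know how to use the terminal, QWQ). I use a Clash proxy.

main function

(Note: I only have twenty-some bookmarked images, a little over 150 MB, so the 200 MB flow limit is not reached.)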

from config import DOWNLOAD_CONFIG
from crawlers.bookmark_crawler import BookmarkCrawler
from crawlers.keyword_crawler import KeywordCrawler
from crawlers.ranking_crawler import RankingCrawler
from crawlers.users_crawler import UserCrawler
from utils import checkDir

if __name__ == "__main__":

    checkDir(DOWNLOAD_CONFIG["STORE_PATH"])

    # case 1: (need cookie !!!)
    #   download artworks from rankings
    #   the only parameter is flow capacity, default is 1024MB
    # download ranking content
    #   the capacity parameter limits the download flow
    # app = RankingCrawler(capacity=200)
    # app.run()

    # case 2: (need cookie !!!)
    #   download artworks from bookmark
    #   1st parameter is max download number, default is 200
    #   2nd parameter is flow capacity, default is 1024MB
    # download the public bookmarks of user_id
    app = BookmarkCrawler(n_images=30, capacity=200)
    app.run()

    # case 3: (need cookie for R18 images !!!)
    #   download artworks from a single artist
    #   2nd parameter is flow capacity, default is 1024MB
    # download works by a specific artist
    # app = UserCrawler(artist_id="49552835", capacity=1024)
    # app.run()

    # case 4: (need premium & cookie !!!)
    #   download search results of a keyword (sorted by popularity if order=True)
    #   support advanced search, e.g. "(Lucy OR 边缘行者) AND (5000users OR 10000users)"
    #       refer to https://www.pixiv.help/hc/en-us/articles/235646387-I-would-like-to-know-how-to-search-for-content-on-pixiv-
    #   1st parameter is keyword
    #   2nd parameter is order (default is False, standing for order by date, True for order by popularity)
    #   3rd parameter is mode (support ["safe", "r18", "all"], default is "safe")
    #   4th parameter is max download number
    #   5th parameter is flow capacity
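    # download artworks matching a keyword
    # NOTE: sorting by popularity requires a premium account
    # configure USER_CONFIG correctly, then modify the main program
    # the keyword parameter is the search keyword
    # the n_images parameter limits the maximum number of downloads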
    # app = KeywordCrawler(keyword="(Lucy OR 边缘行者) AND (5000users OR 10000users)",
    #                      order=False, mode=["safe", "r18", "all"][-1], n_images=20, capacity=200)
    # app.run()

config

import datetime

# NOTE: MODE_CONFIG only applies to ranking crawler
MODE_CONFIG = {
    # start date
    "START_DATE": datetime.date(2023, 2, 4),
    # date range: [start, start + range - 1]
    "RANGE": 1,

    # which ranking list
    "RANKING_MODES": [
        "daily", "weekly", "monthly",
        "male", "female",
        "daily_r18", "weekly_r18",
        "male_r18", "female_r18"
    ],
    "MODE": "daily",  # choose from the above

    # illustration, manga, or both
    "CONTENT_MODES": [
        "all",  # download both illustrations & mangas
        "illust" , "manga"
    ],
    "CONTENT_MODE": "illust",  # choose from the above

    # download top x in each ranking
    #   suggested x be a multiple of 50
    "N_ARTWORK": 50
}

OUTPUT_CONFIG = {
    # verbose / simplified output
    "VERBOSE": False,
    "PRINT_ERROR": False
}

NETWORK_CONFIG = {
    # proxy setting
    #   you should customize your proxy setting accordingly
    #   default is for clash
    "PROXY": {"https": "127.0.0.1:7890"},

    # common request header
    "HEADER": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
    }
}

USER_CONFIG = {
    # user id
    #   access your pixiv user profile to find this
    #   e.g. https://www.pixiv.net/users/xxxx
    "USER_ID": "65143612",

    "COOKIE": ____

}

DOWNLOAD_CONFIG = {
    # image save path
    #   NOTE: DO NOT miss "/"
    "STORE_PATH": ____

    # abort request / download
    #   after 10 unsuccessful attempts
    "N_TIMES": 10,

    # need tag ?
    "WITH_TAG": True,

    # waiting time (s) after failure
    "FAIL_DELAY": 1,

    # max parallel thread number
    "N_THREAD": 12,
    # waiting time (s) after thread start
    "THREAD_DELAY": 1,
}

Output

____/PixivCrawler-master/pixiv_crawler/main.py 
[INFO]: ===== requesting bookmark count =====
[INFO]: select 22/22 artworks
[INFO]: ===== request bookmark count complete =====
[INFO]: ===== start collecting 65143612's bookmarks =====
collecting ids: 100%|██████████| 1/1 [00:02<00:00,  2.97s/it]
collecting tags:   0%|          | 0/22 [00:00<?, ?it/s][INFO]: ===== collect bookmark complete =====
[INFO]: downloadable artworks: 22
[INFO]: ===== tag collector start =====
collecting tags:  14%|█▎        | 3/22 [00:05<00:28,  1.50s/it]encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
collecting tags:  23%|██▎       | 5/22 [00:10<00:34,  2.00s/it]encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
encoding error : input conversion failed due to input error, bytes 0x21 0x00 0x00 0x00
encoding error : input conversion failed due to input error, bytes 0x44 0x00 0x00 0x00
I/O error : encoder error
collecting tags: 100%|██████████| 22/22 [00:40<00:00,  1.83s/it]
collecting urls:   0%|          | 0/22 [00:00<?, ?it/s][INFO]: ===== tag collector complete =====
[INFO]: ===== collector start =====
collecting urls: 100%|██████████| 22/22 [00:08<00:00,  2.48it/s]
downloading:   0%|          | 0/28 [00:00<?, ?it/s][INFO]: ===== collector complete =====
[INFO]: total images: 28
[INFO]: ===== downloader start =====
downloading / flow 138.10MB: 100%|██████████| 28/28 [06:13<00:00, 13.35s/it]
[INFO]: ===== downloader complete =====

Process finished with exit code 0
JackLiu7810 commented 1 year ago

Sorry, I may have installed newer versions of the dependencies. I'll downgrade them tomorrow and try again.

CWHer commented 1 year ago

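I can reproduce this. The error seems to occur while downloading tags; on my side it does not affect the image download :thinking:

The quickest workaround is to skip downloading tags. I'll look into what's going on.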

DOWNLOAD_CONFIG = {
    # need tag ?
    "WITH_TAG": False,
}
CWHer commented 1 year ago

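"I'll downgrade them tomorrow and try again."

I tried it, and the cause is most likely that the pyquery version is too old :dizzy_face:.

I've updated requirements.txt. Re-running pip install -r requirements.txt should fix it.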
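
To confirm which pyquery version is actually installed before and after reinstalling, a small check like the one below may help. It is only a sketch: the helper name `installed_version` is made up for illustration, and listing lxml alongside pyquery is an assumption based on lxml being a pyquery dependency, not something confirmed in this thread.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # pyquery is the package named in this issue; lxml is listed only as an
    # assumption, because pyquery depends on it.
    for name in ("pyquery", "lxml"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the same interpreter that PyCharm uses for the project, otherwise it may report the packages of a different environment.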