Neod0Matrix / PixivCrawlerIII

A Python 3 crawler for crawling the top of the Pixiv rankings and all artworks of any illustrator
MIT License

Could not fetch illustrator #4

Open cleoold opened 5 years ago

cleoold commented 5 years ago

When downloading all artworks with the command pixivcrawleriii -m 2 -i 1234, a log like this was displayed:

[00:00:13] Mode: [Illustrator Repository All]
[00:00:13] Target crawl illustrator pixiv-id: 1234
[00:00:18] Crawler work directory setting: /storage/emulated/0/TF card/Crawler/PixivCrawlerIII/illustrepo_1234
[00:00:18] Create a new work folder
[00:00:22] Ajaxpage response successed
[00:00:27] Mainpage response successed
[00:00:27] Regex parsing result error, no author info, exit

I have seen this for many users and have not found any user for whom it succeeds. Is there any information about this?

Python version: 3.7.3
Platform: Termux on HydrogenOS 9.5.7

Neod0Matrix commented 5 years ago

First of all, the user with ID 1234 has no works, so of course no valid information can be parsed. Secondly, the Pixiv website seems to have introduced some kind of blocking mechanism against crawlers. I also found out today that I can't crawl any pages other than the non-R18 daily ranking. Please be patient while I analyze Pixiv's blocking strategy and design an additional crawling mechanism. Thanks for your feedback.
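
For illustration only, here is a minimal sketch of how a "no author info" result can arise when a regex finds nothing to match in a fetched page. It is not the project's actual code: the pattern, the function names (parse_author_name, describe_result), and the sample page strings are all hypothetical, and Pixiv's real markup is not shown in this thread.

```python
import re
from typing import Optional

# Hypothetical pattern. The real regex used by the crawler and the real
# structure of Pixiv's pages are assumptions here, not taken from the project.
AUTHOR_PATTERN = re.compile(r'"userName"\s*:\s*"([^"]+)"')


def parse_author_name(page_text: str) -> Optional[str]:
    """Return the author name found in page_text, or None if it is absent."""
    match = AUTHOR_PATTERN.search(page_text)
    return match.group(1) if match else None


def describe_result(page_text: str) -> str:
    """Turn the parse result into a short, log-style message."""
    name = parse_author_name(page_text)
    if name is None:
        # The page was fetched, but it did not contain the expected field.
        return "no author info"
    return f"author: {name}"


# Made-up sample responses, for demonstration only.
page_with_author = '{"userName": "example_artist", "illusts": [1, 2]}'
page_without_author = '{"illusts": []}'

print(describe_result(page_with_author))
print(describe_result(page_without_author))
```

Under these assumptions, a successful HTTP response is not enough on its own: a page that loads but lacks the expected field still ends in the "no author info" branch, whether the cause is an empty account or a change on the server side.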

Neod0Matrix commented 5 years ago

I am sorry to only be telling you this now. The crawl failures for the R18 pages and the artists' personal pages at the end of June appear to have been caused only by internal maintenance of the Pixiv website. It was not a new anti-crawling mechanism, and the project can still run normally.