CWHer / PixivCrawler

Pixiv utilities implemented in Python, including a Pixiv Crawler and Mosaic Puzzles. Supports rankings, personal bookmarks, artist works, and keyword search with personalized filtering, and provides high-performance multi-threaded parallel downloads. 🤗
GNU General Public License v3.0

How can I crawl only one category of the rankings? #1

Closed: scp23328 closed this issue 2 years ago

scp23328 commented 2 years ago

For example, how can I crawl only the illustration ranking and skip the manga ranking?

CWHer commented 2 years ago

It can probably be done by changing the ranking URL; I'll take a look after I get off work today 🤔
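
For reference, Pixiv's ranking page distinguishes work types through a content query parameter (illust / manga / ugoira), so limiting the crawl to the illustration ranking should mostly come down to building the ranking URL with content=illust. The snippet below is only a sketch of such a URL builder; where PixivCrawler actually assembles this URL (presumably somewhere in crawlers/ranking_crawler.py) is an assumption, not something confirmed in this thread.

# Sketch only, not PixivCrawler's actual code: build a ranking URL that is
# restricted to a single work type via Pixiv's "content" query parameter.
def ranking_url(mode: str = "daily", content: str = "illust", page: int = 1) -> str:
    # content may be "illust", "manga", or "ugoira"; leave it out to get all types
    return (
        "https://www.pixiv.net/ranking.php"
        f"?mode={mode}&content={content}&p={page}&format=json"
    )

print(ranking_url())
# -> https://www.pixiv.net/ranking.php?mode=daily&content=illust&p=1&format=json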

scp23328 commented 2 years ago

Also, when I run the program it always exits after downloading only about half of the images, and I'm not sure what's going wrong.

scp23328 commented 2 years ago

[screenshot attached]

CWHer commented 2 years ago

Also, when I run the program it always exits after downloading only about half of the images, and I'm not sure what's going wrong.

Could you share the main.py you are running?

scp23328 commented 2 years ago
from config import DOWNLOAD_CONFIG
from crawlers.bookmark_crawler import BookmarkCrawler
from crawlers.keyword_crawler import KeywordCrawler
from crawlers.ranking_crawler import RankingCrawler
from crawlers.users_crawler import UserCrawler
from utils import checkDir

if __name__ == "__main__":

    checkDir(DOWNLOAD_CONFIG["STORE_PATH"])

    # case 1: (need cookie !!!)
    #   download artworks from rankings
    #   the only parameter is flow capacity, default is 1024MB
    app = RankingCrawler(capacity=1024)
    app.run()

    # case 2: (need cookie !!!)
    #   download artworks from bookmark
    #   1st parameter is max download number, default is 200
    #   2nd parameter is flow capacity, default is 1024MB
    # app = BookmarkCrawler(n_images=20, capacity=200)
    # app.run()

    # case 3:
    #   download artworks from a single artist
    #   2nd parameter is flow capacity, default is 1024MB
    # app = UserCrawler(artist_id="32548944", capacity=200)
    # app.run()

    # case 4: (need premium & cookie !!!)
    #   download search results of a keyword (sorted by popularity)
    #   1st parameter is keyword
    #   2nd parameter is max download number
    #   3rd parameter is flow capacity
    # app = KeywordCrawler(keyword="百合", n_images=200, capacity=1024 * 256)
    # app = RankingCrawler(capacity=1024 * 8)
    # app.run()

CWHer commented 2 years ago

The crawler stops automatically once the traffic limit is exceeded. You can raise the limit by increasing the capacity parameter, for example:

app = RankingCrawler(capacity=1024 * 10)
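
For context, capacity is a traffic budget in megabytes: the crawler keeps a running total of how much it has downloaded and stops once that total would exceed the budget, which is why a small capacity looks like the program quitting halfway through. The loop below only illustrates that assumed behaviour with a hypothetical download_with_budget helper and made-up image sizes; it is not PixivCrawler's actual downloader.

# Illustration of the assumed capacity behaviour (hypothetical helper and data,
# not the project's real downloader): stop early once the running total of
# downloaded megabytes would exceed the capacity budget.
def download_with_budget(sizes_mb, capacity_mb=1024):
    downloaded = 0.0
    for i, size in enumerate(sizes_mb):
        if downloaded + size > capacity_mb:
            print(f"capacity {capacity_mb} MB reached after {i} images, stopping")
            return downloaded
        downloaded += size
    return downloaded

download_with_budget([3.5] * 1000, capacity_mb=1024)       # stops early, after ~292 images
download_with_budget([3.5] * 1000, capacity_mb=1024 * 10)  # budget is large enough to finish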

scp23328 commented 2 years ago

OK, thanks a lot.

CWHer commented 2 years ago