chenjiandongx / mmjpg

👩 Crawler for beauty photo-shoot sets (part 1)

MIT License

498 stars, 247 forks

Stuck at "Please start your performance!" #1

Closed. ShowerXu closed this issue 7 years ago.

ShowerXu commented 7 years ago

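win7 + Python 2.7.13. I installed the required modules, but it just keeps showing "Please start your performance!" and never downloads anything. I created e:/mmjpg myself; the script deletes it, but then it never creates any folders.

It's probably a multithreading problem. Mine is an N2820 download box, whose dual-core CPU doesn't support multithreading. Even after I close the script, the CPU stays at full load and doesn't come down until I reboot.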

ShowerXu commented 7 years ago

Actually, my i3 shows much the same behavior, except that the CPU doesn't reach 100%.

chenjiandongx commented 7 years ago

I'm on win10 + Python 3.5.2.

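This crawler uses multiple processes, not multiple threads, and the number of processes depends on your CPU count.
I did try multithreading, but it didn't work as well as multiprocessing; because of the GIL, multithreading isn't a good fit for this kind of crawler download.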

Official introduction to the multiprocessing module: https://docs.python.org/2/library/multiprocessing.html#introduction. It says "New in version 2.6.", so your Python 2.7 should be fine.

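The CPU shouldn't go up to 100%, and you don't actually need to create e:/mmjpg yourself.
Or try changing processes in pool = Pool(processes=cpu_count()) to 1 and see whether it runs with a single process.
If that still fails, create the processes with multiprocessing.Process() instead; that means changing some of the code.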
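The pool setup described in this reply can be sketched roughly as follows. It is a minimal sketch, not the script's actual code: `download` is a hypothetical stand-in for the real `urls_crawler`, and `crawl_all` is a hypothetical helper. Because `multiprocessing.dummy` exposes the same `Pool` API backed by threads, comparing processes against threads is a one-argument switch, and trying a single process is just a matter of changing `processes`.

```python
from multiprocessing import Pool, cpu_count
from multiprocessing.dummy import Pool as ThreadPool  # same Pool API, backed by threads


def download(url):
    """Hypothetical stand-in for the real per-gallery crawler (urls_crawler)."""
    return len(url)


def crawl_all(urls, use_threads=False):
    """Map download() over urls with one worker per CPU, using processes or threads."""
    pool_cls = ThreadPool if use_threads else Pool
    pool = pool_cls(processes=cpu_count())  # change to processes=1 to test a single worker
    try:
        return pool.map(download, urls)  # results come back in the same order as urls
    finally:
        pool.close()
        pool.join()
```

With `use_threads=False`, call `crawl_all(urls)` from under `if __name__ == "__main__":`; on Windows, child processes re-import the main module, so unguarded pool creation at module level fails.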

ShowerXu commented 7 years ago

processes=1 behaves the same. What's interesting is that the folders aren't even created.

chenjiandongx commented 7 years ago

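I can't really figure this out either. Maybe try Python 3, since that's where I tested it.
Otherwise, modify the multiprocessing part of the code to use a different process module.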

ShowerXu commented 7 years ago

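The latest 3.6 doesn't work either: no folders are created and it keeps printing "Please start your performance!". The script doesn't print any detailed information, so I can't tell which step fails.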

chenjiandongx commented 7 years ago

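If it prints "Please start your performance!" but doesn't carry out the subsequent steps, then the problem is in the urls_crawler(url) method. Try adding print statements between the lines of code in that method to find exactly which line stops printing and executing; from your description alone I can't pin down the problem.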
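The print-statement approach suggested here can be made a little more reliable. The sketch below is an assumption, not something from the thread: `checkpoint` and `dump_stacks_if_stuck` are hypothetical helper names. `checkpoint` tags each marker with the process id and flushes it immediately, so markers from worker processes aren't held back in a buffer; `dump_stacks_if_stuck` uses the standard `faulthandler` module (Python 3.3+) to print the stack of a process that is still running after a given time, which points straight at the line it is stuck on.

```python
import faulthandler
import os
import sys


def checkpoint(label):
    """Print a progress marker tagged with the current process id, and return it.

    flush=True writes the line out right away, even when stdout is buffered
    (for example when it is redirected to a file).
    """
    message = "[pid {}] {}".format(os.getpid(), label)
    print(message, flush=True)
    return message


def dump_stacks_if_stuck(seconds):
    """If this process is still running after `seconds`, dump every thread's stack to stderr.

    repeat=True keeps dumping every `seconds` until
    faulthandler.cancel_dump_traceback_later() is called.
    """
    faulthandler.dump_traceback_later(seconds, repeat=True, file=sys.stderr)
```

For example, call `dump_stacks_if_stuck(60)` once at the top of `urls_crawler`, then put `checkpoint("after request")`-style markers between its statements; the last marker printed, together with the stack dump, shows which step never finishes.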

ShowerXu commented 7 years ago

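I added print markers and found that execution never gets past the line below, so it still seems to be a problem with creating the processes:

results = pool.map(urls_crawler, urls)

Without the process pool, calling urls_crawler(urls[1]) directly downloads successfully.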
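Since `pool.map` here blocks silently while the direct call works, one way to turn the hang into a visible error is to wait on the pool with a timeout. This is a minimal sketch under assumptions: `map_with_timeout` and `crawl_sequentially` are hypothetical names, and the 600-second default is arbitrary. `map_async(...).get(timeout)` raises `multiprocessing.TimeoutError` instead of waiting forever, and `ThreadPool` takes the same arguments as `Pool`, which is handy for checking the logic without starting processes.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool  # same interface as Pool, but uses threads


def map_with_timeout(worker, items, processes=1, timeout=600, pool_cls=Pool):
    """Like pool.map, but raise multiprocessing.TimeoutError instead of blocking forever."""
    pool = pool_cls(processes=processes)
    try:
        return pool.map_async(worker, items).get(timeout)
    finally:
        pool.terminate()  # shut the pool down; for a process Pool this also kills stuck workers
        pool.join()


def crawl_sequentially(worker, items):
    """Fallback with no pool at all, matching the direct call that downloaded successfully."""
    return [worker(item) for item in items]
```

If `map_with_timeout(urls_crawler, urls)` raises `TimeoutError` while `crawl_sequentially(urls_crawler, urls[:1])` works, the problem lies in how the pool's worker processes start or communicate, not in the crawling code itself.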

chenjiandongx commented 7 years ago

Try a different way of writing the process code; take another approach:

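import multiprocessing
from multiprocessing import cpu_count

# urls, dir_path, delete_empty_dir() and urls_crawler() come from the rest of the script;
# on Windows, run this under if __name__ == "__main__": (child processes re-import the module)
process = []
n = cpu_count()
try:
    delete_empty_dir(dir_path)
    # results = pool.map(urls_crawler, urls)
    for i in range(n):
        # give each process its own slice of urls; passing the whole list makes every process crawl all of them
        p = multiprocessing.Process(target=urls_crawler, args=(urls[i::n],))  # create the process
        p.start()           # start the process
        process.append(p)   # add it to the list
    for p in process:
        p.join()  # wait for the process to finish
except KeyboardInterrupt:
    for p in process:
        p.terminate()  # stop the workers on Ctrl+C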

Then change the urls_crawler(urls) method to:

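def urls_crawler(urls):
    """ Crawler entry point: does the main crawling work """
    for url in urls:
        try:
            ...  # the existing per-url code, now inside the loop
        except Exception as e:
            print(url, e)  # report this url and carry on with the next one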
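Each `multiprocessing.Process` needs its own share of `urls`; if every process receives the full list, each one crawls every gallery. A minimal sketch of dealing the list out round-robin follows; `split_round_robin` and `run_in_processes` are hypothetical helper names, not part of the original script.

```python
import multiprocessing


def split_round_robin(items, n):
    """Deal items into n lists: item 0 to list 0, item 1 to list 1, ..., wrapping around."""
    return [items[i::n] for i in range(n)]


def run_in_processes(target, items, n=None):
    """Start one Process per non-empty chunk, each handling its own share, then wait for all."""
    n = n or multiprocessing.cpu_count()
    processes = []
    for chunk in split_round_robin(items, n):
        if not chunk:
            continue  # more processes than items: nothing left for this one
        p = multiprocessing.Process(target=target, args=(chunk,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # block until every worker has finished its chunk
```

For example, call `run_in_processes(urls_crawler, urls)` inside the `if __name__ == "__main__":` block, with `urls_crawler` accepting a list as described above. Round-robin slicing keeps the chunks within one item of each other in size without computing chunk boundaries.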

ShowerXu commented 7 years ago

Thanks, this approach works.

chenjiandongx commented 7 years ago

Since the problem is solved, I'll close this issue.