Nandaka / PixivUtil2

Download images from Pixiv and more!
http://nandaka.devnull.zone/
BSD 2-Clause "Simplified" License

HTTP Error 403: request disallowed by robots.txt #10

Closed Owyn closed 12 years ago

Owyn commented 12 years ago

Can't download anything T_T

PixivDownloader2 version 20120806
https://nandaka.wordpress.com/tag/pixiv-downloader/
Reading V:\Program Files\PixivD\config.ini ... done.
Creating database... done.
Only process member where day last updated >= 7
Using Username: test56
logging in with saved cookie
Trying to log with saved cookie
Cookie already expired/invalid.
Log in using form.
done.
new cookie value: 22d476541f275bad092a260a60f9f6f8
Writing config file... done.
PixivDownloader2 version 20120806
https://nandaka.wordpress.com/tag/pixiv-downloader/

  1. Download by member_id
  2. Download by image_id
  3. Download by tags
  4. Download from list
  5. Download from online user bookmark
  6. Download from online image bookmark
  7. Download from tags list
  8. Download new illust from bookmark
  9. Download by Title/Caption

  10. Download by Tag and Member Id

d. Manage database
e. Export online bookmark
x. Exit
Input: 1
Member id: 1471757
Start Page (default=1):
End Page (default=0, 0 for no limit):
Processing Member Id: 1471757
Reading V:\Program Files\PixivD\config.ini ... done.
Page 1
Member Name : ??????????????
Member Avatar: http://i2.pixiv.net/img44/profile/believer_a/4859407.png
Member Token : believer_a

1

Processing Image Id: 29126463
Title: ??????????
Tags : ?????? ??????????? ?????????????? ???? ????????? ???? ???????
Mode : big
Image URL : http://i2.pixiv.net/img44/img/believer_a/29126463.png
Filename : C:\DL Image Packs\1471757 (believer_a)\29126463.png
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x1649c38 whose wrapped object = <closeable_response at 0x16e2940 whose fp = <cStringIO.StringI object at 0x016FE908>>>, <traceback object at 0x016FC378>)
Dumping html to: Error Medium Page for image 29126463.html
Cannot dump page for image_id: 29126463
Stuff happened, trying again after 2 second ( 1 )
local variable 'parseBigImage' referenced before assignment

Processing Image Id: 29126463
Title: ??????????
Tags : ?????? ??????????? ?????????????? ???? ????????? ???? ???????
Mode : big
Image URL : http://i2.pixiv.net/img44/img/believer_a/29126463.png
Filename : C:\DL Image Packs\1471757 (believer_a)\29126463.png
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x1720d88 whose wrapped object = <closeable_response at 0x17d2a30 whose fp = <cStringIO.StringI object at 0x017E5F98>>>, <traceback object at 0x017D7AD0>)
Dumping html to: Error Medium Page for image 29126463.html
Cannot dump page for image_id: 29126463
Stuff happened, trying again after 2 second ( 2 )
local variable 'parseBigImage' referenced before assignment

Processing Image Id: 29126463
Title: ??????????
Tags : ?????? ??????????? ?????????????? ???? ????????? ???? ???????
Mode : big
Image URL : http://i2.pixiv.net/img44/img/believer_a/29126463.png
Filename : C:\DL Image Packs\1471757 (believer_a)\29126463.png
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x1817df8 whose wrapped object = <closeable_response at 0x184d878 whose fp = <cStringIO.StringI object at 0x018C5CB0>>>, <traceback object at 0x01851AD0>)
Dumping html to: Error Medium Page for image 29126463.html
Cannot dump page for image_id: 29126463
Stuff happened, trying again after 2 second ( 3 )
local variable 'parseBigImage' referenced before assignment

Processing Image Id: 29126463
Title: ??????????
Tags : ?????? ??????????? ?????????????? ???? ????????? ???? ???????
Mode : big
Image URL : http://i2.pixiv.net/img44/img/believer_a/29126463.png
Filename : C:\DL Image Packs\1471757 (believer_a)\29126463.png
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x18dbe68 whose wrapped object = <closeable_response at 0x1971738 whose fp = <cStringIO.StringI object at 0x0197BDB8>>>, <traceback object at 0x01974878>)
Dumping html to: Error Medium Page for image 29126463.html
Cannot dump page for image_id: 29126463
Stuff happened, trying again after 2 second ( 4 )
local variable 'parseBigImage' referenced before assignment

Processing Image Id: 29126463
Title: ??????????
Tags : ?????? ??????????? ?????????????? ???? ????????? ???? ???????
Mode : big
Image URL : http://i2.pixiv.net/img44/img/believer_a/29126463.png
Filename : C:\DL Image Packs\1471757 (believer_a)\29126463.png
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
1 2 3 4
HTTP Error 403: request disallowed by robots.txt
403
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x19a1ed8 whose wrapped object = <closeable_response at 0x1a52ad0 whose fp = <cStringIO.StringI object at 0x01A6AE90>>>, <traceback object at 0x018E2DA0>)
Dumping html to: Error Medium Page for image 29126463.html
Cannot dump page for image_id: 29126463
Giving up image_id: 29126463
PixivDownloader2 version 20120806
https://nandaka.wordpress.com/tag/pixiv-downloader/

  1. Download by member_id
  2. Download by image_id
  3. Download by tags
  4. Download from list
  5. Download from online user bookmark
  6. Download from online image bookmark
  7. Download from tags list
  8. Download new illust from bookmark
  9. Download by Title/Caption

  10. Download by Tag and Member Id

d. Manage database
e. Export online bookmark
x. Exit
Input:

and logfile:

2012-08-09 17:17:47,533 - PixivUtil20120806 - INFO - ###############################################################
2012-08-09 17:17:47,549 - PixivUtil20120806 - INFO - Starting...
2012-08-09 17:17:47,690 - PixivUtil20120806 - INFO - Only process member where day last updated >= 7
2012-08-09 17:17:47,690 - PixivUtil20120806 - INFO - Using Username: test56
2012-08-09 17:17:47,690 - PixivUtil20120806 - INFO - logging in with saved cookie
2012-08-09 17:17:47,690 - PixivUtil20120806 - INFO - Trying to log with saved cookie
2012-08-09 17:17:51,611 - PixivUtil20120806 - INFO - Cookie already expired/invalid.
2012-08-09 17:17:51,611 - PixivUtil20120806 - INFO - Log in using form.
2012-08-09 17:17:57,503 - PixivUtil20120806 - INFO - Logged in
2012-08-09 17:18:38,331 - PixivUtil20120806 - INFO - Member id mode.
2012-08-09 17:18:43,924 - PixivUtil20120806 - INFO - Processing Member Id: 1471757
2012-08-09 17:18:51,299 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:18:55,908 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:01,753 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:06,378 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:06,378 - PixivUtil20120806 - ERROR - Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x1649c38 whose wrapped object = <closeable_response at 0x16e2940 whose fp = <cStringIO.StringI object at 0x016FE908>>>, <traceback object at 0x016FC378>)
2012-08-09 17:19:06,378 - PixivUtil20120806 - ERROR - Error at processImage(): 29126463
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
2012-08-09 17:19:06,378 - PixivUtil20120806 - ERROR - Dumping html to: Error Medium Page for image 29126463.html
2012-08-09 17:19:06,378 - PixivUtil20120806 - ERROR - Cannot dump page for image_id: 29126463
2012-08-09 17:19:12,815 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:17,440 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:22,065 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:26,690 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:26,690 - PixivUtil20120806 - ERROR - Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x1720d88 whose wrapped object = <closeable_response at 0x17d2a30 whose fp = <cStringIO.StringI object at 0x017E5F98>>>, <traceback object at 0x017D7AD0>)
2012-08-09 17:19:26,690 - PixivUtil20120806 - ERROR - Error at processImage(): 29126463
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
2012-08-09 17:19:26,690 - PixivUtil20120806 - ERROR - Dumping html to: Error Medium Page for image 29126463.html
2012-08-09 17:19:26,690 - PixivUtil20120806 - ERROR - Cannot dump page for image_id: 29126463
2012-08-09 17:19:33,283 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:37,908 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:47,096 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:51,721 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:19:51,721 - PixivUtil20120806 - ERROR - Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x1817df8 whose wrapped object = <closeable_response at 0x184d878 whose fp = <cStringIO.StringI object at 0x018C5CB0>>>, <traceback object at 0x01851AD0>)
2012-08-09 17:19:51,721 - PixivUtil20120806 - ERROR - Error at processImage(): 29126463
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
2012-08-09 17:19:51,721 - PixivUtil20120806 - ERROR - Dumping html to: Error Medium Page for image 29126463.html
2012-08-09 17:19:51,721 - PixivUtil20120806 - ERROR - Cannot dump page for image_id: 29126463
2012-08-09 17:19:58,206 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:02,815 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:07,424 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:12,033 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:12,033 - PixivUtil20120806 - ERROR - Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x18dbe68 whose wrapped object = <closeable_response at 0x1971738 whose fp = <cStringIO.StringI object at 0x0197BDB8>>>, <traceback object at 0x01974878>)
2012-08-09 17:20:12,033 - PixivUtil20120806 - ERROR - Error at processImage(): 29126463
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
2012-08-09 17:20:12,033 - PixivUtil20120806 - ERROR - Dumping html to: Error Medium Page for image 29126463.html
2012-08-09 17:20:12,033 - PixivUtil20120806 - ERROR - Cannot dump page for image_id: 29126463
2012-08-09 17:20:18,846 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:23,440 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:28,049 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:32,674 - PixivUtil20120806 - ERROR - HTTPError: HTTP Error 403: request disallowed by robots.txt(http://i2.pixiv.net/img44/img/believer_a/29126463.png)
2012-08-09 17:20:32,674 - PixivUtil20120806 - ERROR - Error at processImage(): (<class 'mechanize._response.httperror_seek_wrapper'>, <httperror_seek_wrapper (mechanize._http.RobotExclusionError instance) at 0x19a1ed8 whose wrapped object = <closeable_response at 0x1a52ad0 whose fp = <cStringIO.StringI object at 0x01A6AE90>>>, <traceback object at 0x018E2DA0>)
2012-08-09 17:20:32,674 - PixivUtil20120806 - ERROR - Error at processImage(): 29126463
Traceback (most recent call last):
  File "PixivUtil2.py", line 672, in processImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 223, in downloadImage
  File "PixivUtil2.py", line 122, in downloadImage
  File "mechanize_mechanize.pyc", line 203, in open
  File "mechanize_mechanize.pyc", line 255, in _mech_open
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
2012-08-09 17:20:32,674 - PixivUtil20120806 - ERROR - Dumping html to: Error Medium Page for image 29126463.html
2012-08-09 17:20:32,674 - PixivUtil20120806 - ERROR - Cannot dump page for image_id: 29126463
2012-08-09 17:20:32,674 - PixivUtil20120806 - ERROR - Giving up image_id: 29126463

and dumped page: http://rghost.ru/39673731

Nandaka commented 12 years ago

Set userobots = False in config.ini (case sensitive).
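For anyone wondering why a config switch fixes a 403: the traceback shows a mechanize RobotExclusionError, which means the request was refused on the client side after checking robots.txt. The server never rejected the image. Below is a rough sketch of that kind of check using only the Python standard library. It is not PixivUtil2's actual code. The robots.txt body, the `may_fetch` helper, and the `ExampleDownloader` agent name are all made up for illustration, and the real robots.txt of the image host is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption, not the real file).
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(url: str, use_robots: bool, parser: RobotFileParser,
              agent: str = "ExampleDownloader") -> bool:
    """Hypothetical helper: decide whether the client should request the URL.

    With use_robots=False the robots.txt rules are skipped entirely,
    which is presumably what the userobots setting toggles.
    """
    if not use_robots:
        return True
    return parser.can_fetch(agent, url)


IMAGE_URL = "http://i2.pixiv.net/img44/img/believer_a/29126463.png"
parser = build_parser(HYPOTHETICAL_ROBOTS_TXT)

print(may_fetch(IMAGE_URL, use_robots=True, parser=parser))   # blocked by the rules
print(may_fetch(IMAGE_URL, use_robots=False, parser=parser))  # rules ignored
```

In mechanize itself the equivalent switch is `Browser.set_handle_robots(False)`. Whether PixivUtil2 wires the userobots setting to exactly that call is an assumption here.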