Nandaka / PixivUtil2

Download images from Pixiv and more!
http://nandaka.devnull.zone/
BSD 2-Clause "Simplified" License

Requirements:

Capabilities:

Docker

$ docker build -t pixivutil2 .
$ docker run -it --rm \
  -v $(pwd):/workdir \
  -w /workdir \
  pixivutil2 \
  /bin/bash -c "python PixivUtil2.py"

WARNING

Overusage can lead to Pixiv blocking your IP for a few hours.

FAQs

A. Usage

Q1. How do I paste Japanese tags into the console window?
    - Click the top-left icon -> select Edit -> Paste (Ctrl-V does not work).
      If the text shows up as question marks, change the language for
      non-Unicode programs to Japanese (google it).
    - Or use an online URL encoder (http://meyerweb.com/eric/tools/dencoder/)
      and paste the encoded tag back into the console.
    - Or paste it into tags.txt and select download by tags list. Separate
      tags with spaces, and put each new query on a new line.
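
The URL-encoding option above can also be done locally instead of with an online tool. This is a minimal sketch using Python's standard library; `encode_tag` is a hypothetical helper name and the tag is only an example.

```python
from urllib.parse import quote, unquote


def encode_tag(tag: str) -> str:
    """Percent-encode a tag as UTF-8 so it contains only ASCII characters."""
    # safe="" also encodes "/" so the whole tag is escaped.
    return quote(tag, safe="")


encoded = encode_tag("東方")
print(encoded)           # %E6%9D%B1%E6%96%B9
print(unquote(encoded))  # decodes back to the original tag
```

Paste the encoded value into the console in place of the Japanese text.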

Q2. My password doesn't show up in the console!
    - This is normal. The program still reads it.
    - Or you can put it in config.ini if you are not sure.

Q3. I cannot log in to Pixiv!
    - Check your password.
    - Try to log in on the Pixiv website.
    - Try setting your credentials in the [Authentication] section of
      config.ini.
    - Check your date and time settings (e.g.: https://www.timeanddate.com/).
    - Disable Daylight Saving Time and try again.
    - Copy your session value from the browser:
      1. Open Firefox.
      2. Go to the Pixiv website and log in; remember to enable the
         [Remember Me] check box.
      3. Press F12 to open Developer Tools, and select the Storage tab.
      4. Click Cookies and select pixiv.net.
      5. Look for the cookie named PHPSESSID.
      6. Copy its value. https://imgur.com/a/BppHOoQ
      7. Open config.ini, go to the [Authentication] section, and paste the
         value into cookie. https://imgur.com/VB2g3qn
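
The last step can also be scripted. The sketch below uses Python's `configparser` on an in-memory sample. The `[Authentication]` section and `cookie` option come from the steps above; the `username` line, the session value, and the helper name `set_session_cookie` are made up for illustration. Note that `configparser` drops comments when it writes a file back, so editing config.ini by hand may be the safer choice.

```python
import configparser
import io


def set_session_cookie(config_text: str, phpsessid: str) -> str:
    """Return config_text with [Authentication] cookie set to phpsessid."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names' original case
    parser.read_string(config_text)
    if not parser.has_section("Authentication"):
        parser.add_section("Authentication")
    parser.set("Authentication", "cookie", phpsessid)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative sample only; a real config.ini has many more options.
sample = "[Authentication]\nusername = me@example.com\ncookie = \n"
print(set_session_cookie(sample, "12345_abcdef"))
```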

Q4. PixivUtil works from a local terminal on my Linux box but not when I
    connect over SSH with PuTTY!
    - export LANG=en_US.UTF-8. PuTTY does not set the locale correctly, and
      when it is not set, Python does not know which encoding to write
      (Thanks to nho!)
    - ... and export PYTHONIOENCODING=utf-8, so it can create the DB and
      populate it properly (Thanks to Mailia!)
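
To see which of the two exports is missing in the current SSH session, a small check like the following can help. It is only a sketch: `missing_utf8_settings` is a hypothetical helper, and it looks at nothing beyond the two variables named above.

```python
import os


def _is_utf8(value: str) -> bool:
    """True if the value names a UTF-8 encoding (utf-8 or utf8, any case)."""
    lowered = value.lower()
    return "utf-8" in lowered or "utf8" in lowered


def missing_utf8_settings(env) -> list:
    """Return the export commands still needed for the given environment."""
    needed = []
    if not _is_utf8(env.get("LANG", "")):
        needed.append("export LANG=en_US.UTF-8")
    if not _is_utf8(env.get("PYTHONIOENCODING", "")):
        needed.append("export PYTHONIOENCODING=utf-8")
    return needed


for command in missing_utf8_settings(os.environ):
    print(command)
```

If nothing is printed, both variables already name a UTF-8 encoding.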

Q5. How do I delete a member id from the database?
    - Open the application, choose Manage Database (d), then select delete
      Member by Member Id.
    - Or open the database (db.sqlite) directly using an SQLite browser and
      use an SQL command to delete it.
    - If you are downloading using Download from List.txt (3), you can create
      ignore_list.txt to skip the member id.

Q6. The app doesn't download all the images! (I want to download SFW images too.)
    - Pixiv only allows searching up to 1000 pages if you don't have Pixiv
      Premium.
    - Check your Pixiv website settings (refer to https://goo.gl/gQi09v),
      then delete the cookie value in config.ini and retry.
    - Check the value of r18mode in config.ini. Setting it to True will
      download only R-18 images.

Q7. The app shows squares/question marks in the console output!
    - This is because Windows is not set to Japanese in the Regional Settings
      in Control Panel.
    - Since version 20161114, you need to set the console font to one with
      Unicode support (e.g. Arial Unicode, MS Gothic).

Q8. Where do I get FFmpeg? How do I enable `createwebm`?
    - Download the stable version of FFmpeg from https://www.ffmpeg.org/download.html.
    - For Windows:
      - Extract the archive to a folder.
      - Open the extracted folder and go to the `/bin` folder.
      - Copy `ffmpeg.exe` to your PixivUtil2 folder.
    - For Linux:
      - Install the package using your favorite package manager.
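
To confirm that FFmpeg ended up where the steps above put it, a quick check along these lines may help. It is a sketch under assumptions: `find_ffmpeg` is a hypothetical helper, and it only looks in the two places described above (the application folder and the system PATH); it does not reproduce how PixivUtil2 itself locates the binary.

```python
import shutil
from pathlib import Path


def find_ffmpeg(app_dir="."):
    """Return a path to FFmpeg, or None if it is in neither location."""
    # Windows steps above: ffmpeg.exe copied next to the application.
    local = Path(app_dir) / "ffmpeg.exe"
    if local.is_file():
        return str(local)
    # Linux step above: a package install normally puts ffmpeg on PATH.
    return shutil.which("ffmpeg")


print(find_ffmpeg() or "FFmpeg not found")
```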

Q9. The downloaded images are corrupted. How do I download them again?
    - You can delete the download history by manually deleting the image id
      from the database (enter d, followed by 10).
    - Or, you can set alwaysCheckFileSize = True and verifyimage = True in config.ini
      and retry the download.

Q10. I got this error: またはメールアドレス、パスワードが正しいかチェックしてください。
     ("...or check that your email address and password are correct.")
    - Use your email address as the username, or check your password in config.ini.

Q11. Are older Windows versions (e.g. Win7) supported?
    - You can try running from source with the latest supported Python 3.x.
      See the instructions here: https://github.com/Nandaka/PixivUtil2/wiki/IDE-Enviroment-(Windows)

B. Bugs/Source Code/Support

Q1. Where can I report bugs?
    - Please report any bugs to https://github.com/Nandaka/PixivUtil2/issues.

Q2. Where can I support/donate to you?
    - You can send it to my PayPal account (nchek2000[at]gmail[dot]com).
    - Or visit https://bit.ly/PixivUtilDonation.

Q3. I want to use/modify the source code!
    - Feel free to use/modify the source code as long as you give credit to me
      and make the modified source code open.
    - If you want to add a feature or bug fix, you can fork the repository at
      https://github.com/Nandaka/PixivUtil2 and open a pull request.

Q4. I got ValueError: invalid literal for int() with base 10: '<something>'
    - Please modify _html.py from the mechanize library: search for
      'def unescape_charref(data, encoding):' and replace it with the patch at
      https://pastebin.com/5bT5HFkb.

Q5. I got '<library_name> module not found error'
    - Download the library from the source (see the links in the Requirements
      section) and copy the files into your Lib\site-packages directory.
    - Or use pip install (google how to use it).

C. Log Messages

Q1: HTTPError: HTTP Error 404: Not Found
    - This is because the file doesn't exist on the Pixiv server, usually
      because there is no big version of the image in manga mode (currently
      the app tries to download the big version first, then falls back to the
      normal size if that fails; this applies only to manga mode and is
      normal).

Q2: Error at process_image(): (<type 'exceptions.WindowsError'>, WindowsError
    (32, 'Prosessi ei voi kayttaa tiedostoa, koska se on toisen prosessin
    kaytossa')
    - The file is being used by another process (google translate). Either you
      ran multiple instances of the Pixiv downloader from the same folder, or
      other processes are locking the file/db.sqlite (usually antivirus or
      some sync/backup application).

Q3: Error at process_image(): (<type 'exceptions.AttributeError'>,
    AttributeError ("'NoneType' object has no attribute 'find'",)
    - Usually this is because of a failed login (cookie not valid). Try
      changing your password to a simple one for testing, or copy the cookie
      from your browser:
      1. Open Firefox/Chrome.
      2. Log in to Pixiv.
      3. On the Pixiv page, press F12 and choose the Storage tab (Firefox), or
         right-click the leftmost part of the address bar/the (i) icon (Chrome).
      4. Click the View Cookies button.
      5. Look for the cookie named PHPSESSID.
      6. Copy its value.
      7. Open config.ini, go to the [Authentication] section, and paste the
         value into cookie.
    - Or Pixiv has changed its page layout, so the Pixiv downloader cannot
      parse the page correctly. Please tell me by posting a comment if this
      happens and include the details, such as the member/image id, dump
      html, and log file (check the application folder).

Q4: URLError: <urlopen error [Errno 11004] getaddrinfo failed>
    - Update to a version newer than pixivutil20221029.
    - This is because the Pixiv downloader cannot resolve the address to
      download the images. Try restarting the network connection, or run
      ipconfig /flushdns to refresh the DNS cache (Windows).
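
To check whether name resolution works at all from the machine, the same `getaddrinfo` lookup can be tried directly. This is a minimal sketch: `can_resolve` is a hypothetical helper, and `www.pixiv.net` is used only as an example host (the image servers may use different hostnames).

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver finds an address for hostname."""
    try:
        socket.getaddrinfo(hostname, 443)
    except socket.gaierror:
        # Same failure class as the "getaddrinfo failed" message above.
        return False
    return True


if __name__ == "__main__":
    print(can_resolve("www.pixiv.net"))
```

If this prints False, the problem is in the local network or DNS setup rather than in the downloader.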

Q5: Error at download_image(): (<class 'socket.timeout'>, timeout('timed out',)
    - This is because the Pixiv downloader didn't receive a reply from Pixiv
      within the timeout specified in config.ini. Please retry the download
      later.

Q6: httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
    - Set userobots = False in config.ini

Command Line Option

Please run with --help for the latest information.

  -h, --help            show this help message and exit
  -s STARTACTION, --startaction=STARTACTION
                        Action you want to load your program with:
                        1 - Download by member_id
                            (required: list of member_ids separated by space
                             optional: --include_sketch to also download Pixiv Sketch)
                        2 - Download by image_id
                            (required: followed by image_ids separated by space)
                        3 - Download by tags
                            (required: tags
                             optional: --use_wildcard_tag, --sp=START_PAGE, and --ep=END_PAGE, --start_date, --end_date)
                        4 - Download from list
                            (required: -f LIST_FILE and followed with optional tag)
                        5 - Download from user bookmark
                            (optional: -p BOOKMARK_FLAG [y/n/o] for private bookmark, --sp=START_PAGE, and --ep=END_PAGE)
                        6 - Download from image bookmark
                            (required: -p BOOKMARK_FLAG [y/n/o] for private bookmark
                             optional: --sp=START_PAGE, and --ep=END_PAGE, and followed with tag)
                        7 - Download from tags list
                            (required: -f LIST_FILE,
                             optional: --sp=START_PAGE, and --ep=END_PAGE, --start_date, --end_date)
                        8 - Download new illust from bookmark
                            (optional: --sp=START_PAGE, and --ep=END_PAGE)
                        9 - Download by Title/Caption
                            (required: title/caption
                             optional: --sp=START_PAGE, and --ep=END_PAGE, --start_date, --end_date)
                        10 - Download by Tag and Member Id
                            (required: member_id, followed by tags
                             optional: --sp=START_PAGE, and --ep=END_PAGE)
                        11 - Download Member's Bookmarked Images
                            (required: followed by member_ids separated by space)
                        12 - Download by Group ID
                            (required: Group ID, limit, and process external[y/n])
                        13 - Download by Manga Series ID
                            (required: Manga Series ID separated by space
                            optional: --sp=START_PAGE, and --ep=END_PAGE)
                        f1 - Download from supported artists (FANBOX)
                            (optional: End Page)
                        f2 - Download by artist/creator id (FANBOX)
                            (required: artist(digits only)/creator ids separated by space,
                             optional: end page)
                        f3 - Download by post id (FANBOX)
                            (required: post ids, separated with space)
                        f4 - Download from followed artists (FANBOX)
                            (optional: End Page)
                        f5 - Download from custom artist list (FANBOX)
                            (optional: End page, path to list)
                        b - Batch Download from batch_job.json (experimental)
                            (optional: --bf=BATCH_FILE)
                        l - Export local database image_id/post_id
                            (required: --up=USE_PIXIV, and --uf=USE_FANBOX, and --us=USE_SKETCH)
                        e - Export online bookmark
                            (required: -p BOOKMARK_FLAG [y/n/o] for private bookmark,
                             optional: --ef=EXPORT_FILENAME)
                        m - Export online user bookmark
                            (required: member_id, optional: --ef=EXPORT_FILENAME)
                        d - Manage database
  -x, --exitwhendone    Exit programm when done.
                        (only useful when DB-Manager)
  -i, --irfanview       start IrfanView after downloading images using
                        downloaded_on_%date%.txt
  -n NUMBEROFPAGES, --numberofpages=NUMBEROFPAGES
                        temporarily overwrites numberOfPage set in config.ini
  -c [PATH], --config [PATH] provide different config.ini
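
For scripted runs, the options above can be assembled into an argument list and handed to `subprocess.run`. The sketch below only builds the list and does not start the program. `build_command` is a hypothetical helper, the member ids are made up, and placing the ids after the options is an assumption based on the help text, so check `--help` on your version first.

```python
import sys


def build_command(action, values=(), exit_when_done=False, script="PixivUtil2.py"):
    """Build an argv list such as: python PixivUtil2.py -s 1 -x 123 456."""
    command = [sys.executable, script, "-s", str(action)]
    if exit_when_done:
        command.append("-x")  # -x / --exitwhendone from the list above
    command.extend(str(value) for value in values)
    return command


# Start action 1 (download by member_id) with two made-up member ids.
print(build_command(1, [123, 456], exit_when_done=True))
```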

Error Codes

config.ini

[Authentication]

[Pixiv]

[FANBOX]

[Network]

[Debug]

[IrfanView]

[Settings]

[DownloadControl]

[FFmpeg]

[Ugoira]

[Filename]

Filename Format Syntax

Available for filenameFormat, filenameMangaFormat, avatarNameFormat, filenameInfoFormat, filenameFormatFanboxCover, filenameFormatFanboxContent and filenameFormatFanboxInfo:

-> %member_token%
   Member token, might change.
-> %member_id%
   Member id, in number.
-> %artist%
   Artist name, might change too.
-> %urlFilename%
   The actual filename stored in server without the file extensions.
-> %date%
   Current date in YYYYMMDD format.
-> %date_fmt{format}%
   Current date using custom format.
   Use Python strftime format notation, refer: https://goo.gl/3UiMAb
   e.g. %date_fmt{%Y-%m-%d}%
-> %image_ext%
   The image's file extension (jpg, png, etc.), the "." is not included.
   The correct file extension is already appended to the end of all files.
   This is available if you want to add more, or want to add the image's file extension to info files etc.
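
To illustrate how these placeholders combine, here is a minimal sketch of the substitution idea. It is not PixivUtil2's own implementation: `expand` is a hypothetical function, the values are made up, and only plain `%name%` tokens plus `%date_fmt{...}%` are handled.

```python
import re
from datetime import date


def expand(template: str, values: dict, today: date) -> str:
    """Replace %date_fmt{...}% and simple %name% tokens in template."""
    # %date_fmt{<format>}% -> today's date rendered with strftime.
    result = re.sub(
        r"%date_fmt\{(.*?)\}%",
        lambda match: today.strftime(match.group(1)),
        template,
    )
    # Plain tokens such as %member_id% or %artist%.
    for name, value in values.items():
        result = result.replace(f"%{name}%", str(value))
    return result


print(expand(
    "%member_id% %artist%/%date_fmt{%Y-%m-%d}%/%urlFilename%",
    {"member_id": 123, "artist": "someone", "urlFilename": "456_p0"},
    date(2024, 1, 31),
))
# 123 someone/2024-01-31/456_p0
```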

Available for filenameFormat and filenameMangaFormat:

-> %image_id%
   Image id, in number. (Post id for FANBOX and sketches)
-> %title%
   Image title, usually in Japanese characters.
-> %tags%
   Image tags, usually in Japanese characters. (not implemented for FANBOX yet)
-> %works_date%
   Works date, complete with time.
-> %works_date_only%
   Only the works date.
-> %works_date_fmt{<format>}%
   Works date using custom format.
   Use Python strftime format notation, refer: https://goo.gl/3UiMAb
   e.g. %works_date_fmt{%Y-%m-%d}%
-> %works_res%
   Image resolution; contains the page count if manga.
-> %works_tools%
   Tools used for the image.
-> %R-18%
   Append R-18/R-18G based on image tag. Can be used for creating a directory
   by appending a directory separator, e.g.: %R-18%\%image_id%.
-> %page_big%
   for manga mode, add big in the filename.
-> %page_index%
   for manga mode, add page number with 0-index. It will auto-pad with 0 based on the total count.
-> %page_number%
   for manga mode, add page number with 1-index. It will auto-pad with 0 based on the total count.
-> %bookmark%
   for bookmark mode, add 'Bookmarks' string.
-> %original_member_id%
   for bookmark mode, put original member id.
-> %original_member_token%
   for bookmark mode, put original member token.
-> %original_artist%
   for bookmark mode, put original artist name.
-> %searchTags%
   for download by tags and bookmarked images, put searched tags.
-> %bookmark_count%
   Bookmark count, will have overhead except on download by tags.
-> %image_response_count%
   Image response count, will have overhead except on download by tags.
-> %manga_series_order%
   the order in the manga series.
-> %manga_series_id%
   original manga series id.
-> %manga_series_title%
   original manga series title, different from work title.
-> %AI%
   Add 'AI' for AI-generated images (aiType==2).

Specific for PixivSketch (option 1 if PixivSketch included, s1, and s2):

-> %sketch_member_id%
   Pixiv Sketch artist id, might be different from Pixiv's artist id.

Specific for Fanbox:

-> %fanbox_name%
   Fanbox name, might be different from Pixiv's artist name.
   Useful for avoiding interruption when the artist is suspended from Pixiv and there is no record in the DB.

list.txt Format

tags.txt Format

suppress_tags.txt Format

blacklist_tags.txt Format

blacklist_members.txt Format

HTML Format

Bad chars

Development

PixivUtil2 has a robust test suite. To run it, you need pytest:

pip install --user pytest

pytest -v ./test_*

Credits/Contributor

** If I forget someone, please send me a pull request with the commit/merge id.

License Agreement

See LICENSE.
