Nandaka / PixivUtil2

Download images from Pixiv and more!
http://nandaka.devnull.zone/
BSD 2-Clause "Simplified" License

Database always downloading missing, but already downloaded files #613

Closed ItaloKnox closed 4 years ago

ItaloKnox commented 4 years ago

Description

I've been using PixivUtil2 to keep a small cloud database of a few select artists with the command: pixiv -n 1 -x --startaction=1 29362997 5594793 1836747 6018940 4446354 13992671 4493551 2053497 9666061 14212838 1590145 7738363 12378747 14438469 36534524 8129277 17178734

I run this command at least once every day to check if there are new images. If a new image is found, it gets downloaded, then sent to the cloud using rclone copy, then finally the local image is deleted to save disk space. The usual behavior of PixivUtil2 until now was to ignore all files that were already downloaded by it before (even if they weren't present in the folder), but now all files from all artists are downloaded again as if the database was made from scratch.

That means that to run this command and only get the new images from the artists, I need to keep a local copy of every image from their first page. If any of them is deleted, it gets downloaded again, despite PixivUtil2 ignoring it before for as long as I remember.
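The daily routine described above (download, upload with rclone copy, then delete the local files) could be scripted roughly as follows. This is only a sketch: the executable name pixiv and its flags are copied from the command quoted in this report, while download_dir, the rclone remote name, and the assumption that download_dir holds nothing but downloaded images are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(member_ids, download_dir, remote):
    """Return the download and upload commands as argument lists."""
    # Flags copied from the command in this report.
    download = ["pixiv", "-n", "1", "-x", "--startaction=1", *map(str, member_ids)]
    # `remote` is a hypothetical rclone destination such as "remote:pixiv".
    upload = ["rclone", "copy", str(download_dir), remote]
    return download, upload


def sync_once(member_ids, download_dir, remote):
    """Download new images, upload them, then free the local disk space."""
    download, upload = build_commands(member_ids, download_dir, remote)
    # check=True raises CalledProcessError on failure, so nothing is
    # deleted unless both the download and the upload succeeded.
    subprocess.run(download, check=True)
    subprocess.run(upload, check=True)
    # Assumption: download_dir contains only downloaded images, not
    # config.ini or the database file.
    for child in Path(download_dir).iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
```

A routine like this only stays cheap if the download history is honoured for files that no longer exist locally, which is exactly what this issue is about.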

Steps to Reproduce

  1. run pixivutil2 -n 1 -x --startaction=1 <artist id>
  2. let it download a few images
  3. stop PixivUtil2 and delete the downloaded images
  4. run the same command again

Expected behavior: usually, already downloaded images would be ignored even if they were removed from the system. It would display an "already downloaded" message and skip to the next one.

Actual behavior: any image that is not available locally will be downloaded again over and over again, as if the database didn't register it as downloaded before.
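To make the expected behavior concrete, here is a minimal sketch of a skip decision that depends only on the download history, not on whether the file still exists on disk. The table name download_history and the helper functions are hypothetical and are not PixivUtil2's actual schema or code.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) a hypothetical download-history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS download_history (image_id INTEGER PRIMARY KEY)"
    )
    return conn


def record_download(conn, image_id):
    """Remember that an image was downloaded once."""
    conn.execute(
        "INSERT OR IGNORE INTO download_history (image_id) VALUES (?)", (image_id,)
    )
    conn.commit()


def should_download(conn, image_id):
    """Return True only if the image was never recorded as downloaded.

    The local file system is deliberately not consulted, so deleting or
    moving a file does not trigger a second download.
    """
    row = conn.execute(
        "SELECT 1 FROM download_history WHERE image_id = ?", (image_id,)
    ).fetchone()
    return row is None
```

With this logic, an image recorded in the history is skipped on every later run, even after the local copy has been uploaded elsewhere and deleted.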

Versions

v20200101. I can't confirm if it happens up until v21091218.

ghost commented 4 years ago

Same issue here with 20200101. I ran "8. Download new illust from bookmarked members (/bookmark_new_illust.php)" with end page = 50 and left it running overnight. After it finished, I moved the folders into my collection folder and was asked whether I wanted to replace 1000+ files that had already been downloaded a long time ago. It seems version 20200101 only adds the download history to the DB but does not check that history before downloading, which causes duplicate downloads. I have tested version 20191221 and it has no DB issue.

Nandaka commented 4 years ago

This is related to the changes in #609. I updated the logic check in https://github.com/Nandaka/PixivUtil2/releases/tag/v20200102-beta1

ItaloKnox commented 4 years ago

I'm not sure the patch is working as intended. I pulled my entire collection from the cloud server and ran the same command as always to register the images as "already downloaded". I then deleted almost all images except the last few of each artist, and finally ran the same command again in a new, empty folder. The program behaved the same way as before, with one difference: it skipped the last few images I had left in the previous folder, but still downloaded all the others that I had deleted moments earlier.

Here is a tree of the files:

.
├── 29362997
│   ├── 78236553_p0.jpg
│   ├── 78279338_p0.jpg
│   ├── 78315948_p0.jpg
│   └── 78361195_p0.jpg
└── test                                 <<< PixivUtil2 started here
    └── 29362997
        ├── 77995868_p0.jpg
        ├── 78026019_p0.jpg
        ├── 78057073_p0.jpg
        └── 78156644_p0.jpg

And the last 8 images from the artist, starting from the latest:

  1. 78361195
  2. 78315948
  3. 78279338
  4. 78236553
  5. 78156644
  6. 78057073
  7. 78026019
  8. 77995868

Lastly, I attached a clean log of the entire operation: pixivutil.log

Nandaka commented 4 years ago

try https://github.com/Nandaka/PixivUtil2/releases/tag/v20200103-beta2

I ran mode 1, then moved the images to a different folder, and it works as expected (no download delay). Ensure that alwayscheckfilesize and overwrite are both set to False in config.ini; otherwise there will be some delay, because the program actually retrieves the image information from the Pixiv server.
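For anyone who wants to double-check those two settings, a small sketch using Python's standard configparser is shown below. It scans every section for the option names instead of assuming which section they live in; the function name find_options is hypothetical, and interpolation is disabled on the assumption that config.ini may contain literal % characters.

```python
import configparser


def find_options(config_text, names=("alwayscheckfilesize", "overwrite")):
    """Return {option_name: bool} for the named options, whatever their section."""
    # interpolation=None keeps literal '%' characters from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    found = {}
    for section in parser.sections():
        for name in names:
            # Option names are matched case-insensitively by configparser.
            if parser.has_option(section, name):
                found[name] = parser.getboolean(section, name)
    return found
```

Calling find_options on the contents of config.ini should report False for both names if the file is set up as described above.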

Nanoka commented 4 years ago

I can confirm it does not work with either patch (tested with option 1 and 8), but it does work fine running from source using the same config.ini

ItaloKnox commented 4 years ago

Same here, the newest patch still behaves like I described in my previous post. I checked the config.ini and both options are set to False as intended.

I haven't tried compiling from source yet; I will give it a try later if no more patches are deployed in the meantime. Update: I just tried running from source and it works just fine, as @Nanoka said. No changes to the configuration, and the same database.

pj83 commented 4 years ago

Had the same issue even with v20200103-beta2. I ended up downloading a fresh copy of beta2 to a new folder and copying the config.ini and db file across to it, and now it works as expected (previously I had always overwritten the previous version in place).