Confuzu / CivitAI_Image_grabber

Downloads every image from a given CivitAI username, model ID, model tag, or model version ID
GNU General Public License v3.0
35 stars, 3 forks

API is feeding images that this script is ignoring #24

Closed: xion2 closed this issue 1 month ago

xion2 commented 1 month ago

I don't know the exact cause, but your script is ignoring images that the API returns.

I chose a random, very small profile to illustrate this issue. If you open this URL in a browser, you will see a list of images in text format (VERY NSFW profile, BTW): https://civitai.com/api/v1/images?period=AllTime&sort=Newest&username=maxihw&nsfw=true

The images that your grabber skips are: 21516145 21533883 21734311 21734319

However, all of these images are clearly present in the text list at that URL; press Ctrl+F and you will find every one of them. If you put that URL into a downloader like Bulk Image Downloader, it also detects and downloads the smaller images. So something is clearly wrong with the way the script is set up, but I lack the knowledge to fix it.
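A quick way to confirm exactly which IDs were skipped is to diff the IDs in the API response against the files on disk. This is a hypothetical sketch: it assumes each entry in the response's "items" list carries an "id" field and that downloaded files are named "<id>.<ext>", which may not match how the grabber actually names its files.

```python
from pathlib import PurePath


def missing_ids(api_items: list[dict], downloaded_files: list[str]) -> list[int]:
    """Return IDs listed in the API response that have no matching downloaded file."""
    listed = {int(item["id"]) for item in api_items}
    saved = set()
    for name in downloaded_files:
        stem = PurePath(name).stem  # "21516145.jpeg" -> "21516145"
        if stem.isdigit():  # ignore files not named after an image ID (assumed naming)
            saved.add(int(stem))
    return sorted(listed - saved)


if __name__ == "__main__":
    # Toy input: the four IDs from this issue plus one ID that was "downloaded".
    items = [{"id": 21516145}, {"id": 21533883}, {"id": 21734311}, {"id": 21734319}, {"id": 1}]
    print(missing_ids(items, ["1.jpeg"]))  # [21516145, 21533883, 21734311, 21734319]
```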

Confuzu commented 1 month ago

I was able to replicate it. I'll take a look, try to find the problem, and fix it.

Confuzu commented 1 month ago

Please update and read the "1.2 New Features & Update" section for more info. I hope the update resolves the problem.

xion2 commented 1 month ago

Didn't resolve the problem. Only downloaded 1 (21516145) of the 4 missing images from that profile. Does it work for you and download all 32 images?

Confuzu commented 1 month ago

OK, for some reason the API, called with the URL generated by my script (https://civitai.com/api/v1/images?username=maxihw&nsfw=true), only shows 29 images, but when I add sort=Newest to the API call, all 32 images are displayed. Strange.
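For reference, here is a minimal sketch of how the two query strings differ. The images_url helper and its defaults are illustrative only, not the grabber's actual code; it simply uses Python's urllib.parse.urlencode to append sort=Newest to the same username and nsfw parameters.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://civitai.com/api/v1/images"


def images_url(username: str, sort: Optional[str] = "Newest", nsfw: str = "true") -> str:
    """Build an images-endpoint URL (hypothetical helper); sort=None omits the sort parameter."""
    params = {"username": username, "nsfw": nsfw}
    if sort is not None:
        params["sort"] = sort  # the parameter that made all 32 images show up
    return f"{API_BASE}?{urlencode(params)}"


print(images_url("maxihw", sort=None))  # ends with ?username=maxihw&nsfw=true
print(images_url("maxihw"))             # ends with ?username=maxihw&nsfw=true&sort=Newest
```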

I updated the code, please try it.

xion2 commented 1 month ago

That one seems to work. It also downloads all the images in order instead of in random order. I'm trying it with a bigger profile now just to make sure it doesn't run into any issues, and I'll update this post with the results when it's done.

Test 1: 1,493 of 1,523 images downloaded. The prior version downloaded 1,208 images. Big improvement. I'm not sure whether Civitai is reporting the number of available images incorrectly; doing more testing.

Test 2: 7,229 of 7,373 images downloaded.

Update 1: I manually downloaded the entire profile from the first test, and the script's download was indeed missing those images. I'm trying to find a smaller profile to recreate the issue and make it easier to troubleshoot. I'm also going to try increasing the timeout.

Test 3: 1,445 of 1,473 images downloaded.

OK, I figured it out after testing a bit more. Something in the batch-processing part of the script makes it skip images every time it pauses. That's why the number of missing images is nearly identical between Test 1 and Test 3 (30 and 28 missing), which have a similar number of images.
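To check the "skips every time it pauses" idea against the test results, the missing fraction per test can be computed directly from the reported counts. This is plain arithmetic on the figures in this thread; a roughly constant fraction fits a fixed loss per batch or page better than random network failures, though it does not prove the cause.

```python
# Reported (downloaded, total) counts from the three tests in this thread.
TESTS = {
    "Test 1": (1_493, 1_523),
    "Test 2": (7_229, 7_373),
    "Test 3": (1_445, 1_473),
}


def missing_report(tests: dict[str, tuple[int, int]]) -> dict[str, tuple[int, float]]:
    """Map each test name to (missing count, missing fraction of the total)."""
    return {name: (total - done, (total - done) / total) for name, (done, total) in tests.items()}


for name, (missing, fraction) in missing_report(TESTS).items():
    print(f"{name}: {missing} missing ({fraction:.2%})")
# Test 1: 30 missing (1.97%)
# Test 2: 144 missing (1.95%)
# Test 3: 28 missing (1.90%)
```

All three tests come out at roughly 2% missing, even though Test 2 is about five times larger.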

I then tried a smaller profile with 158 images: 3 were missing, as expected.

I managed to find a profile with 99 images where it consistently skips 1 file, always the same one. Here is a link to the profile so you can test it: https://civitai.com/user/Robi822

And here is a link to the image it skips every time the script "pauses": https://civitai.com/images/26692357

Hopefully that is helpful in narrowing down the issue.

xion2 commented 1 month ago

I lowered the semaphore limit to 3 and got this error in my log, the first time I've seen it:

2024-09-21 07:45:17,739 - ERROR - Error processing URL https://civitai.com/api/v1/images?nsfw=X&sort=Newest&username=Robi822&cursor=26702853: [Errno 13] Permission denied:

BTW, this is currently the missing image, because the artist has since uploaded a new image. I did not get this error in my log before. I have a feeling something is breaking with asyncio and the semaphore.
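For anyone debugging this, the pattern below is a minimal sketch of bounded concurrency with asyncio.Semaphore. download_one and download_all are hypothetical names, and the real HTTP request and file write are replaced by asyncio.sleep plus a simulated OSError. The point is that a semaphore only limits how many downloads run at once and should not drop items by itself, so collecting results with asyncio.gather(..., return_exceptions=True) and recording every failure per ID makes skipped images visible instead of silent.

```python
import asyncio


async def download_one(image_id: int, sem: asyncio.Semaphore) -> int:
    """Hypothetical download; at most `limit` of these run inside the semaphore at once."""
    async with sem:
        await asyncio.sleep(0.01)  # stands in for the HTTP request and file write
        if image_id % 7 == 0:  # simulated failure, e.g. [Errno 13] Permission denied
            raise OSError(13, "Permission denied")
        return image_id


async def download_all(ids: list[int], limit: int = 3) -> tuple[list[int], dict[int, BaseException]]:
    """Run all downloads, returning (succeeded IDs, {failed ID: exception})."""
    sem = asyncio.Semaphore(limit)
    # return_exceptions=True keeps one failure from hiding the others' results.
    results = await asyncio.gather(*(download_one(i, sem) for i in ids), return_exceptions=True)
    ok: list[int] = []
    failed: dict[int, BaseException] = {}
    for image_id, result in zip(ids, results):  # gather preserves input order
        if isinstance(result, BaseException):
            failed[image_id] = result
        else:
            ok.append(result)
    return ok, failed


if __name__ == "__main__":
    ok, failed = asyncio.run(download_all(list(range(1, 15))))
    print(len(ok), sorted(failed))  # 12 [7, 14]
```

Logging the failed dictionary at the end of a run gives an exact list of IDs to retry, rather than a gap in the final count.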

Confuzu commented 1 month ago

The main problem is that the API and the website sometimes report different numbers. Sometimes the API has more images than the website shows on the Images button. For example, the user GiraffeJR44 has 217 images according to the website, but the API returns 344. Final count - Total API items: 344, Total downloaded: 344

What I also found out is that the API, not my script, is probably the problem. A good example is the account bruhmax: with limit=100 there are 100 images on the first page and 34 on the second, which makes 134, so one image is missing. If I set the limit to 200, I get all 135 images, which matches the image count shown on the website.
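The bruhmax numbers can be reproduced with a toy model in which the API's next cursor lands one item past where the next page should start. Everything here is an assumption for illustration: make_fake_api is a stand-in for the real endpoint, the response shape ("items" plus "metadata.nextCursor") is assumed, and the cursor is a plain list index rather than whatever value CivitAI actually returns. collect_ids shows the usual cursor loop: keep requesting until no nextCursor comes back.

```python
from typing import Callable, Optional

# Assumed page shape: {"items": [{"id": ...}, ...], "metadata": {"nextCursor": ...}}
Page = dict
FetchPage = Callable[[Optional[str]], Page]


def collect_ids(fetch_page: FetchPage) -> list[int]:
    """Follow nextCursor until none is returned; return every image ID seen."""
    ids: list[int] = []
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        ids.extend(item["id"] for item in page.get("items", []))
        cursor = page.get("metadata", {}).get("nextCursor")
        if not cursor:
            return ids


def make_fake_api(all_ids: list[int], limit: int) -> FetchPage:
    """Hypothetical stand-in for the API that skips one item at every page boundary."""
    def fetch_page(cursor: Optional[str]) -> Page:
        start = int(cursor) if cursor else 0  # toy cursor: a plain list index
        end = start + limit
        items = [{"id": i} for i in all_ids[start:end]]
        # Off-by-one (the suspected behaviour): the next page starts at end + 1, not end.
        next_cursor = str(end + 1) if end < len(all_ids) else None
        return {"items": items, "metadata": {"nextCursor": next_cursor}}
    return fetch_page


images = list(range(135))  # bruhmax-sized toy account
print(len(collect_ids(make_fake_api(images, limit=100))))  # 134
print(len(collect_ids(make_fake_api(images, limit=200))))  # 135
```

With limit=100 the single page boundary costs one image (100 + 34), while limit=200 fits everything on one page, so nothing is lost.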

With large accounts it is not possible to keep all the images on a single page. A good example is the account Jesse_F with 4461 images. With a limit of 200, the last page should hold 61 images, but the API shows only 39. That is exactly 22 missing images, which matches the number of page boundaries: 4461 images at 200 per page make 22 full pages followed by the last one, so one image is lost at each boundary. So it looks like this is not a bug in my script but a problem with the CivitAI API.
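The page arithmetic behind the Jesse_F observation, as a small sketch (page_stats is a hypothetical helper): with 4461 images and a limit of 200 there are 23 pages, 22 of them full, so the last page should hold 4461 - 22 × 200 = 61 images. Seeing 39 instead means 61 - 39 = 22 are missing, one per page boundary.

```python
def page_stats(total: int, limit: int) -> dict[str, int]:
    """Pages, page boundaries, and expected size of the last page for `total` items."""
    pages = (total + limit - 1) // limit  # ceiling division without floats
    return {
        "pages": pages,
        "boundaries": pages - 1,
        "expected_last_page": total - (pages - 1) * limit,
    }


stats = page_stats(4461, 200)
print(stats)  # {'pages': 23, 'boundaries': 22, 'expected_last_page': 61}
observed_last_page = 39  # what the API returned for Jesse_F
print(stats["expected_last_page"] - observed_last_page == stats["boundaries"])  # True
```

The same helper gives 2 pages and 1 boundary for bruhmax at limit 100, matching the single missing image there.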