Richard-Weiss / Bing-Creator-Image-Downloader

Downloads all Bing Creator images from a collection
MIT License

Use alternate image as fallback for creation date #12

Closed rc-gr closed 10 months ago

rc-gr commented 10 months ago

This happens when the saved image is valid, but not present among the generated images in the prompt URL.

This behavior is also present when accessing the saved image manually in the collection via the browser, i.e., the thumbnail shows the correct image, but that image is not among the images in the prompt URL.

From initial testing, the creation date and prompt of the alternate image, which is the first of the generated images in the prompt URL, seem to line up with those of the saved one.

In other words, this handles cases where Image X is accessed with the correct and valid thumbnail and image URL, but the prompt URL contains images A,(B),(C),(D) with matching prompt and date.

Richard-Weiss commented 10 months ago

@rc-gr I'll take a look into that on the weekend, but I'm still not completely understanding the issue.
Can you elaborate more or show some specific examples?

rc-gr commented 10 months ago

@Richard-Weiss Referring to the observation that you mentioned here:

Because of how the images are accessed there may be some images that are duplicated or images that appear that weren't in the text file.

Suppose you save Image X which is initially present on Prompt Page X. In the collections where it's saved, it reflects as such with corresponding Thumbnail X.

Some time later, when you try to access Image X from your collections by following the link to Prompt Page X, Image Y is shown instead on Prompt Page X, along with 1-3 other incorrect images from the prompt. But the prompt and generation date shown on the page are still correct, and the collection still shows Thumbnail X (based on Image X).

What you probably observed back then was Image Y being downloaded instead of Image X, because Image X somehow no longer exists on Prompt Page X despite its direct image link still being valid.

Currently, without the commit, the program errors out when the above occurs, because Image X's decoded_image_id is not found among the imageIds of images. The list that resp_image on line 271 is taken from is then empty, so indexing its first element fails.

With this commit, the program sidesteps the issue by taking the creation date from Image Y, the first image in the set on Prompt Page X, whose creation date lines up with Image X's.

Note that this will not affect the downloaded image or thumbnail, which are valid and correct regardless.
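
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `pick_image_for_creation_date` is hypothetical, and it assumes only that each entry of `images` is a dict with an `imageId` key.

```python
from typing import Any, Optional


def pick_image_for_creation_date(
    images: list[dict[str, Any]], decoded_image_id: str
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: choose which entry to read the creation date from."""
    # Preferred case: the saved image itself is listed on the prompt page.
    match = next(
        (img for img in images if img.get("imageId") == decoded_image_id),
        None,
    )
    if match is not None:
        return match
    # Fallback case: the saved image is missing from the set, so use the
    # first image of the same prompt page instead (assumed to share its date).
    # Return None rather than raising if the page lists no images at all.
    return images[0] if images else None
```

Returning `None` for an empty set leaves it to the caller to decide whether to skip the creation date or report the image, instead of failing with an `IndexError`.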

Richard-Weiss commented 10 months ago

@rc-gr Is this something that can actually happen? I mean it sounds like a nice workaround, but I don't see how the imageId or imageSetId could change for a given image in the collection. The mediaUrl might, but not the pageUrl or any of these fixed values.
Do you maybe have an image that exhibits this behavior so I can reproduce?

rc-gr commented 10 months ago

@Richard-Weiss Yes, it did happen to me, albeit rarely...but I'm afraid you'll have to bear with me as I try to explain further while being unable to share the affected image.

For reference, this was the error I saw (before the commit):

    INFO Fetching metadata of collections...
    Traceback (most recent call last):
      File "...\Bing-Creator-Image-Downloader\main.py", line 382, in <module>
        asyncio.run(main())
      File "...\Lib\asyncio\runners.py", line 190, in run
        return runner.run(main)
               ^^^^^^^^^^^^^^^^
      File "...\Lib\asyncio\runners.py", line 118, in run
        return self._loop.run_until_complete(task)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "...\Lib\asyncio\base_events.py", line 653, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
      File "...\Bing-Creator-Image-Downloader\main.py", line 370, in main
        await set_creation_dates(image_data)
      File "...\Bing-Creator-Image-Downloader\main.py", line 254, in set_creation_dates
        await asyncio.gather(*tasks)
      File "...\Bing-Creator-Image-Downloader\main.py", line 273, in _set_creation_date
        resp_image = [img for img in images if img['imageId'] == decoded_image_id][0]
    IndexError: list index out of range

Suppose I were to print out some of the values at the line before the error occurs, specifically from `decoded_image_id` and each element of `images`:
`decoded_image_id: VZUkHqdk...`
`img['imageId']: LOiT734Z...`
`img['imageId']: o+dr6/rq...`
`img['imageId']: e9zGg/4w...`

In this example, `VZUkHqdk...` is the Id of the image that I want to keep (i.e., Image X). If I were to get more values from the passed-in `image` for this example, both `image_link` and `thumbnail_link` also point to Image X and are still accessible. But if I were to access its `image_page_url`, I would not find the image there, and would instead see 3 other images, corresponding to `LOiT734Z...`, `o+dr6/rq...` and `e9zGg/4w...`, that I neither wanted nor saved. However, the creation date and prompt shown there match those of Image X.
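
A small diagnostic along these lines could produce the values listed above and confirm the mismatch before the lookup fails. It is only a sketch: `describe_image_ids` is a hypothetical name, not a function in `main.py`, and the Ids in the usage example are the truncated ones quoted in this comment.

```python
from typing import Any


def describe_image_ids(
    images: list[dict[str, Any]], decoded_image_id: str
) -> list[str]:
    """Hypothetical debug helper: list the wanted Id and the Ids actually returned."""
    lines = [f"decoded_image_id: {decoded_image_id}"]
    # One line per image that the prompt page returned.
    lines += [f"img['imageId']: {img['imageId']}" for img in images]
    # Make the mismatch explicit instead of waiting for an IndexError.
    found = any(img["imageId"] == decoded_image_id for img in images)
    lines.append(f"found in images: {found}")
    return lines


if __name__ == "__main__":
    example_images = [
        {"imageId": "LOiT734Z..."},
        {"imageId": "o+dr6/rq..."},
        {"imageId": "e9zGg/4w..."},
    ]
    for line in describe_image_ids(example_images, "VZUkHqdk..."):
        print(line)
```
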

Richard-Weiss commented 10 months ago

@rc-gr I get what you mean now. It doesn't really matter which one of these images we take, as the creation date is the same anyway.
I just find it weird that the request works but the image won't be returned in the response.
Anyways, looks good to me. 🙂