[BUG] Unsupported image file type

Instinkt-Servers commented 1 week ago

✔️ Expected Behaviour

I expect the script to save the images in a supported format.

🐞 Actual Behaviour

[INFO] -> found 5 ad config files
[INFO] Loading ad from [C:\Scripts\KleinanzeigenBot\downloaded-ads\ad_2821\ad_282.yaml]...
[ERROR] Unsupported image file type [ad_2821**img1.auto]
[PYI-15788:ERROR] Failed to execute script '__main__' due to unhandled exception!
successfully removed temp profile C:\Users\KLEINA~1\AppData\Local\Temp\4\uc_fbr*****

📋 Steps to Reproduce

1. kleinanzeigen-bot.exe download
2. kleinanzeigen-bot.exe verify
3. kleinanzeigen-bot.exe publish

📺 What browsers are you seeing the problem on? (if applicable)

Microsoft Edge

💻 What operating systems are you seeing the problem on? (if applicable)

Windows

📃 Relevant log output (if applicable)

In the ad YAML file:

images:
  - ad_282*****__img1.auto
  - ad_282*****__img2.auto

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Instinkt-Servers commented 1 week ago

That's happening because of this code (lines 76-85):

            if current_img_url is None:
                continue
            file_ending = current_img_url.split('.')[-1].lower()
            img_path = directory + '/' + img_fn_prefix + str(img_nr) + '.' + file_ending
            if current_img_url.startswith('https'):  # verify https (for Bandit linter)
                urllib_request.urlretrieve(current_img_url, img_path)  # nosec B310
            dl_counter += 1
            img_paths.append(img_path.split('/')[-1])

in extract.py.

File URL: https://img.kleinanzeigen.de/api/v1/prod-ads/images/1e/XXXXX-XXXXXX-XXXXX-XXXXX?rule=$_59.AUTO

so the file extension is taken from the query string, and the image is saved as .auto.

The page source, however, contains <meta property="og:image" content="https://img.kleinanzeigen.de/api/v1/prod-ads/images/1e/XXXX-XXXXX-XXXX-XXXX?rule=$_59.JPG"/>; using that URL would give you .jpg.
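The root cause is that `current_img_url.split('.')[-1]` splits the whole URL, so the query string `?rule=$_59.AUTO` decides the extension. A minimal sketch of reading the format from the `rule` query parameter instead, assuming (as observed in this thread) that the rule ends in a format token such as `JPG` or `AUTO`, and assuming that `AUTO` images can be saved as JPEG. `file_ending_from_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: "AUTO" (or no recognizable format) is treated as JPEG,
# matching the og:image URL that uses rule=$_59.JPG.
DEFAULT_ENDING = "jpg"


def file_ending_from_url(url: str) -> str:
    """Hypothetical helper: choose a file extension for an image URL.

    The format token is read from the 'rule' query parameter
    (e.g. '$_59.JPG' -> 'jpg'). Without a rule, the extension of the
    URL path is used. 'auto' or nothing at all falls back to DEFAULT_ENDING.
    """
    parsed = urlparse(url)
    rule = parse_qs(parsed.query).get("rule", [""])[0]
    if "." in rule:
        ending = rule.rsplit(".", 1)[-1].lower()
    else:
        # Only look at the last path segment, never at the host name.
        last_segment = parsed.path.rsplit("/", 1)[-1]
        ending = last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""
    if ending in ("", "auto"):
        return DEFAULT_ENDING
    return ending
```

In the extract.py excerpt, `file_ending = file_ending_from_url(current_img_url)` could take the place of the `split('.')` line.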

Instinkt-Servers commented 1 week ago

Possible (ugly) fix:

while img_nr <= n_images:  # scrolling + downloading
    current_img_url = img_element.attrs['src']  # URL of the image
    if current_img_url is None:
        continue
    file_ending = current_img_url.split('.')[-1].lower()
    # Change the file extension if it is "auto"
    if file_ending == "auto":
        file_ending = "jpg"
    img_path = directory + '/' + img_fn_prefix + str(img_nr) + '.' + file_ending
    if current_img_url.startswith('https'):  # verify https (for Bandit linter)
        urllib_request.urlretrieve(current_img_url, img_path)  # nosec B310
    dl_counter += 1
    img_paths.append(img_path.split('/')[-1])

Don't forget to update the image entries in the YAML files as well.
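Hard-coding `jpg` assumes the server always returns JPEG for `.AUTO`. A more defensive variant (a sketch, not the project's code) is to download first and then pick the extension from the file's leading bytes. Only the JPEG, PNG, GIF and WebP signatures are checked here; `ending_from_magic` and `fix_extension` are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def ending_from_magic(header: bytes) -> Optional[str]:
    """Return an extension for well-known image signatures, or None if unknown."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None


def fix_extension(img_path: Path) -> Path:
    """Rename a downloaded image so its extension matches its actual content.

    Files with an unknown signature are left untouched.
    Returns the (possibly new) path.
    """
    with img_path.open("rb") as f:
        header = f.read(12)  # 12 bytes cover all signatures checked above
    ending = ending_from_magic(header)
    if ending is None or img_path.suffix.lower() == "." + ending:
        return img_path
    new_path = img_path.with_suffix("." + ending)
    img_path.replace(new_path)  # overwrites an existing target, like mv
    return new_path
```

After `urlretrieve`, something like `img_path = fix_extension(Path(img_path))` followed by `img_paths.append(img_path.name)` would keep the YAML entry in sync with the renamed file.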

k-jell commented 1 week ago

Quick bash workaround until this is fixed. Run these commands in the downloaded-ads directory:

find . -type f -name "*.auto" -exec bash -c 'mv "$0" "${0%.auto}.jpg"' {} \;
find . -type f -name "*.yaml" -exec sed -i 's/\.auto/\.jpg/g' {} +
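Since the reporter is on Windows, where `find` and `sed` may not be available, here is a rough Python equivalent of the two commands above. It is a sketch that, like the `sed` call, replaces every `.auto` occurrence in the YAML files with `.jpg`, and like the bash version it only renames files without checking their real format. `rename_auto_images` is a hypothetical name.

```python
from pathlib import Path


def rename_auto_images(root: Path) -> int:
    """Rename *.auto files to *.jpg and patch '.auto' references in *.yaml files.

    Mirrors the find/mv + find/sed workaround. Returns the number of renamed files.
    """
    renamed = 0
    # Collect matches first so renaming does not disturb the directory walk.
    for img in list(root.rglob("*.auto")):
        if img.is_file():
            img.replace(img.with_suffix(".jpg"))  # overwrites like mv
            renamed += 1
    for yaml_file in root.rglob("*.yaml"):
        # Work on raw bytes to keep the file's encoding and line endings unchanged.
        data = yaml_file.read_bytes()
        if b".auto" in data:
            yaml_file.write_bytes(data.replace(b".auto", b".jpg"))
    return renamed
```

For example, `rename_auto_images(Path("downloaded-ads"))` run from the bot's folder would apply it to all downloaded ads.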

Second-Hand-Friends / kleinanzeigen-bot