Second-Hand-Friends / kleinanzeigen-bot

A diligent command line tool to publish ads on kleinanzeigen.de
GNU Affero General Public License v3.0
222 stars, 47 forks

[BUG] Unsupported image file Type #348

Closed Instinkt-Servers closed 1 week ago

Instinkt-Servers commented 1 week ago

βœ”οΈ Expected Behaviour

I expect the script to save the images in a supported format.

🐞 Actual Behaviour

[INFO] -> found 5 ad config files
[INFO] Loading ad from [C:\Scripts\KleinanzeigenBot\downloaded-ads\ad_2821\ad_282.yaml]...
[ERROR] Unsupported image file type [ad_2821**img1.auto]
[PYI-15788:ERROR] Failed to execute script 'main__' due to unhandled exception!
successfully removed temp profile C:\Users\KLEINA~1\AppData\Local\Temp\4\uc_fbr*****

📋 Steps to Reproduce

  1. kleinanzeigen-bot.exe download
  2.1 kleinanzeigen-bot.exe verify
  2.2 kleinanzeigen-bot.exe publish

📺 What browsers are you seeing the problem on? (if applicable)

Microsoft Edge

💻 What operating systems are you seeing the problem on? (if applicable)

Windows

📃 Relevant log output (if applicable)

In yaml:

images:


Instinkt-Servers commented 1 week ago

That's happening because of this code here: [76-85]

            if current_img_url is None:
                continue
            file_ending = current_img_url.split('.')[-1].lower()
            img_path = directory + '/' + img_fn_prefix + str(img_nr) + '.' + file_ending
            if current_img_url.startswith('https'):  # verify https (for Bandit linter)
                urllib_request.urlretrieve(current_img_url, img_path)  # nosec B310
            dl_counter += 1
            img_paths.append(img_path.split('/')[-1])

in extract.py.

File URL: https://img.kleinanzeigen.de/api/v1/prod-ads/images/1e/XXXXX-XXXXXX-XXXXX-XXXXX?rule=$_59.AUTO

so the file ending of the image is interpreted as .AUTO

In the page source you can instead read <meta property="og:image" content="https://img.kleinanzeigen.de/api/v1/prod-ads/images/1e/XXXX-XXXXX-XXXX-XXXX?rule=$_59.JPG"/>, which gives you .jpg.
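To make the difference between the two URLs concrete, here is a minimal sketch of how the file ending could be derived from the `rule` query parameter instead of from everything after the last dot. The names `guess_file_ending` and `KNOWN_IMAGE_EXTENSIONS` are hypothetical and not part of extract.py, and the list of accepted endings is an assumption, since the thread does not say which types the bot supports.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list; the types the bot really accepts are not stated in this thread.
KNOWN_IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "webp"}


def guess_file_ending(img_url: str, default: str = "jpg") -> str:
    """Return a file ending for an image URL, falling back to `default`."""
    query = parse_qs(urlsplit(img_url).query)
    rule = query.get("rule", [""])[0]  # e.g. "$_59.AUTO"
    ending = rule.rsplit(".", 1)[-1].lower() if "." in rule else ""
    return ending if ending in KNOWN_IMAGE_EXTENSIONS else default


base = "https://img.kleinanzeigen.de/api/v1/prod-ads/images/1e/XXXXX-XXXXXX-XXXXX-XXXXX"
auto_url = base + "?rule=$_59.AUTO"

print(auto_url.split(".")[-1].lower())               # auto  (what the current code computes)
print(guess_file_ending(auto_url))                   # jpg   (fallback, "auto" is not an image type)
print(guess_file_ending(base + "?rule=$_59.JPG"))    # jpg
```

The fallback to jpg is only a guess based on what the og:image tag shows for this ad.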

Instinkt-Servers commented 1 week ago

Possible (ugly) fix:

while img_nr <= n_images:  # scrolling + downloading
    current_img_url = img_element.attrs['src']  # URL of the image
    if current_img_url is None:
        continue
    file_ending = current_img_url.split('.')[-1].lower()
    # Change the file ending if it is "auto"
    if file_ending == "auto":
        file_ending = "jpg"
    img_path = directory + '/' + img_fn_prefix + str(img_nr) + '.' + file_ending
    if current_img_url.startswith('https'):  # verify https (for Bandit linter)
        urllib_request.urlretrieve(current_img_url, img_path)  # nosec B310
    dl_counter += 1
    img_paths.append(img_path.split('/')[-1])

Do not forget to update the content of the yaml as well.
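A possibly less ugly variant would be to look at the downloaded bytes rather than hard-coding "jpg". The sketch below checks the well-known file signatures of JPEG, PNG, GIF and WebP. `sniff_image_extension` is a hypothetical helper, not something that exists in the bot, and whether the .AUTO rule ever returns anything other than JPEG is an assumption that this thread does not confirm.

```python
def sniff_image_extension(header: bytes, default: str = "jpg") -> str:
    """Guess a file ending from the first bytes of an image file."""
    if header.startswith(b"\xff\xd8\xff"):  # JPEG
        return "jpg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):  # GIF
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":  # WebP
        return "webp"
    return default  # unknown signature: keep the assumed default


def sniff_image_file(path: str, default: str = "jpg") -> str:
    """Read the first 12 bytes of `path` and guess its file ending."""
    with open(path, "rb") as img_file:
        return sniff_image_extension(img_file.read(12), default)
```

After urlretrieve has written the file, `sniff_image_file(img_path)` could be used to decide on the final name before the entry is added to img_paths.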

k-jell commented 1 week ago

Quick bash workaround until this is fixed. Run these commands in the downloaded-ads directory:

find . -type f -name "*.auto" -exec bash -c 'mv "$0" "${0%.auto}.jpg"' {} \;
find . -type f -name "*.yaml" -exec sed -i 's/\.auto/\.jpg/g' {} +
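Since the original report is from Windows, where find and sed are usually not available, the same workaround could be sketched in Python. This is a rough equivalent under the same assumption as above (that the .auto files are really JPEGs); `rename_auto_images` is a hypothetical helper name.

```python
from pathlib import Path


def rename_auto_images(root: str) -> int:
    """Rename *.auto files below `root` to *.jpg and patch the yaml files.

    Returns the number of renamed image files.
    """
    root_dir = Path(root)
    renamed = 0
    # Materialise the list first so renaming does not disturb the directory walk.
    for img in list(root_dir.rglob("*.auto")):
        img.rename(img.with_suffix(".jpg"))
        renamed += 1
    # Replace every ".auto" in the ad configs, like the sed command does.
    for cfg in root_dir.rglob("*.yaml"):
        text = cfg.read_text(encoding="utf-8")
        if ".auto" in text:
            cfg.write_text(text.replace(".auto", ".jpg"), encoding="utf-8")
    return renamed


# Example call, run from the directory that contains the downloaded ads:
# rename_auto_images(".")
```

Like the sed command, this replaces every occurrence of ".auto" in the yaml files, not only the ones in the images list, so it is worth checking the result before publishing.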