Dineshkarthik / telegram_media_downloader

Download media files from a telegram conversation/chat/channel up to 2GiB per file
MIT License
2.08k stars 359 forks

[420 FLOOD_WAIT_X] causes photos and videos to be saved as 0 KB files #495

Open porridgexj opened 2 months ago

porridgexj commented 2 months ago

When a channel contains too many files, the following error occurs:

ERROR Telegram says: [420 FLOOD_WAIT_X] - A wait of 2919 seconds is required (caused by client.py:1021 "auth.ExportAuthorization")

As a result, many videos and photos are written as 0 KB files, and the program moves on. These photos and videos are effectively skipped [Screenshot 2024-06-21 205416], and the program never downloads them again.
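
To see which files were affected, the download directory can be scanned for zero-byte files. The following is a minimal sketch using only the standard library; `find_empty_files` is a hypothetical helper name and is not part of telegram_media_downloader or pyrogram.

```python
import os
from typing import List


def find_empty_files(root: str) -> List[str]:
    """Return the sorted paths of all zero-byte files under `root`."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A size of 0 bytes marks a download that produced no data.
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

The returned list could then be reviewed by hand, or the listed files deleted so that a later run can fetch them again.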

porridgexj commented 2 months ago

I think this is a pyrogram bug: we can't catch the FLOOD_WAIT error from the app.download_media function. So I found a workaround. When a FloodWait error occurs, the downloaded file is usually 0 KB, so we can check whether the file size is 0 KB and retry if it is.

import os
import time

def is_file_empty(file_path):
    # A missing file counts as empty, so it is retried as well.
    if os.path.exists(file_path):
        return os.path.getsize(file_path) == 0
    else:
        return True

flag = True
while flag:
    # 'download_media' is a function from 'pyrogram';
    # 'app' is assumed to be a pyrogram Client, 'file_id' the media to fetch.
    file_path = app.download_media(file_id)
    if file_path and not is_file_empty(file_path):
        flag = False
        # download succeeded: handle the file here
    else:
        time.sleep(10)
        # wait a few seconds, then retry

AIRDOGE commented 1 month ago

Same problem here.

AIRDOGE commented 1 month ago

Where should I insert this code?

AIRDOGE commented 1 month ago

Is there any way to make the program automatically wait for the time Telegram requires and then restart the download?

porridgexj commented 1 month ago

Is there any way to make the program automatically wait for the time Telegram requires and then restart the download?

We can't get the number of seconds Telegram requires, because we can't catch the FLOOD_WAIT error info from pyrogram. If you want the program to resume as soon as the required wait is over, you could lower the time.sleep interval from 10 seconds to 1, but I think that would put pressure on Telegram's servers. Checking every 30 seconds should be fine.
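
One way to balance the two concerns above (retrying soon enough without hammering Telegram's servers) is to cap the number of attempts and lengthen the delay between them. The sketch below is only an illustration under stated assumptions: `download_with_retry`, its parameters, and the injected `download` callable are hypothetical names, not part of pyrogram or telegram_media_downloader. In practice `download` might wrap a call such as `app.download_media(file_id)`.

```python
import os
import time
from typing import Callable, Optional


def is_file_empty(file_path: str) -> bool:
    """Treat a missing or zero-byte file as a failed download."""
    return not os.path.exists(file_path) or os.path.getsize(file_path) == 0


def download_with_retry(
    download: Callable[[], Optional[str]],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    max_delay: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `download` until it yields a non-empty file or attempts run out.

    `download` is a hypothetical zero-argument callable that returns the
    path of the downloaded file, or None on failure.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        file_path = download()
        if file_path and not is_file_empty(file_path):
            return file_path
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
    return None
```

Starting at 30 seconds matches the interval suggested above. The cap and the attempt limit are arbitrary choices that keep the loop from running forever when the required wait is very long, such as the 2919 seconds in the original error.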

By the way, I didn't try inserting my code into telegram_media_downloader. I just wanted to download all the pictures from a channel, which is easy to do with pyrogram alone, so I wrote my own code instead of using telegram_media_downloader.

gauravsuman007 commented 1 week ago

I just used the code from @porridgexj and put it in media_downloader.py. I first defined a new function:

def is_file_undownloaded_or_empty(file_path):
    if os.path.exists(file_path):
        logger.info("%s already exists, checking size", file_path)  
        return os.path.getsize(file_path) == 0
    else:
        return True

Then I modified the download code. Note that the commented-out lines are the old code; I also moved the "Media downloaded" log message inside the new if block:

                file_name, file_format = await _get_media_meta(_media, _type)
                if _can_download(_type, file_formats, file_format):
                    #if _is_exist(file_name):
                    #    file_name = get_next_name(file_name)
                    #    download_path = await client.download_media(
                    #        message, file_name=file_name
                    #    )
                    #    # pylint: disable = C0301
                    #    download_path = manage_duplicate_file(download_path)  # type: ignore
                    #else:
                    #    download_path = await client.download_media(
                    #        message, file_name=file_name
                    #    )
                    if is_file_undownloaded_or_empty(file_name):
                        download_path = await client.download_media(
                            message, file_name=file_name
                        )
                        if download_path:
                            logger.info("Media downloaded - %s", download_path)
                    else:
                        logger.info("Skipping download of %s", file_name)

                    DOWNLOADED_IDS.append(message.id)

Now I'm able to resume downloads. These changes disable the feature that redownloads an existing file under a new name with a number appended (see the commented-out part of the code). I don't think that matters, though: only completely downloaded files get renamed from FILENAME.temp to FILENAME, so we don't need any redownloads.
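
The reasoning above relies on a write-to-temporary-then-rename pattern: if a file exists under its final name, the write finished. The sketch below illustrates that general pattern with the standard library. It is not pyrogram's actual code, and `save_atomically` and `is_complete` are hypothetical helper names.

```python
import os
from typing import Iterable


def save_atomically(chunks: Iterable[bytes], final_path: str) -> str:
    """Write chunks to `final_path + ".temp"`, then rename on success."""
    temp_path = final_path + ".temp"
    try:
        with open(temp_path, "wb") as temp_file:
            for chunk in chunks:
                temp_file.write(chunk)
    except BaseException:
        # An interrupted download leaves no file under the final name.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    os.replace(temp_path, final_path)  # rename only after a complete write
    return final_path


def is_complete(final_path: str) -> bool:
    """A file counts as downloaded if it exists and is not empty."""
    return os.path.exists(final_path) and os.path.getsize(final_path) > 0
```

The extra size check in `is_complete` covers the 0 KB case reported in this thread, where a file can exist under its final name and still contain no data.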

The final thing I always have to do before relaunching the script is reset last_read_message_id to 0; otherwise downloads don't start for me (weird bug).
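
Resetting the value by hand before every run can be scripted. The sketch below assumes the configuration file stores the value on a top-level line of the form `last_read_message_id: <number>`. The helper name `reset_last_read` is hypothetical, and it edits the text with a regular expression rather than a YAML parser so that it needs only the standard library.

```python
import re


def reset_last_read(config_text: str) -> str:
    """Return the config text with last_read_message_id set to 0."""
    return re.sub(
        r"^(last_read_message_id:[ \t]*)\d+[ \t]*$",
        r"\g<1>0",  # keep the key and spacing, replace the number with 0
        config_text,
        flags=re.MULTILINE,
    )
```

To apply it, read the configuration file into a string, pass the string through `reset_last_read`, and write the result back. Text without a matching line is returned unchanged.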