LonamiWebs / Telethon

Pure Python 3 MTProto API Telegram client library, for bots too!
https://docs.telethon.dev
MIT License

iter_messages doesn't work with some channels when limit=None #3949

Closed hughbe closed 2 years ago

hughbe commented 2 years ago

Hi,

Firstly - thanks for producing such a robust library. Finding it very useful.

I've encountered some unusual behaviour with client.iter_messages.

In the example below, I can't quite explain why one channel works with limit=None and keeps iterating downwards, while the other doesn't!

Code that causes the issue

import asyncio
import config
from telethon import TelegramClient

async def main():
    #CHANNEL = 'SolovievLive'
    CHANNEL = 'TelethonChat'

    client = TelegramClient("name", config.TELEGRAM_API_ID, config.TELEGRAM_API_HASH)
    await client.connect()

    async for message in client.iter_messages(
        CHANNEL,
        limit=None
    ):
        print(message.id, message.date)

if __name__ == "__main__":
    asyncio.run(main())

Lonami commented 2 years ago

Thanks for the kind words.

Luckily, I can reproduce the issue. Telethon sees that it made a request for 100 messages but only 99 were returned. It then assumes "well, if I asked for 100 and there's only 99 left, I must have reached the end":

https://github.com/LonamiWebs/Telethon/blob/db29e9b7efb204fbe56fe8d59ccc4d55ceddc165/telethon/client/messages.py#L207-L208

I have no idea why the server is doing this. It's very unfortunate, because a fix will force all other "healthy" channels to make one more request than necessary to confirm whether there are messages left.
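
To make the trade-off concrete, here is a minimal, self-contained sketch. It is not Telethon's actual code: `FakeServer`, `iterate` and `PAGE_SIZE` are hypothetical names, and the fake server that hands back one message fewer than requested is only an assumption standing in for the behaviour described above.

```python
from typing import Callable, Iterable, List

PAGE_SIZE = 100  # messages requested per call, as in the description above


class FakeServer:
    """Hypothetical stand-in for the server side; counts requests made."""

    def __init__(self, ids: Iterable[int], short_pages: bool = False) -> None:
        self.ids = sorted(ids, reverse=True)  # newest first
        self.short_pages = short_pages
        self.calls = 0

    def fetch(self, offset_id: int, limit: int) -> List[int]:
        """Return up to `limit` ids older than `offset_id` (0 means newest)."""
        self.calls += 1
        older = [i for i in self.ids if offset_id == 0 or i < offset_id]
        if self.short_pages:
            limit -= 1  # assumed quirk: one message fewer than requested
        return older[:limit]


def iterate(fetch: Callable[[int, int], List[int]], stop_on_short_page: bool) -> List[int]:
    """Page through all messages, optionally stopping on a short page."""
    seen: List[int] = []
    offset_id = 0
    while True:
        page = fetch(offset_id, PAGE_SIZE)
        seen.extend(page)
        if not page:
            break  # an empty page is an unambiguous end
        if stop_on_short_page and len(page) < PAGE_SIZE:
            break  # heuristic: fewer than requested is taken to mean the end
        offset_id = page[-1]  # continue below the oldest message seen
    return seen
```

With 250 messages, the short-page heuristic needs 3 requests on a well-behaved server but stops after only 99 messages on the quirky one. Waiting for an empty page retrieves all 250 in both cases, at the cost of a 4th request.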

hughbe commented 2 years ago

So, for what it's worth, I was able to do something like the following:

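max_id = 0  # 0 means "start from the newest message"
while True:
    # Pass limit explicitly: without it, get_messages returns a single message per call.
    messages = await client.get_messages(CHANNEL, limit=100, max_id=max_id)
    if not messages:
        break  # an empty page means there is nothing older left

    for message in messages:
        print(message.id, message.date)

    # Continue below the oldest message of this page.
    max_id = messages[-1].id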

Would this have the desired result, you think?

Lonami commented 2 years ago

That's a workaround. But it's stupid the client has to do (something equivalent to) that. It might be a bug in Telegram itself.

hughbe commented 2 years ago

Should I report it? If so, to whom?

Lonami commented 2 years ago

I've talked about it in https://t.me/tdlibchat/57257 if you're interested. The above commit should fix the issue though, so I don't really care much more beyond that.

Lonami commented 2 years ago

Here's another fun one someone else found:

offset_id 132002, limit 4 => we get msg 131999 & 131998
offset_id 132002, limit 3 => we get msg 131999
offset_id 132002, limit 2 => we get msg 132000
offset_id 132002, limit 1 => we get msg 132001

datadius commented 1 year ago

I ran into a similar issue, which is how I got here.

  1. I made a group to test whether a userbot could count the top 10 users with the most messages.
  2. I added my bot.
  3. I wrote some messages and added a few other users to test.
  4. I ran the command.

I then noticed that with reverse=True I could only get the first 100 messages, but if I removed reverse, it would get all messages.

async for message in bot.iter_messages(event.chat_id, reverse=True, limit=None, offset_date=one_week_back):
    print(message)

My solution was to add min_id, so that the first 2-3 MessageService entries (the ones saying the group was created) are skipped:

async for message in bot.iter_messages(event.chat_id, reverse=True, limit=None, offset_date=one_week_back, min_id=5):
    print(message)

Posting it here in case it's useful. I got the idea to use min_id when I saw the behavior with limit.
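
For the counting step mentioned in the list above, a minimal sketch could look like the following. `top_senders` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the messages collected from iter_messages; the only attribute it assumes is `sender_id`.

```python
from collections import Counter
from types import SimpleNamespace


def top_senders(messages, n=10):
    """Return the n most frequent (sender_id, count) pairs, most active first."""
    counts = Counter(
        m.sender_id
        for m in messages
        # Skip entries that carry no sender id.
        if getattr(m, "sender_id", None) is not None
    )
    return counts.most_common(n)


# Stand-in data; in the bot these would be the messages gathered by the loop.
sample = [SimpleNamespace(sender_id=s) for s in (1, 2, 2, 3, 3, 3, None)]
print(top_senders(sample, n=2))
```

Collecting the messages first and counting afterwards keeps the counting logic independent of how the history was paginated.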