3bl3gamer / tg_history_dumper

Exports messages and media from Telegram dialogs, groups and channels
MIT License

Dump gets stuck on TL_messages_dialogsSlice response #9

Closed patrizok closed 5 months ago

patrizok commented 3 years ago

In some cases the dump gets stuck. Specifically, it gets stuck when tgLoadChats (mtproto.TL_messages_getDialogs) returns a TL_messages_dialogsSlice type.

Where does the condition if len(slice.Dialogs) < 100 come from, and what is '100'? In my tests slice.Dialogs returns ~93 items, and the error message looks like the quoted log below.

I do not fully understand the algorithm, but I think there is a problem here.

It might be helpful to review the linked line.

Thanks

https://github.com/3bl3gamer/tg-history-dumper/blob/1db174dd526d3c7c75ea5316a2a4f3f2ecdde8d7/tg.go#L223

2021/09/16 07:02:45 [WARN] some chats seem missing: got 497 in the end, expected 493; retrying from start
2021/09/16 07:02:50 [WARN] some chats seem missing: got 994 in the end, expected 493; retrying from start
2021/09/16 07:02:54 [WARN] some chats seem missing: got 1491 in the end, expected 493; retrying from start
2021/09/16 07:03:00 [WARN] some chats seem missing: got 1988 in the end, expected 493; retrying from start
2021/09/16 07:03:03 [WARN] some chats seem missing: got 2485 in the end, expected 493; retrying from start

3bl3gamer commented 3 years ago

100 is the limit (or page size). The same value is used in TL_messages_getDialogs. I should have put it into a variable.

And the idea here is simple:
1) request a chunk of at most 100 dialogs before offsetDate (which refers to the datetime of the last message in a dialog, if I remember correctly);
2) update offsetDate (so the next chunk will end just before the current one);
3) append this chunk to the all-dialogs array chats;
4) if chats contains all dialogs (has a length of slice.Count), we are done and return;
5) if we are not done yet and len(slice.Dialogs) < limit, then something is wrong: the chunk is smaller than the limit (which should happen only at the end, on the last "page"), but we are not at the end yet.

There is definitely a bug: after retrying from start it should not just reset the offset (offsetDate = 0) but also clear the chats.
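
The steps above, together with the corrected retry (resetting both offsetDate and chats), can be sketched roughly as follows. This is only an illustration, not the dumper's actual code: Dialog, Page, fetchFunc, loadAllDialogs, newFakeServer and the maxRetries cap are hypothetical names and assumptions made for the example, and the fake server stands in for the real TL_messages_getDialogs call.

```go
package main

import (
	"errors"
	"fmt"
)

// Dialog is a hypothetical stand-in for one dialog entry.
type Dialog struct {
	ID   int64
	Date int32 // date of the last message in the dialog
}

// Page mimics the parts of a dialogs-slice response used by the loop.
type Page struct {
	Dialogs []Dialog
	Count   int // total number of dialogs reported by the server
}

// fetchFunc returns up to limit dialogs dated before offsetDate
// (offsetDate == 0 means "start from the newest").
type fetchFunc func(offsetDate int32, limit int) Page

const pageLimit = 100 // page size, kept in one place

func loadAllDialogs(fetch fetchFunc, maxRetries int) ([]Dialog, error) {
	var chats []Dialog
	var offsetDate int32
	retries := 0
	for {
		page := fetch(offsetDate, pageLimit)
		chats = append(chats, page.Dialogs...)
		if len(chats) == page.Count {
			return chats, nil // collected everything
		}
		if len(page.Dialogs) < pageLimit {
			// Short page, but the total does not match: start over.
			retries++
			if retries > maxRetries {
				return nil, errors.New("dialog count mismatch after retries")
			}
			offsetDate = 0
			chats = nil // the fix: drop what was collected, not only the offset
			continue
		}
		offsetDate = page.Dialogs[len(page.Dialogs)-1].Date
	}
}

// newFakeServer serves `total` dialogs ordered from newest to oldest.
func newFakeServer(total int) fetchFunc {
	all := make([]Dialog, total)
	for i := range all {
		all[i] = Dialog{ID: int64(i + 1), Date: int32(total - i)}
	}
	return func(offsetDate int32, limit int) Page {
		page := Page{Count: total}
		for _, d := range all {
			if offsetDate != 0 && d.Date >= offsetDate {
				continue
			}
			if len(page.Dialogs) == limit {
				break
			}
			page.Dialogs = append(page.Dialogs, d)
		}
		return page
	}
}

func main() {
	chats, err := loadAllDialogs(newFakeServer(250), 3)
	fmt.Println(len(chats), err)
}
```

The retry cap is an addition for the sketch; without it, a response that never matches slice.Count would loop forever, which is what the log in this issue shows.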

But the main problem is not here. That WARN can appear if the dialog order has changed while the list was loading (because of a received message). In that case, however, len(chats) at the end must be smaller than slice.Count, whereas in your log it is larger (497 > 493).

There is also a //TODO: check duplicates, but I don't remember what exactly it refers to.

If you are still able to reproduce this, could you please check whether there are any duplicates in the chats array? Maybe just add return chats, nil after log.Warn and run the dumper with -list-chats.
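
One possible way to run the duplicate check is a small helper like the one below. It is a sketch that works on plain int64 IDs; findDuplicateIDs is a hypothetical name, and extracting the IDs from the real chats array is left out.

```go
package main

import "fmt"

// findDuplicateIDs returns every ID that occurs more than once in ids,
// in the order in which the second occurrence is encountered.
func findDuplicateIDs(ids []int64) []int64 {
	counts := make(map[int64]int, len(ids))
	var dups []int64
	for _, id := range ids {
		counts[id]++
		if counts[id] == 2 { // report each duplicated ID only once
			dups = append(dups, id)
		}
	}
	return dups
}

func main() {
	ids := []int64{10, 20, 30, 20, 40, 10, 20}
	fmt.Println(findDuplicateIDs(ids)) // prints [20 10]
}
```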

3bl3gamer commented 3 years ago

It's also strange that after each retry the chats length increases by exactly 497. It looks like there are 4 strange chats which are not counted as dialogs but are sent in the response to getDialogs. Any idea what they could be? Maybe they were deleted? Or you were kicked from them?

patrizok commented 3 years ago

I checked the problem again. As you said, there are some duplicates, so the condition if len(chats) == int(slice.Count) sometimes never becomes true. For this reason, before checking whether they are equal, I added a condition: if len(chats) > int(slice.Count), duplicate values are removed. https://github.com/3bl3gamer/tg-history-dumper/pull/10/commits/23caa740c8a2b13732b8be7100e7b17b2a91f6b7

I know it is not the best algorithm, but it works. It no longer gets stuck. Also, the profile picture feature is ready and fixed for sliced responses.

You can test and merge it. Thank you for the detailed explanation.

3bl3gamer commented 5 months ago

WARN can appear if dialogs order has changed while the list was loading (because of received message). But. In that case len(chats) in the end must be shorter than slice.Count. But in your log it is longer (497 > 493).

Turns out pinned chats may be received twice while fetching the chat list. The related bug was fixed in https://github.com/3bl3gamer/tg_history_dumper/commit/c5535ab2d46ae5cda02964eb1cc0022bc9b1fcce
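
For illustration only, one way to tolerate a chat arriving twice across pages is to skip IDs that were already collected while appending each page. This is a sketch and not necessarily what the linked commit does; Chat and appendUnique are hypothetical names.

```go
package main

import "fmt"

// Chat is a hypothetical minimal chat record.
type Chat struct {
	ID    int64
	Title string
}

// appendUnique appends the chats from page to dst, skipping any chat whose
// ID is already in seen, and records the newly added IDs in seen.
func appendUnique(dst, page []Chat, seen map[int64]struct{}) []Chat {
	for _, c := range page {
		if _, ok := seen[c.ID]; ok {
			continue // already collected, e.g. a pinned chat sent again
		}
		seen[c.ID] = struct{}{}
		dst = append(dst, c)
	}
	return dst
}

func main() {
	seen := make(map[int64]struct{})
	var chats []Chat
	page1 := []Chat{{ID: 1, Title: "pinned"}, {ID: 2, Title: "a"}, {ID: 3, Title: "b"}}
	page2 := []Chat{{ID: 1, Title: "pinned"}, {ID: 4, Title: "c"}}
	chats = appendUnique(chats, page1, seen)
	chats = appendUnique(chats, page2, seen)
	fmt.Println(len(chats)) // prints 4
}
```

With this approach the length of the collected list can be compared against the reported total without duplicates inflating it.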

If this was not the case, feel free to reopen the issue.