LonamiWebs / Telethon

Pure Python 3 MTProto API Telegram client library, for bots too!
https://docs.telethon.dev
MIT License

2.0 wishlist #1169

Closed Lonami closed 2 years ago

Lonami commented 5 years ago

Telethon currently has some weird things in it that need changing, but changing them would break existing code. Therefore, a new major release should be made. We should aim for a single release containing as many breaking changes as possible, instead of spreading breaking changes across many releases.

Please post in this issue any gripe you have with the library and that you would like it to change.

Of course, a final 1.X release will be made with deprecation warnings on all these methods, so people know how to upgrade.

(2021-09 update: probably no "last 1.X" with deprecation; instead a document will be prepared, along with some helper code to ease the migration.)

Lonami commented 5 years ago
DaveScream commented 5 years ago

It would be cool to have a function that uploads file_list concurrently and returns the media_ids.

DaveScream commented 5 years ago

Sometimes send_file accepts a single item for attributes, and sometimes it warns that it should be a list.

Lonami commented 5 years ago

@DaveScream you can achieve that with the facilities asyncio offers (create_task, wait or gather). Telethon should only offer a way if it can be optimized anyhow (for example, forwarding more than one message at once), and not add unnecessary clutter.
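A minimal sketch of the asyncio approach mentioned above. The upload_file coroutine here is a stand-in that only simulates an upload (an assumption for illustration); in real code it would be replaced by whatever upload coroutine the client offers.

```python
import asyncio


async def upload_file(path):
    """Stand-in for a real upload coroutine (hypothetical, simulates network I/O)."""
    await asyncio.sleep(0.01)
    return f"uploaded:{path}"


async def upload_all(paths):
    # gather() runs the coroutines concurrently and returns their results
    # in the same order as the inputs, regardless of completion order.
    return await asyncio.gather(*(upload_file(path) for path in paths))


results = asyncio.run(upload_all(["a.jpg", "b.jpg", "c.jpg"]))
print(results)
```

Because gather preserves input order, the returned list can be zipped back with the original file list without extra bookkeeping.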

cher-nov commented 5 years ago

Telethon's networking core needs a thorough refactoring to make its abstractions stricter (remember our clumsy implementation of MTProxies). I tried to implement it myself in March-April, but failed (the whole library simply refused to work correctly afterwards), since I lack deep knowledge of the current Telethon code. So I'll just write the concept here.

Currently we have an architecture tightly bound to the existing Telegram protocols and servers. We also don't properly distinguish between socket proxies and MTProxy in the TelegramClient constructor, although we should. We also have the eerie Authenticator and MTProtoPlainSender pieces, which are used only once and only by MTProtoSender. And we force the user to check for the dd prefix in a secret (which they should not need to know about) to choose between ConnectionTcpMTProxyRandomizedIntermediate and ConnectionTcpMTProxyIntermediate or even ConnectionTcpMTProxyAbridged.

I propose the following structure:

  protocol/
    mtproto10.py
    mtproto20.py
  transport/
    tcpobfuscated.py
    ...
  connection/
    asyncsocket.py

This would give us the following additional abilities: 1) Use MTProxy over a socket proxy, to make life even harder for DPI systems. 2) Connect to custom servers with custom protocols.
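To illustrate why separating the layers allows stacking them, here is a rough sketch. All class names are hypothetical and none of this is Telethon code: an in-memory buffer stands in for a socket, and a toy XOR wrapper stands in for a proxy or obfuscation hop (it is not real obfuscation).

```python
import struct


class MemoryConnection:
    """Connection layer: moves raw bytes (an in-memory buffer instead of a socket)."""

    def __init__(self):
        self._buffer = bytearray()

    def send(self, data):
        self._buffer += data

    def recv(self, n):
        data, self._buffer = bytes(self._buffer[:n]), self._buffer[n:]
        return data


class XorConnection:
    """Wraps any other connection, the way a proxy hop could wrap a socket."""

    def __init__(self, inner, key=0x5A):
        self._inner = inner
        self._key = key

    def send(self, data):
        self._inner.send(bytes(b ^ self._key for b in data))

    def recv(self, n):
        return bytes(b ^ self._key for b in self._inner.recv(n))


class LengthPrefixedTransport:
    """Transport layer: frames packets on top of whatever connection it is given."""

    def __init__(self, connection):
        self._connection = connection

    def send_packet(self, payload):
        self._connection.send(struct.pack("<i", len(payload)) + payload)

    def recv_packet(self):
        (length,) = struct.unpack("<i", self._connection.recv(4))
        return self._connection.recv(length)


# The transport does not care how many connection wrappers sit underneath it.
transport = LengthPrefixedTransport(XorConnection(MemoryConnection()))
transport.send_packet(b"hello")
received = transport.recv_packet()
```

Since each layer only depends on the interface of the one below, running one proxy type over another becomes a matter of nesting wrappers rather than adding another connection class per combination.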

It would also be nice to add support for test servers, like Pyrogram does. And I'm also pretty sure that we should keep the sync solution, since it simplifies things a lot in easy use cases.

Lonami commented 5 years ago

It would also be nice to add support for test servers

There is a section in the documentation dedicated to that.

cher-nov commented 5 years ago

There is a section in the documentation dedicated to that.

Yep, but I would propose adding this, along with the server IPs, to the library directly.

Lonami commented 5 years ago

message = client.get_messages(...)
print(message.text)  # hello **world**
...
client.parse_mode = 'html'
print(message.text)  # hello <strong>world</strong>

The fact this works the way it does is really confusing. message and client are two different things, yet changing one affects the other. The best trade-off is probably offering text for markdown-formatted text (the "original" text typed in the applications), raw_text for the plain text (without any entities in it), and html_text for the HTML-formatted text.
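A rough sketch of what that trade-off could look like. Entity, Message and render below are simplified, hypothetical stand-ins (not Telethon's classes), and the renderer only handles non-overlapping entities; the point is that each property is derived from the message alone and never consults the client.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass(frozen=True)
class Entity:
    offset: int
    length: int
    kind: str  # "bold" or "italic" in this sketch


MARKDOWN = {"bold": ("**", "**"), "italic": ("__", "__")}
HTML = {"bold": ("<strong>", "</strong>"), "italic": ("<em>", "</em>")}


def render(raw, entities, delimiters, escape_text=lambda s: s):
    """Wrap each (non-overlapping) entity span in its delimiters."""
    out, pos = [], 0
    for e in sorted(entities, key=lambda e: e.offset):
        opening, closing = delimiters[e.kind]
        out.append(escape_text(raw[pos:e.offset]))
        out.append(opening + escape_text(raw[e.offset:e.offset + e.length]) + closing)
        pos = e.offset + e.length
    out.append(escape_text(raw[pos:]))
    return "".join(out)


@dataclass
class Message:
    raw_text: str
    entities: list = field(default_factory=list)

    @property
    def text(self):
        return render(self.raw_text, self.entities, MARKDOWN)

    @property
    def html_text(self):
        return render(self.raw_text, self.entities, HTML, escape)


message = Message("hello world", [Entity(6, 5, "bold")])
print(message.raw_text)   # hello world
print(message.text)       # hello **world**
print(message.html_text)  # hello <strong>world</strong>
```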

painor commented 5 years ago

I think all entities and objects should be patched to have their own methods, for example client.get_entity("group").kick_user("use"), just like messages have. This should probably work for both the input version and the full one.

Lonami commented 5 years ago

This is not a breaking change and can be done in the 1.x series. 2.0 is only about breaking changes.

painor commented 5 years ago

Also, what about removing the aggressive parameter from client.iter_participants?

Lonami commented 5 years ago

As a bonus (via @tulir on @TelethonChat/150284), having the core make use of Sans I/O could be a good idea.

Lonami commented 5 years ago

Just a thought I don't want to lose: to make MTProto proxies easier to use, we probably could (and should) allow the user to input them in the form of https://t.me/proxy?server=...&port=...&secret=.... This is a standard format and official clients also react to those links, so it makes sense for the library to parse them as well.
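Parsing such a link needs nothing beyond the standard library. The helper below is a hypothetical sketch (parse_proxy_link is not an existing Telethon function), and it assumes the secret is hex-encoded; other secret encodings may need extra care because of how query strings treat characters such as "+".

```python
from urllib.parse import parse_qs, urlparse


def parse_proxy_link(link):
    """Return (server, port, secret) from a t.me/proxy link (hypothetical helper)."""
    url = urlparse(link)
    if url.netloc != "t.me" or url.path != "/proxy":
        raise ValueError(f"not an MTProto proxy link: {link!r}")

    query = parse_qs(url.query)
    try:
        server = query["server"][0]
        port = int(query["port"][0])
        secret = query["secret"][0]
    except (KeyError, ValueError) as e:
        raise ValueError(f"malformed proxy link: {link!r}") from e

    return server, port, secret


proxy = parse_proxy_link(
    "https://t.me/proxy?server=example.com&port=443"
    "&secret=00112233445566778899aabbccddeeff"
)
print(proxy)
```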

penn5 commented 5 years ago

catch_up needs fixing for channels

Lonami commented 5 years ago

@penn5 see https://github.com/LonamiWebs/Telethon/issues/1169#issuecomment-518037677, and #1125 to fix catch up.

apepenkov commented 4 years ago

It would be nice if we could get cached info about entities (username, name, phone number for users) from the .session file via some friendly method. Currently, to do that you have to either cache it yourself or use bot.session._cursor().

iamwilliamli commented 4 years ago

It would be nice to be able to rename documents sent in channels with Telethon.

apepenkov commented 4 years ago

@lichengqi0805, what do you mean by that?

iamwilliamli commented 4 years ago

@lichengqi0805, what do you mean by that?

There is a bot, @HK_rename_BOT, which can change a document's name (such as a PDF's) without saving it. I'm curious whether there's an API in Telegram which could achieve this.

apepenkov commented 4 years ago

I'm pretty sure they are saving. I was investigating this, and Lonami told me you can't edit the attributes of a file unless you are uploading it. But if you somehow can, that's possible with Telethon.

Upd: So I've checked this bot, and they are saving.

iamwilliamli commented 4 years ago

I'm pretty sure they are saving. I was investigating this, and Lonami told me you can't edit the attributes of a file unless you are uploading it. But if you somehow can, that's possible with Telethon.

Upd: So I've checked this bot, and they are saving.

Okay, thank you!

Skyross commented 3 years ago

@Lonami Is it possible to revive the use_cache flag/attribute of the upload_file method? Do you have any ideas? I would like to contribute (including to some 3rd party session persistence packages), but I want to find out your vision for it first.

Lonami commented 3 years ago

New additions can be added any time, not just between versions. This issue is about breaking changes to clean stuff up. That said, I do not want to add upload cache back. I think it would be taking the "library commodities" too far. It adds more maintenance burden, more ways in which it can break and cause confusion, more hidden costs, more data that would need to needlessly be stored in the session…

NavruzbekNoraliev commented 3 years ago

It would be a great feature if a bot could add users to groups and channels! As far as I know, it is currently done using a single user account, which programmatically adds users to a particular group.

Lonami commented 3 years ago

This is not a breaking change, and can't be done regardless, because it's an API limitation.

Lonami commented 3 years ago

Yes. The format is prone to change and I am not willing to maintain that, so it should be elsewhere.

penn5 commented 3 years ago

Immutable types for TL stuff

penn5 commented 3 years ago

Get rid of signed (marked) peer IDs

penn5 commented 3 years ago

Get rid of entity cache

Lonami commented 3 years ago

Immutable types for TL stuff

Could you elaborate further?

Get rid of signed (marked) peer IDs

Simply agreed.

Get rid of entity cache

This would be far too big of a breaking change even for a release whose purpose is breaking backward-compatibility. In order for this to be achievable, we need a workable alternative. Do you have any suggestions?

penn5 commented 3 years ago

Immutable types for TL stuff

Make all TL constructors and functions immutable, with a .copy() mutator

Do you have any suggestions?

Provide a helper to pack any peer into a three-tuple of type, ID and hash. Of course get_entity would have to accept this
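As a sketch of what such a helper might look like (PackedPeer and its methods are hypothetical names, not an existing API), the three values can be kept in a named tuple and serialized to a fixed-size byte string for storage. This assumes the ID is the bare, unmarked one and that 0 is stored where no access hash applies.

```python
import struct
from typing import NamedTuple

# Hypothetical set of peer kinds for this sketch.
KINDS = ("user", "chat", "channel")


class PackedPeer(NamedTuple):
    kind: str         # one of KINDS
    id: int           # bare (unmarked) ID
    access_hash: int  # 0 where no hash applies

    def to_bytes(self):
        # 1 byte for the kind, then two signed 64-bit little-endian integers.
        return struct.pack("<Bqq", KINDS.index(self.kind), self.id, self.access_hash)

    @classmethod
    def from_bytes(cls, data):
        kind, id_, access_hash = struct.unpack("<Bqq", data)
        return cls(KINDS[kind], id_, access_hash)


peer = PackedPeer("channel", 1234567890, -987654321)
restored = PackedPeer.from_bytes(peer.to_bytes())
```

The trade-off raised in the replies still applies: this is more explicit than a bare number, but anyone currently storing only the number would have to migrate their data.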

Lonami commented 3 years ago

Make all TL constructors and functions immutable, with a .copy() mutator

What benefits does this bring? The library does make use of mutation in several places.

Provide a helper to pack any peer into a three-tuple of type, ID and hash. Of course get_entity would have to accept this

This is nowhere near as convenient as just using a number though, and people who were storing just numbers will need to do quite a bit of work to get the new system working.

penn5 commented 3 years ago

What benefits does this bring? The library does make use of mutation in several places.

Cleaner code

This is nowhere near as convenient as just using a number though, and people who were storing just numbers will need to do quite a bit of work to get the new system working.

Yes. But it's cleaner and more explicit.

Lonami commented 3 years ago

Cleaner code

Do you have any concrete examples?

penn5 commented 3 years ago

Do you have any concrete examples?

Having pure data-holders being mutable makes for ugly and dangerous code.

Lonami commented 3 years ago

Having pure data-holders being mutable makes for ugly and dangerous code.

This isn't a concrete example. For instance, mutation is useful when using requests to iterate over something, such as messages, since the offset can simply be incremented rather than having to recreate the entire request.
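To make the two positions concrete, here is a small sketch of paginating with a mutable request versus an immutable one. The request classes and fake_send are invented for illustration (they are not Telethon's generated types), and dataclasses are used purely for brevity.

```python
from dataclasses import dataclass, replace


@dataclass
class GetHistory:
    """Hypothetical mutable request."""
    peer: str
    offset: int = 0
    limit: int = 100


@dataclass(frozen=True)
class FrozenGetHistory:
    """Hypothetical immutable request."""
    peer: str
    offset: int = 0
    limit: int = 100


def fake_send(request, total=250):
    """Pretend server: returns the message IDs in the requested window."""
    return list(range(request.offset, min(request.offset + request.limit, total)))


def iter_mutable(peer):
    request = GetHistory(peer)
    while True:
        batch = fake_send(request)
        if not batch:
            return
        yield from batch
        request.offset += len(batch)  # reuse the same object


def iter_immutable(peer):
    request = FrozenGetHistory(peer)
    while True:
        batch = fake_send(request)
        if not batch:
            return
        yield from batch
        # Build a new request that differs only in its offset.
        request = replace(request, offset=request.offset + len(batch))


mutable_ids = list(iter_mutable("chat"))
immutable_ids = list(iter_immutable("chat"))
```

Both loops yield the same messages; the immutable variant pays for one extra object per page in exchange for never modifying a request that other code might still hold.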

Yes. But it's cleaner and more explicit.

Also, regarding no cache, how would mentions in messages work through tg://user?id=...?

apepenkov commented 3 years ago

I think entity cache is a nice thing, and I don't really see a reason to remove it. Personal opinion. I'd like to see functions for fetching entity cache, currently I do that via private cursor field, and that's not really a nice solution.

Lonami commented 3 years ago

The entity cache must remain in some way because it is the way the library knows if it needs to call getDifference to obtain an access hash. Furthermore, some places really only have access to just the identifier, such as mentions, and certain message service updates.

However, the automatic cache of full name, username and phone number is probably a bit of a stretch, and that can probably be removed. There's not really a way to directly query or access these anyway, which probably means it's best left for user code to deal with if they care about it.

penn5 commented 3 years ago

This isn't a concrete example. For instance, mutation is useful when using requests to iterate over something, such as messages, since the offset can simply be incremented rather than having to recreate the entire request.

That's easy to work around, while immutable types make a lot of logic easier and safer. But we can drop this in favour of #3158, which isn't a breaking change.

Also, regarding no cache, how would mentions in messages work through tg://user?id=...?

Which brings me perfectly to my next point. Drop all HTML and Markdown support.

Lonami commented 3 years ago

Drop all HTML and Markdown support.

And what would the alternative to that be? Dealing with MessageEntity by hand is messy, combined with the fact offset and length work in an awkward way.
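One reason offset and length are awkward to compute by hand is that Telegram measures them in UTF-16 code units, whereas Python strings are indexed by code point. A short sketch of the mismatch (utf16_len is a hypothetical helper name):

```python
def utf16_len(text):
    """Number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


text = "👍 bold"

# Python indexes by code point, so the emoji counts as one character...
start = text.index("bold")

# ...but it takes two UTF-16 code units, which shifts the entity offset.
offset = utf16_len(text[:start])
length = utf16_len("bold")

print(start, offset, length)
```

For text limited to the Basic Multilingual Plane the two counts agree, which is why this kind of bug tends to surface only once emoji appear in a message.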

penn5 commented 3 years ago

I'll make a PoC

Lonami commented 3 years ago

My plan was to have proper commonmark support by default while keeping the old one for a bit longer for old code, along with making the Message.text not depend on the client.parse_mode.

Lonami commented 3 years ago

dataclasses are Python 3.7 onward, and Python 3.6 is not yet EOL (and even once it is, I will probably keep support for it around for longer).

I can't really see how a pseudo-DSL to format messages is any better than proper markdown. The only real motivation I can see for removing markdown is "less bloat", but even then, markdown support can hardly be considered bloat on a library whose main purpose is primarily receiving and sending messages to Telegram… You will need a far better argument to convince me to break so much existing code unnecessarily.

penn5 commented 3 years ago

Markdown is incredibly easy to get wrong and incredibly hard to get right. A simple DSL (just over 2kiB) is almost impossible to get wrong, and very easy to get right.

penn5 commented 3 years ago

from __future__ import annotations  # keep annotations unevaluated (Python 3.7+)

import operator
import typing

import telethon
import telethon.extensions.html


def tl_copy(self, **kwargs):
    # Round-trip through the TL serialization to get an independent copy,
    # then override the requested attributes on that copy.
    r = telethon.extensions.BinaryReader(bytes(self)).tgread_object()
    for k, v in kwargs.items():
        setattr(r, k, v)
    return r


# Monkey-patch the copy helper onto every TL object.
telethon.tl.TLObject.copy_ = tl_copy

OFFSET_KEY = operator.attrgetter("offset")


class Message:
    # TODO: Telegram counts offsets and lengths in UTF-16 code units rather than
    # code points, so len() is wrong for text outside the BMP.
    text: str
    entities: list[telethon.types.TypeMessageEntity]

    @typing.overload
    def __init__(self, message: str, entities: typing.Optional[list[telethon.types.TypeMessageEntity]] = None):
        ...

    @typing.overload
    def __init__(self, message: Message, entities: typing.Optional[list[telethon.types.TypeMessageEntity]] = None):
        ...

    def __init__(self, message, entities=None):
        if entities is None:
            entities = []
        if isinstance(message, Message):
            # Wrapping an existing message keeps its entities and adds the new ones.
            self.text = message.text
            self.entities = sorted(message.entities + entities, key=OFFSET_KEY)
        else:
            self.text = message
            self.entities = sorted(entities, key=OFFSET_KEY)

    def __add__(self, other: MessageLike):
        if isinstance(other, Message):
            # Shift the right-hand entities past the end of our own text.
            offset = len(self.text)
            shifted = [entity.copy_(offset=entity.offset + offset) for entity in other.entities]
            return Message(self.text + other.text, self.entities + shifted)
        return Message(self.text + other, self.entities)

    def __radd__(self, other: MessageLike):
        # Called for `str + Message`: our entities move right by len(other).
        offset = len(other)
        return Message(other + self.text, [entity.copy_(offset=entity.offset + offset) for entity in self.entities])

    def __repr__(self):
        return telethon.extensions.html.unparse(self.text, self.entities)

    def __len__(self):
        return len(self.text)


MessageLike = typing.Union[Message, str]


def text(s: str):
    return Message(s, [])


def mono(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityCode(0, len(s))])


def link(s: MessageLike, t: str):
    return Message(s, [telethon.types.MessageEntityTextUrl(0, len(s), t)])


def bold(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityBold(0, len(s))])


def italics(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityItalic(0, len(s))])

penn5 commented 3 years ago

Usage:

print("hello " + mono("world") + " " + link("duck", "duck.com") + " " + italics("italics " + bold("italics bold")))

Output:

hello <code>world</code> <a href="duck.com">duck</a> <em>italics <strong>italics bold</strong></em>

Lonami commented 3 years ago

Markdown is incredibly easy to get wrong and incredibly hard to get right.

This argument is not strong enough to justify getting rid of markdown support entirely. Commonmark is well-defined (more than the original markdown specification, anyway), and the expected output is generally what one would expect.

This isn't to say we couldn't add your DSL as an alternative. But markdown support isn't going anywhere for the time being.

penn5 commented 3 years ago

This argument is not strong enough to justify getting rid of markdown support entirely. Commonmark is well-defined (more than the original markdown specification, anyway), and the expected output is generally what one would expect.

This isn't to say we couldn't add your DSL as an alternative. But markdown support isn't going anywhere for the time being.

That's reasonable. Maybe it could go away in 2.0 after a migration period, idk

penn5 commented 2 years ago

Btw, can we have classes like SessionState made into dataclasses please?

penn5 commented 2 years ago

how should testmode be enabled on v2?