2.0 wishlist - Githubissues

Lonami commented 5 years ago

Telethon currently has some weird things in it that need changing, but would be breaking changes. Therefore, a new major release should be made. We should aim for making a single release with the biggest amount of breaking changes, instead of making breaking changes across many releases.

Please post in this issue any gripe you have with the library and that you would like it to change.

Of course, a last release 1.X will be made with deprecation warnings on all these methods, so people know how to upgrade.

(2021-09 update: probably no "last 1.X" with deprecation; instead a document will be prepared, along with some helper code to ease the migration.)

Lonami commented 5 years ago

[ ] Profile photos: It no longer makes sense to have a download_big parameter, since now more sizes are available. The parameter should be changed to size for more flexibility.
[x] send_read_acknowledge. Horribly long name. mark_read would work much better (and perhaps the Message should have this, too). https://github.com/LonamiWebs/Telethon/commit/f6f7345a3a5a3aaeec6b05b18a70b1fb993e25d1
[x] File cache. It works badly, because it's not smart enough. Another thing to consider: is it something the library should even be doing? Caching entities is okay because it's necessary for the library to work. Caching files is another story. Instead, we should offer a way to "save" files in no particular chat and return a file_id, once we figure out how to make those persistent. https://github.com/LonamiWebs/Telethon/commit/78971fd2e595288891f49fb367a8677b2e519867
[ ] Entities. Should we keep calling them that? What about the parameter names? Should we use chat or peer everywhere? But, at least, we should be consistent.
[x] sign_in. Why does it send the code automatically? That's not its job. https://github.com/LonamiWebs/Telethon/commit/9bafcdfe0fd7608224116047f1ed04fa5739972b
[x] is_connected. Should be a property, and maybe renamed. https://github.com/LonamiWebs/Telethon/commit/6226fa95ce9b2bc55a5b684f8d42b53220310fd6
[ ] message.video. It returns round videos too, but audio does not return voice notes. It's not consistent.
[ ] Public vs. private. Why is api_id public in the TelegramClient? Things like exposing the session would be better as read-only properties, too. All classes, functions (like utils) and modules (their names) should be reviewed. This includes update._client which should be update.client_, and other cases. https://github.com/LonamiWebs/Telethon/commit/80e86e98ff04128c32f9fd0b04a0c7472c0ab15e
[ ] Session files. Address #902.
[ ] events.NewMessage. It should just be a Message to avoid confusion.
[ ] Full API. Using it is a bit annoying. It would be nice if we could do client.raw.send_message.
[ ] download_file. Why does it return the str type of what? download_media and download_file should be unified in one, and None should mean "infer filename", while bytes mean save to in-memory bytes (breaks download_file).
[ ] Sending files. It's a mess, including progress callbacks, and changing attributes like mime or name. It's not even possible to "force" sending as photo.
[ ] message.download_media. We already have a method in the client. The only thing that would make sense is message.file.download().
[ ] edit_message. It supports far too many confusing combinations.
[ ] Connection retries. It would be helpful to support things like "a list of timeouts", so that they are more configurable. For example, retries=[1, 2, 4, 8] would make up to 5 attempts, sleeping 1, 2, 4 and 8 seconds between them. People could provide any generator that they like.
[ ] Update handling. It needs an overhaul and to properly follow https://core.telegram.org/api/updates.
[ ] telethon.sync. The fact it rewrites the original classes irreversibly is not really good. It would be helpful if it created proxy objects and, for all the methods that return something we can await, automatically await it. Perhaps offer some public method to sync-ify other things, too. But then it wouldn't really belong in Telethon since it would be generic. Or perhaps this whole sync hack should be removed, since it messes with IDEs and type hinting a lot.
[x] buttons=. Currently, a list makes one button per column. Ideally, it would make one button per row, so people can trivially buttons=list for one-per-row and buttons=[list] for one-per-column. https://github.com/LonamiWebs/Telethon/commit/ad37db1cd626683ee5fbcc74c49946972bcb4c71
[x] Proxy. pysocks looks dead, and should be replaced. https://github.com/LonamiWebs/Telethon/commit/ad7e62baf3872d1d5d7e50e1857307166e0dbd04
[x] client.send_file. It accepts things like captions and buttons, and also sending more than 10 files, which will be sent as albums. But if you mix photos and documents the albums will get sent first and then the files. This is weird and makes #1204 harder. It should just work with up to 10. How are buttons, for example, supposed to be split across calls? https://github.com/LonamiWebs/Telethon/commit/f8137595c55f13ed5a2f67f451ff1ab0475d2b04 https://github.com/LonamiWebs/Telethon/commit/6d4c8ba8ffbb2c334c414abd4df45e10426a061f
[ ] client.disconnected. It is a bad property name if we want to have a client.connected property, because the former returns a future and the latter a boolean.
[ ] Return types, such as the one from delete_messages are a bit… random and not very useful.
[ ] message.edit() won't edit incoming messages, but this is fine to do in broadcast channels. Changing it would technically be a breaking change though.
[ ] Expose less of the raw API. Things like filters for get_participants should not make you import random types, which is very error-prone.
[ ] with should not start(), just handle connect() and disconnect(). The sync-context should probably be removed as well.
[ ] iter_ and get_ duality might not be necessary, since one can implement both __await__ and __aiter__ in the same object.
[ ] Python 3.5 support could be dropped, since it's EOL (see also End-of-life branches).
[ ] Get rid of bot-API style file IDs. Maintaining them is beyond the scope of this project.
[ ] Consider using https://github.com/agronholm/anyio.
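
The iter_/get_ item in the list above can be illustrated with a small object that supports both await and async for. This is only a sketch: Participants and its in-memory data are hypothetical stand-ins, not Telethon API.

```python
import asyncio


class Participants:
    """Hypothetical result object: awaitable (gives a list) and async-iterable."""

    def __init__(self, names, limit=None):
        self._names = list(names)
        self._limit = limit

    async def _fetch(self):
        # Stand-in for paginated network requests.
        for index, name in enumerate(self._names):
            if self._limit is not None and index >= self._limit:
                return
            await asyncio.sleep(0)
            yield name

    def __aiter__(self):
        # "async for" streams items one at a time.
        return self._fetch()

    async def _collect(self):
        return [name async for name in self._fetch()]

    def __await__(self):
        # "await" collects everything into a list.
        return self._collect().__await__()


async def main():
    everyone = await Participants(["ana", "bob", "eve"])
    streamed = []
    async for name in Participants(["ana", "bob", "eve"], limit=2):
        streamed.append(name)
    return everyone, streamed


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With this shape a single method could serve both use cases, at the cost of a slightly less obvious return type.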

DaveScream commented 5 years ago

It would be cool to have a function that uploads a file_list from multiple threads and returns media_ids.

DaveScream commented 5 years ago

Sometimes send_file accepts a single item for attributes=, and sometimes it warns and says that it should be a list.

Lonami commented 5 years ago

@DaveScream you can achieve that with the facilities asyncio offers (create_task, wait or gather). Telethon should only offer a way if it can be optimized anyhow (for example, forwarding more than one message at once), and not add unnecessary clutter.
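
A minimal sketch of that suggestion, using asyncio.gather to run several uploads concurrently. upload_one is a hypothetical stand-in for a real upload call, and the max_concurrent limit is an assumption added for illustration.

```python
import asyncio


async def upload_one(path):
    """Hypothetical stand-in for a real upload; returns a fake media id."""
    await asyncio.sleep(0)
    return "media:" + path


async def upload_many(paths, max_concurrent=4):
    # Limit how many uploads run at the same time.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        async with semaphore:
            return await upload_one(path)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(path) for path in paths))


if __name__ == "__main__":
    print(asyncio.run(upload_many(["a.jpg", "b.jpg", "c.jpg"])))
```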

cher-nov commented 5 years ago

Telethon's networking core needs a thorough refactoring to make its abstractions stricter (remember our clumsy implementation of MTProxies). I tried to implement it myself in March-April, but failed (the whole library simply refused to work correctly afterwards), since I lack deep knowledge of the current Telethon code. So I'll just write a concept here.

Currently we have an architecture tightly bound to the existing Telegram protocols and servers. We also don't properly distinguish between socket proxies and MTProxy in the TelegramClient constructor, although we should. We also have the eerie pieces Authenticator and MTProtoPlainSender, which are used only once and only by MTProtoSender. And we also force the user to check the dd prefix in a secret (which they should not need to be aware of) to choose between ConnectionTcpMTProxyRandomizedIntermediate and ConnectionTcpMTProxyIntermediate, or even ConnectionTcpMTProxyAbridged.

I propose the following structure:

protocol, which is MTProto 2.0 actually (or any user-defined protocol);
transport, which is 'connection mode' plus support of MTProxy;
connection, which is a wrapper over socket or UART or radiotelescope etc, with support of proxying through aiosocks.

  protocol/
    mtproto10.py
    mtproto20.py
  transport/
    tcpobfuscated.py
    ...
  connection/
    asyncsocket.py

This will give us the following additional abilities: 1) Use MTProxy over a socket proxy to make life even harder for DPI systems. 2) Connect to custom servers with custom protocols.
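
The three proposed layers could be sketched roughly as follows. All class names are hypothetical, the code is synchronous for brevity, and the 4-byte length prefix is a simple framing chosen for illustration rather than a faithful MTProto transport.

```python
import struct


class MemoryConnection:
    """Hypothetical lowest layer: a raw byte pipe (an in-memory loopback here)."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        self._buffer += data

    def read(self, size):
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk


class LengthPrefixedTransport:
    """Hypothetical middle layer: frames payloads with a 4-byte little-endian length."""

    def __init__(self, connection):
        self._connection = connection

    def send(self, payload):
        self._connection.write(struct.pack("<I", len(payload)) + payload)

    def recv(self):
        (length,) = struct.unpack("<I", self._connection.read(4))
        return self._connection.read(length)


class TextProtocol:
    """Hypothetical top layer: turns messages into payloads and back."""

    def __init__(self, transport):
        self._transport = transport

    def send_message(self, text):
        self._transport.send(text.encode("utf-8"))

    def receive_message(self):
        return self._transport.recv().decode("utf-8")


if __name__ == "__main__":
    protocol = TextProtocol(LengthPrefixedTransport(MemoryConnection()))
    protocol.send_message("ping")
    print(protocol.receive_message())
```

Because each layer only talks to the one directly below it, swapping the connection (for example, a proxied socket) would not require touching the transport or the protocol.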

It would also be nice to add support for test servers, like Pyrogram does. And I'm also pretty sure that we should keep the sync solution, since it simplifies things a lot in easy use-cases.

Lonami commented 5 years ago

> It would also be nice to add support for test servers

There is a section in the documentation dedicated to that.

cher-nov commented 5 years ago

> There is a section in the documentation dedicated to that.

Yep, but I would propose to add this with server IPs into the library directly.

Lonami commented 5 years ago

message = client.get_messages(...)
print(message.text)  # hello **world**
...
client.parse_mode = 'html'
print(message.text)  # hello <strong>world</strong>

The fact this works the way it does is really confusing. message and client are two different things, yet changing one affects the other. The best trade-off is probably offering text for markdown-formatted text (the "original" text typed in the applications), raw_text for the raw text (the text raw, without any entities in it), and html_text for the HTML-formatted text.
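
A rough sketch of that trade-off: three read-only views computed from the message itself, independent of any client setting. The Message class and the (offset, length, kind) entity tuples are hypothetical simplifications, and the renderer assumes entities do not overlap.

```python
import html

MARKDOWN_DELIMITERS = {"bold": ("**", "**"), "italic": ("__", "__")}
HTML_DELIMITERS = {"bold": ("<strong>", "</strong>"), "italic": ("<em>", "</em>")}


class Message:
    """Hypothetical message: raw text plus (offset, length, kind) entities."""

    def __init__(self, raw_text, entities=()):
        self.raw_text = raw_text
        self._entities = sorted(entities)

    def _render(self, delimiters, escape=lambda s: s):
        # Assumes non-overlapping entities with Python string offsets.
        parts, cursor = [], 0
        for offset, length, kind in self._entities:
            opening, closing = delimiters[kind]
            parts.append(escape(self.raw_text[cursor:offset]))
            inner = escape(self.raw_text[offset:offset + length])
            parts.append(opening + inner + closing)
            cursor = offset + length
        parts.append(escape(self.raw_text[cursor:]))
        return "".join(parts)

    @property
    def text(self):
        return self._render(MARKDOWN_DELIMITERS)

    @property
    def html_text(self):
        return self._render(HTML_DELIMITERS, escape=html.escape)


if __name__ == "__main__":
    message = Message("hello world", [(6, 5, "bold")])
    print(message.raw_text, message.text, message.html_text, sep="\n")
```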

painor commented 5 years ago

I think all entities and objects should be patched to have their own methods, for example client.get_entity("group").kick_user("user"), just like messages. This should probably work for both the input version and the full one.

Lonami commented 5 years ago

This is not a breaking change and can be done in the 1.x series. 2.0 is only about breaking changes.

painor commented 5 years ago

Also what about removing the aggressive attribute from the client.iter_participants?

Lonami commented 5 years ago

As a bonus (via @tulir on @TelethonChat/150284), having the core make use of Sans I/O could be a good idea.

Lonami commented 5 years ago

Just a thought I don't want to lose: to make MTProto proxies easier to use, we probably could/should allow the user to input them in the form of https://t.me/proxy?server=...&port=...&secret=.... This is a standard way and official clients also react to those links, so it makes sense if the library could parse them as well.
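
Parsing such a link needs nothing beyond the standard library. A minimal sketch, where parse_proxy_link is a hypothetical helper name and the example host and secret are made up:

```python
from urllib.parse import parse_qs, urlparse


def parse_proxy_link(link):
    """Return (server, port, secret) from a t.me proxy link."""
    parsed = urlparse(link)
    if parsed.netloc != "t.me" or parsed.path != "/proxy":
        raise ValueError("not a t.me proxy link: " + link)
    query = parse_qs(parsed.query)
    try:
        server = query["server"][0]
        port = int(query["port"][0])
        secret = query["secret"][0]
    except KeyError as missing:
        raise ValueError("missing parameter: " + str(missing)) from None
    return server, port, secret


if __name__ == "__main__":
    print(parse_proxy_link(
        "https://t.me/proxy?server=proxy.example.com&port=443"
        "&secret=00112233445566778899aabbccddeeff"
    ))
```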

penn5 commented 5 years ago

catch_up needs fixing for channels

Lonami commented 5 years ago

@penn5 see https://github.com/LonamiWebs/Telethon/issues/1169#issuecomment-518037677, and #1125 to fix catch up.

apepenkov commented 4 years ago

It would be nice if we could get cached info (username, name, phone number for users) about entities from .session via some friendly method. Currently to do that you have to either cache yourself, or use bot.session._cursor()
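
As an illustration of what such a friendly method might wrap, here is a sketch that reads cached details with plain sqlite3. It assumes an entities table with id, hash, username, phone and name columns; that schema is an assumption here and may differ between versions, and lookup_cached and demo_database are hypothetical helpers.

```python
import sqlite3


def lookup_cached(db, entity_id):
    """Return cached username, phone and name for an id, or None if unknown."""
    row = db.execute(
        "SELECT username, phone, name FROM entities WHERE id = ?",
        (entity_id,),
    ).fetchone()
    if row is None:
        return None
    return {"username": row[0], "phone": row[1], "name": row[2]}


def demo_database():
    # In-memory database with the assumed schema, for demonstration only.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE entities ("
        "id INTEGER PRIMARY KEY, hash INTEGER NOT NULL, "
        "username TEXT, phone INTEGER, name TEXT)"
    )
    db.execute(
        "INSERT INTO entities VALUES (?, ?, ?, ?, ?)",
        (1234, 5678, "example_user", None, "Example User"),
    )
    return db


if __name__ == "__main__":
    print(lookup_cached(demo_database(), 1234))
```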

iamwilliamli commented 4 years ago

It would be nice to be able to rename documents sent in channels with Telethon.

apepenkov commented 4 years ago

@lichengqi0805, what do you mean by that?

iamwilliamli commented 4 years ago

> @lichengqi0805, what do you mean by that?

There is a bot, @HK_rename_BOT, which can change a document's name (such as a PDF's) without saving it. I'm curious whether there's an API in Telegram which could achieve this.

apepenkov commented 4 years ago

I'm pretty sure they are saving. I was investigating this, and Lonami told me you can't edit the attributes of a file unless you are uploading it. But if you somehow can, then it's possible with Telethon.

Upd: So I've checked this bot, and they are saving.

iamwilliamli commented 4 years ago

> I'm pretty sure they are saving. I was investigating this, and Lonami told me you can't edit the attributes of a file unless you are uploading it. But if you somehow can, then it's possible with Telethon.
>
> Upd: So I've checked this bot, and they are saving.

Okay, thank you!

Skyross commented 3 years ago

@Lonami Is it possible to revive the use_cache flag/attribute of the upload_file method? Do you have any ideas? I would like to contribute (including some 3rd party session persistence packages), but first I want to find out your vision for it.

Lonami commented 3 years ago

New additions can be added any time, not just between versions. This issue is about breaking changes to clean stuff up. That said, I do not want to add upload cache back. I think it would be taking the "library commodities" too far. It adds more maintenance burden, more ways in which it can break and cause confusion, more hidden costs, more data that would need to needlessly be stored in the session…

NavruzbekNoraliev commented 3 years ago

It would be a great feature if a bot could add users to groups and channels! As far as I know, this is done using a single user account, which programmatically adds users to a particular group.

Lonami commented 3 years ago

This is not a breaking change, and can't be done regardless, because it's an API limitation.

Lonami commented 3 years ago

Yes. The format is prone to change and I am not willing to maintain that, so it should be elsewhere.

penn5 commented 3 years ago

Immutable types for TL stuff

penn5 commented 3 years ago

Get rid of signed (marked) peer IDs

penn5 commented 3 years ago

Get rid of entity cache

Lonami commented 3 years ago

> Immutable types for TL stuff

Could you elaborate further?

> Get rid of signed (marked) peer IDs

Simply agreed.

> Get rid of entity cache

This would be far too big of a breaking change even for a release whose purpose is breaking backward-compatibility. In order for this to be achievable, we need a workable alternative. Do you have any suggestions?

penn5 commented 3 years ago

> Immutable types for TL stuff

Make all TL constructors and functions immutable, with a .copy() mutator

> Do you have any suggestions?

Provide a helper to pack any peer into a three-tuple of type, ID and hash. Of course get_entity would have to accept this
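
One possible shape for such a helper is sketched below. PackedPeer, its field names and its colon-separated string form are all hypothetical choices made for illustration.

```python
from typing import NamedTuple


class PackedPeer(NamedTuple):
    """Hypothetical three-tuple of peer type, ID and access hash."""

    kind: str  # "user", "chat" or "channel"
    id: int
    access_hash: int

    def to_string(self):
        # Compact form that is easy to store in a config file or database.
        return "{}:{}:{}".format(self.kind, self.id, self.access_hash)

    @classmethod
    def from_string(cls, value):
        kind, peer_id, access_hash = value.split(":")
        if kind not in ("user", "chat", "channel"):
            raise ValueError("unknown peer kind: " + kind)
        return cls(kind, int(peer_id), int(access_hash))


if __name__ == "__main__":
    packed = PackedPeer("channel", 1234, -987654321)
    print(packed.to_string())
    print(PackedPeer.from_string(packed.to_string()))
```

The type is explicit in the stored value, so no sign trick on the ID is needed to tell users, chats and channels apart.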

Lonami commented 3 years ago

> Make all TL constructors and functions immutable, with a .copy() mutator

What benefits does this bring? The library does make use of mutation in several places.

> Provide a helper to pack any peer into a three-tuple of type, ID and hash. Of course get_entity would have to accept this

This is nowhere near as convenient as just using a number though, and people who were storing just numbers will need to do quite a bit of work to get the new system working.

penn5 commented 3 years ago

> What benefits does this bring? The library does make use of mutation in several places.

Cleaner code

> This is nowhere near as convenient as just using a number though, and people who were storing just numbers will need to do quite a bit of work to get the new system working.

Yes. But it's cleaner and more explicit.

Lonami commented 3 years ago

> Cleaner code

Do you have any concrete examples?

penn5 commented 3 years ago

> Do you have any concrete examples?

Having pure data-holders being mutable makes for ugly and dangerous code.

Lonami commented 3 years ago

> Having pure data-holders being mutable makes for ugly and dangerous code.

This isn't a concrete example. For instance, mutation is useful when using requests to iterate over something, such as messages, since the offset can simply be incremented rather than having to recreate the entire request.
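
To make the two positions concrete, here is a sketch of the same pagination loop written both ways. MutableRequest and FrozenRequest are hypothetical stand-ins for a TL request, and dataclasses (Python 3.7+) are used only to keep the example short.

```python
from dataclasses import dataclass, replace


@dataclass
class MutableRequest:
    peer: str
    offset: int = 0
    limit: int = 100


@dataclass(frozen=True)
class FrozenRequest:
    peer: str
    offset: int = 0
    limit: int = 100


def page_offsets_mutable(total):
    request = MutableRequest("chat")
    seen = []
    while request.offset < total:
        seen.append(request.offset)
        request.offset += request.limit  # update the same object in place
    return seen


def page_offsets_frozen(total):
    request = FrozenRequest("chat")
    seen = []
    while request.offset < total:
        seen.append(request.offset)
        # Build a new request; the previous one is left untouched.
        request = replace(request, offset=request.offset + request.limit)
    return seen


if __name__ == "__main__":
    print(page_offsets_mutable(250))
    print(page_offsets_frozen(250))
```

Both loops visit the same offsets; the difference is whether an earlier reference to the request can change underneath its holder.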

> Yes. But it's cleaner and more explicit.

Also, regarding no cache, how would mentions in messages work through tg://user?id=...?

apepenkov commented 3 years ago

I think entity cache is a nice thing, and I don't really see a reason to remove it. Personal opinion. I'd like to see functions for fetching entity cache, currently I do that via private cursor field, and that's not really a nice solution.

Lonami commented 3 years ago

The entity cache must remain in some way because it is the way the library knows if it needs to call getDifference to obtain an access hash. Furthermore, some places really only have access to just the identifier, such as mentions, and certain message service updates.

However, the automatic cache of full name, username and phone number is probably a bit of a stretch, and that can probably be removed. There's not really a way to directly query or access these anyway, which probably means it's best left for user code to deal with if they care about it.

penn5 commented 3 years ago

> This isn't a concrete example. For instance, mutation is useful when using requests to iterate over something, such as messages, since the offset can simply be incremented rather than having to recreate the entire request.

That's easy to work around, while immutable types make a lot of logic easier and safer. But we can drop this in favour of #3158, which isn't a breaking change.

> Also, regarding no cache, how would mentions in messages work through tg://user?id=...?

Which brings me perfectly to my next point. Drop all HTML and Markdown support.

Lonami commented 3 years ago

> Drop all HTML and Markdown support.

And what would the alternative to that be? Dealing with MessageEntity by hand is messy, combined with the fact offset and length work in an awkward way.
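
Part of the awkwardness is that, as the Telegram API documentation describes it, entity offsets and lengths are counted in UTF-16 code units, while Python strings are indexed by code point. A small sketch of the conversion, with utf16_len and entity_span as hypothetical helper names:

```python
def utf16_len(text):
    """Number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def entity_span(text, start, end):
    """Convert Python slice indices into (offset, length) in UTF-16 code units."""
    return utf16_len(text[:start]), utf16_len(text[start:end])


if __name__ == "__main__":
    text = "\U0001F600 bold"  # an emoji outside the BMP, a space, then "bold"
    start = text.index("bold")
    print(start, entity_span(text, start, start + 4))
```

Here the Python index of "bold" is 2, but the entity offset is 3, because the emoji takes two UTF-16 code units.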

penn5 commented 3 years ago

I'll make a PoC

Lonami commented 3 years ago

My plan was to have proper commonmark support by default while keeping the old one for a bit longer for old code, along with making the Message.text not depend on the client.parse_mode.

Lonami commented 3 years ago

dataclasses are Python 3.7 onward, and Python 3.6 is not yet EOL (and even once it is, I will probably keep support for it around for longer).

I can't really see how a pseudo-DSL to format messages is any better than proper markdown. The only real motivation I can see for removing markdown is "less bloat", but even then, markdown support can hardly be considered bloat on a library whose main purpose is primarily receiving and sending messages to Telegram… You will need a far better argument to convince me to break so much existing code unnecessarily.

penn5 commented 3 years ago

Markdown is incredibly easy to get wrong and incredibly hard to get right. A simple DSL (just over 2 KiB) is almost impossible to get wrong, and very easy to get right.

penn5 commented 3 years ago

import operator
import typing

import telethon.extensions

def tl_copy(self, **kwargs):
    r = telethon.extensions.BinaryReader(bytes(self)).tgread_object()
    for k, v in kwargs.items():
        setattr(r, k, v)
    return r

telethon.tl.TLObject.copy_ = tl_copy

OFFSET_KEY = operator.attrgetter("offset")

class Message:
    # TODO offsets and lengths are counted in UTF-16 code units rather than codepoints, so len() and str indices are only correct for text within the Basic Multilingual Plane
    text: str
    entities: list[telethon.types.TypeMessageEntity]

    @typing.overload
    def __init__(self, message: str, entities: list[telethon.types.TypeMessageEntity] = None):
        ...

    @typing.overload
    def __init__(self, message: "Message", entities: list[telethon.types.TypeMessageEntity] = None):
        ...

    def __init__(self, message, entities=None):
        if entities is None:
            entities = []
        if isinstance(message, Message):
            self.text = message.text
            self.entities = (message.entities + entities)
            self.entities.sort(key=OFFSET_KEY)
        else:
            self.text = message
            self.entities = sorted(entities, key=OFFSET_KEY)

    def __add__(self, other: "MessageLike"):
        offset = len(self.text)
        if isinstance(other, Message):
            # Shift the right-hand entities past the end of our own text.
            shifted = [entity.copy_(offset=entity.offset + offset) for entity in other.entities]
            return Message(self.text + other.text, self.entities + shifted)
        return Message(self.text + other, self.entities)

    def __radd__(self, other: "MessageLike"):
        offset = len(other)
        return Message(other + self.text, [entity.copy_(offset=entity.offset + offset) for entity in self.entities])

    def __repr__(self):
        return telethon.utils.html.unparse(self.text, self.entities)

    def __len__(self):
        return len(self.text)

MessageLike = typing.Union[Message, str]

def text(s: str):
    return Message(s, [])

def mono(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityCode(0, len(s))])

def link(s: MessageLike, t: str):
    return Message(s, [telethon.types.MessageEntityTextUrl(0, len(s), t)])

def bold(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityBold(0, len(s))])

def italics(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityItalic(0, len(s))])

penn5 commented 3 years ago

Usage:

print("hello " + mono("world") + " " + link("duck", "duck.com") + " " + italics("italics " + bold("italics bold")))

Output:

hello <code>world</code> <a href="duck.com">duck</a> <em>italics <strong>italics bold</strong></em>

Lonami commented 3 years ago

> Markdown is incredibly easy to get wrong and incredibly hard to get right.

This argument is not strong enough to justify getting rid of markdown support entirely. Commonmark is well-defined (more than the original markdown specification, anyway), and the expected output is generally what one would expect.

This isn't to say we couldn't add your DSL as an alternative. But markdown support isn't going anywhere for the time being.

penn5 commented 3 years ago

> This argument is not strong enough to justify getting rid of markdown support entirely. Commonmark is well-defined (more than the original markdown specification, anyway), and the expected output is generally what one would expect.
>
> This isn't to say we couldn't add your DSL as an alternative. But markdown support isn't going anywhere for the time being.

That's reasonable. Maybe it could go away in 2.0 after a migration period, idk

penn5 commented 2 years ago

Btw, can we have classes like SessionState made into dataclasses please?

penn5 commented 2 years ago

how should testmode be enabled on v2?

LonamiWebs / Telethon

2.0 wishlist #1169