Closed by Lonami 2 years ago
- download_big parameter, since now more sizes are available. The parameter should be changed to size for more flexibility.
- send_read_acknowledge. Horribly long name. mark_read would work much better (and perhaps the Message should have this, too). https://github.com/LonamiWebs/Telethon/commit/f6f7345a3a5a3aaeec6b05b18a70b1fb993e25d1
- file_id, once we figure out how to make those persistent. https://github.com/LonamiWebs/Telethon/commit/78971fd2e595288891f49fb367a8677b2e519867
- chat or peer everywhere? But, at least, we should be consistent.
- sign_in. Why does it send the code automatically? That's not its job. https://github.com/LonamiWebs/Telethon/commit/9bafcdfe0fd7608224116047f1ed04fa5739972b
- is_connected. Should be a property, and maybe renamed. https://github.com/LonamiWebs/Telethon/commit/6226fa95ce9b2bc55a5b684f8d42b53220310fd6
- message.video. It returns round videos too, but audio does not return voice notes. It's not consistent.
- api_id public in the TelegramClient? Things like exposing the session would be better as read-only properties, too. All classes, functions (like utils) and modules (their names) should be reviewed. This includes update._client which should be update.client_, and other cases. https://github.com/LonamiWebs/Telethon/commit/80e86e98ff04128c32f9fd0b04a0c7472c0ab15e
- events.NewMessage. It should just be a Message to avoid confusion.
- client.raw.send_message.
- download_file. Why does it return the str type of what? download_media and download_file should be unified in one, and None should mean "infer filename", while bytes mean save to in-memory bytes (breaks download_file).
- message.download_media. We already have a method in the client. The only thing that would make sense is message.file.download().
- edit_message. It supports far too many confusing combinations.
- retries=[1, 2, 4, 8] would retry 5 times sleeping 1, 2, 4 and 8 seconds between. People could provide any generator that they like.
- await, automatically await it. Perhaps offer some public method to sync-ify other things, too. But then it wouldn't really belong in Telethon since it would be generic. Or perhaps this whole sync hack should be removed, since it messes with IDEs and type hinting a lot.
- buttons=list for one-per-row and buttons=[list] for one-per-column. https://github.com/LonamiWebs/Telethon/commit/ad37db1cd626683ee5fbcc74c49946972bcb4c71
- pysocks looks dead, and should be replaced. https://github.com/LonamiWebs/Telethon/commit/ad7e62baf3872d1d5d7e50e1857307166e0dbd04
- client.send_file. It accepts things like captions and buttons, and also sending more than 10 files, which will be sent as albums. But if you mix photos and documents the albums will get sent first and then the files. This is weird and makes #1204 harder. It should just work with up to 10. How are buttons, for example, supposed to be split across calls? https://github.com/LonamiWebs/Telethon/commit/f8137595c55f13ed5a2f67f451ff1ab0475d2b04 https://github.com/LonamiWebs/Telethon/commit/6d4c8ba8ffbb2c334c414abd4df45e10426a061f
- client.disconnected. It is a bad property name if we want to have a client.connected property, because the former returns a future and the latter a boolean.
- delete_messages are a bit… random and not very useful.
- get_participants should not make you import random types, which is very error-prone.
- with should not start(), just handle connect() and disconnect(). The sync-context should probably be removed as well.
- iter_ and get_ duality might not be necessary, since one can implement both __await__ and __aiter__ in the same object.
It would be cool to have a function that uploads file_list in multiple threads and returns media_ids.
Also, sometimes send_file takes attributes=single item, and sometimes it warns and says that it should be a list.
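On the last checklist item (the iter_ and get_ duality), here is a minimal sketch of one object that supports both `await` and `async for`. The `MessageQuery` class and its `_fetch_page` helper are hypothetical stand-ins written for this illustration; they are not part of Telethon.

```python
import asyncio


class MessageQuery:
    """Hypothetical result object that can be awaited or iterated."""

    def __init__(self, limit):
        self._limit = limit

    async def _fetch_page(self, offset, size):
        # Stand-in for a network request; returns fake message IDs.
        await asyncio.sleep(0)
        return list(range(offset, min(offset + size, self._limit)))

    async def _collect(self):
        return [item async for item in self]

    def __await__(self):
        # `await query` gathers every item into a list (the get_ behaviour).
        return self._collect().__await__()

    async def __aiter__(self):
        # `async for item in query` yields lazily, page by page (the iter_ behaviour).
        offset = 0
        while offset < self._limit:
            page = await self._fetch_page(offset, size=2)
            for item in page:
                yield item
            offset += len(page)
```

With this shape, `messages = await MessageQuery(5)` returns a list, while `async for m in MessageQuery(5)` streams the same items without building the list first.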
@DaveScream you can achieve that with the facilities asyncio offers (create_task, wait or gather). Telethon should only offer a way if it can be optimized somehow (for example, forwarding more than one message at once), and not add unnecessary clutter.
Telethon's networking core needs a thorough refactoring to make its abstractions stricter (remember our clumsy implementation of MTProxies). I tried to implement it myself in March-April, but failed (the whole library simply refused to work correctly afterwards) since I lack deep knowledge of the current Telethon code. So I'll just write a concept here.
Currently we have an architecture tightly bound to the existing Telegram protocols and servers. We also don't properly distinguish between socket proxies and MTProxy in the TelegramClient constructor, while we should. We also have some eerie pieces, Authenticator and MTProtoPlainSender, which are used only once and only by MTProtoSender. And we also force the user to check the dd prefix in a secret (which they should not need to be aware of) to choose between ConnectionTcpMTProxyRandomizedIntermediate and ConnectionTcpMTProxyIntermediate or even ConnectionTcpMTProxyAbridged.
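To illustrate the point about the dd prefix, here is a rough sketch of how a library could make that choice on the user's behalf. It assumes the secret is given as a hex string and that a dd-prefixed secret is one marker byte followed by a 16-byte key; the function name is hypothetical, and falling back to the plain intermediate mode (rather than abridged) is an arbitrary choice for this example.

```python
def pick_mtproxy_mode(secret_hex):
    """Return (connection class name, key bytes) for a hex-encoded secret."""
    secret = bytes.fromhex(secret_hex)
    # Assumption: a leading 0xdd byte on a 17-byte secret marks the randomized mode.
    if len(secret) == 17 and secret[0] == 0xDD:
        return "ConnectionTcpMTProxyRandomizedIntermediate", secret[1:]
    # Assumption: anything else uses the plain intermediate mode.
    return "ConnectionTcpMTProxyIntermediate", secret
```

The caller passes the secret exactly as it was shared and never needs to inspect it.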
I propose the following structure:
- protocol, which is actually MTProto 2.0 (or any user-defined protocol);
- transport, which is the 'connection mode' plus support for MTProxy;
- connection, which is a wrapper over a socket or UART or radio telescope etc., with support for proxying through aiosocks.
protocol/
    mtproto10.py
    mtproto20.py
transport/
    tcpobfuscated.py
    ...
connection/
    asyncsocket.py
This will give us the following additional abilities:
1) Use MTProxy over a socket proxy to make the life of DPI systems even harder.
2) Connect to custom servers with custom protocols.
It would also be nice to add support for test servers, like Pyrogram does.
I'm also pretty sure that we should keep the sync solution, since it simplifies things a lot in easy use cases.
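The protocol / transport / connection split proposed above can be sketched as three small layers, each of which only talks to the one below it. All class names here are placeholders invented for the illustration (none are Telethon classes), the code is synchronous for brevity, and the framing is a simple 4-byte length prefix rather than a full transport implementation.

```python
import struct


class LoopbackConnection:
    """Connection layer: moves raw bytes. This one just buffers them in memory."""

    def __init__(self):
        self._buffer = bytearray()

    def send(self, data):
        self._buffer += data

    def recv(self, n):
        data = bytes(self._buffer[:n])
        del self._buffer[:n]
        return data


class LengthPrefixedTransport:
    """Transport layer: frames packets with a 4-byte little-endian length."""

    def __init__(self, connection):
        self._connection = connection

    def send_packet(self, payload):
        self._connection.send(struct.pack("<i", len(payload)) + payload)

    def recv_packet(self):
        (length,) = struct.unpack("<i", self._connection.recv(4))
        return self._connection.recv(length)


class EchoProtocol:
    """Protocol layer: turns application messages into packets and back."""

    def __init__(self, transport):
        self._transport = transport

    def send(self, text):
        self._transport.send_packet(text.encode("utf-8"))

    def receive(self):
        return self._transport.recv_packet().decode("utf-8")
```

Swapping the socket for a proxy, or the framing for an obfuscated one, would then only touch a single layer: `EchoProtocol(LengthPrefixedTransport(LoopbackConnection()))`.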
> It would also be nice to add support for test servers
Yep, but I would propose adding this with the server IPs built into the library directly.
message = client.get_messages(...)
print(message.text) # hello **world**
...
client.parse_mode = 'html'
print(message.text) # hello <strong>world</strong>
The fact that this works the way it does is really confusing. message and client are two different things, yet changing one affects the other. The best trade-off is probably offering text for markdown-formatted text (the "original" text typed in the applications), raw_text for the raw text (without any entities in it), and html_text for the HTML-formatted text.
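A minimal sketch of that trade-off: a message object whose three text properties depend only on its own data, never on a client setting. The `Message` class, the `(offset, length, kind)` entity tuples and the delimiter tables are simplifications assumed for this example; entities must not overlap, and offsets are plain Python string indices.

```python
import html
from dataclasses import dataclass
from typing import List, Tuple

# (offset, length, kind); a simplified stand-in for Telegram's message entities.
Entity = Tuple[int, int, str]

MARKDOWN = {"bold": ("**", "**"), "italic": ("__", "__")}
HTML = {"bold": ("<strong>", "</strong>"), "italic": ("<em>", "</em>")}


def _render(raw, entities, delimiters, escape):
    # Walk the entities in offset order, wrapping each span in its delimiters.
    out, pos = [], 0
    for offset, length, kind in sorted(entities):
        opening, closing = delimiters[kind]
        out.append(escape(raw[pos:offset]))
        out.append(opening + escape(raw[offset:offset + length]) + closing)
        pos = offset + length
    out.append(escape(raw[pos:]))
    return "".join(out)


@dataclass(frozen=True)
class Message:
    raw_text: str
    entities: List[Entity]

    @property
    def text(self):
        # Markdown-formatted text; no escaping is attempted in this sketch.
        return _render(self.raw_text, self.entities, MARKDOWN, lambda s: s)

    @property
    def html_text(self):
        return _render(self.raw_text, self.entities, HTML, html.escape)
```

Changing a client-wide parse mode can no longer alter what `message.text` returns, because the message never consults the client.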
I think all entities and objects should be patched to have their own methods, for example client.get_entity("group").kick_user("use"), just like messages. This should probably work for both the input version and the full one.
This is not a breaking change and can be done in the 1.x series. 2.0 is only about breaking changes.
Also what about removing the aggressive attribute from the client.iter_participants?
As a bonus (via @tulir on @TelethonChat/150284), having the core make use of Sans I/O could be a good idea.
Just a thought I don't want to be lost: to make MTProto proxies easier to use, we probably could/should allow the user to input them as links (https://t.me/proxy?server=...&port=...&secret=...). This is a standard way and official clients also react to those links, so it makes sense if the library could parse them as well.
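Parsing such a link needs nothing beyond the standard library. The sketch below assumes the link always uses the t.me host and the three query parameters shown above; the function name and the returned tuple shape are choices made for this example.

```python
from urllib.parse import urlparse, parse_qs


def parse_proxy_link(link):
    """Split a t.me/proxy link into (server, port, secret)."""
    parsed = urlparse(link)
    if parsed.netloc != "t.me" or parsed.path != "/proxy":
        raise ValueError("not a t.me/proxy link")
    query = parse_qs(parsed.query)
    try:
        server = query["server"][0]
        port = int(query["port"][0])
        secret = query["secret"][0]
    except (KeyError, ValueError) as e:
        # A parameter is absent, empty, or the port is not a number.
        raise ValueError("missing or invalid proxy parameter") from e
    return server, port, secret
```

The host name and secret used in any test of this function are made-up values, not a real proxy.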
catch_up needs fixing for channels
@penn5 see https://github.com/LonamiWebs/Telethon/issues/1169#issuecomment-518037677, and #1125 to fix catch up.
It would be nice if we could get cached info (username, name, phone number for users) about entities from .session via some friendly method. Currently to do that you have to either cache yourself, or use bot.session._cursor()
It would be nice to be able to rename documents sent in channels with Telethon.
@lichengqi0805, what do you mean by that?
> @lichengqi0805, what do you mean by that?
There is a bot, @HK_rename_BOT, which can change a document's name (such as a PDF's) without saving it. I'm curious if there's an API in Telegram which could achieve this?
I'm pretty sure they are saving. I was investigating this, and Lonami said you can't edit the attributes of a file if you are not uploading it. But if you somehow can, it's possible with Telethon.
Upd: So I've checked this bot, and they are saving.
> I'm pretty sure they are saving. I was investigating this, and Lonami said you can't edit the attributes of a file if you are not uploading it. But if you somehow can, it's possible with Telethon.
> Upd: So I've checked this bot, and they are saving.
Okay, thank you!
@Lonami Is it possible to revive the use_cache flag/attribute of the upload_file method? Do you have any ideas? I would like to contribute (including some 3rd party session persistence packages) but want to find out your vision about it.
New additions can be added any time, not just between versions. This issue is about breaking changes to clean stuff up. That said, I do not want to add upload cache back. I think it would be taking the "library commodities" too far. It adds more maintenance burden, more ways in which it can break and cause confusion, more hidden costs, more data that would need to needlessly be stored in the session…
It would be a great feature if a bot could add users to groups and channels! As far as I know, it is done using a single user, which programmatically adds users to a particular group.
This is not a breaking change, and can't be done regardless, because it's an API limitation.
Yes. The format is prone to change and I am not willing to maintain that, so it should be elsewhere.
Immutable types for TL stuff
Get rid of signed (marked) peer IDs
Get rid of entity cache
> Immutable types for TL stuff
Could you elaborate further?
> Get rid of signed (marked) peer IDs
Simply agreed.
> Get rid of entity cache
This would be far too big of a breaking change even for a release whose purpose is breaking backward-compatibility. In order for this to be achievable, we need a workable alternative. Do you have any suggestions?
> Immutable types for TL stuff
Make all TL constructors and functions immutable, with a .copy() mutator.
> Do you have any suggestions?
Provide a helper to pack any peer into a three-tuple of type, ID and hash. Of course, get_entity would have to accept this.
> Make all TL constructors and functions immutable, with a .copy() mutator
What benefits does it bring? The library does make use of mutation in several places.
> Provide a helper to pack any peer into a three-tuple of type, ID and hash. Of course get_entity would have to accept this
This is nowhere near as convenient as just using a number though, and people who were storing just numbers will need to do quite a bit of work to get the new system working.
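For reference, the "three-tuple of type, ID and hash" idea could look roughly like the sketch below. `PackedPeer`, its field names and the 17-byte layout are assumptions made for this illustration; nothing here is an existing Telethon API.

```python
import struct
from typing import NamedTuple

PEER_TYPES = ("user", "chat", "channel")


class PackedPeer(NamedTuple):
    type: str
    id: int
    access_hash: int

    def to_bytes(self):
        # 1 byte for the type, then two signed 64-bit little-endian integers.
        return struct.pack("<Bqq", PEER_TYPES.index(self.type), self.id, self.access_hash)

    @classmethod
    def from_bytes(cls, data):
        type_index, peer_id, access_hash = struct.unpack("<Bqq", data)
        return cls(PEER_TYPES[type_index], peer_id, access_hash)
```

This shows the convenience trade-off being debated: the packed form carries everything needed to address the peer, but it is 17 opaque bytes to store instead of a single integer.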
> What benefits does it bring? The library does make use of mutation in several places.
Cleaner code
> This is nowhere near as convenient as just using a number though, and people who were storing just numbers will need to do quite a bit of work to get the new system working.
Yes. But it's cleaner and more explicit.
> Cleaner code
Do you have any concrete examples?
> Do you have any concrete examples?
Having pure data-holders being mutable makes for ugly and dangerous code.
> Having pure data-holders being mutable makes for ugly and dangerous code.
This isn't a concrete example. For instance, mutation is useful when using requests to iterate over something, such as messages, since the offset can simply be incremented rather than having to recreate the entire request.
> Yes. But it's cleaner and more explicit.
Also, regarding no cache, how would mentions in messages work through tg://user?id=...?
I think entity cache is a nice thing, and I don't really see a reason to remove it. Personal opinion. I'd like to see functions for fetching entity cache, currently I do that via private cursor field, and that's not really a nice solution.
The entity cache must remain in some way because it is the way the library knows if it needs to call getDifference to obtain an access hash. Furthermore, some places really only have access to just the identifier, such as mentions, and certain message service updates.
However, the automatic cache of full name, username and phone number is probably a bit of a stretch, and that can probably be removed. There's not really a way to directly query or access these anyway, which probably means it's best left for user code to deal with if they care about it.
> This isn't a concrete example. For instance, mutation is useful when using requests to iterate over something, such as messages, since the offset can simply be incremented rather than having to recreate the entire request.
That's easy to work around, while immutable types make a lot of logic easier and safer. But we can drop this in favour of #3158, which isn't a breaking change.
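To make the two positions on mutation concrete, here is a sketch of paging through offsets with an immutable request, using a frozen dataclass and `dataclasses.replace` (Python 3.7+). `GetHistory` is a hypothetical request type invented for the example, not one of Telethon's generated classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GetHistory:
    # Hypothetical request type; not one of Telethon's generated classes.
    peer: str
    offset: int = 0
    limit: int = 100


def iter_pages(request, total):
    # Each step builds a fresh request; the caller's object is never modified.
    while request.offset < total:
        yield request
        request = replace(request, offset=request.offset + request.limit)
```

The cost is one new small object per page, which is the "recreate the entire request" overhead mentioned earlier; the benefit is that the request passed in by the caller keeps its original offset.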
> Also, regarding no cache, how would mentions in messages work through tg://user?id=...?
Which brings me perfectly to my next point. Drop all HTML and Markdown support.
> Drop all HTML and Markdown support.
And what would the alternative to that be? Dealing with MessageEntity by hand is messy, combined with the fact that offset and length work in an awkward way.
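Part of that awkwardness is that Telegram counts entity offsets and lengths in UTF-16 code units, while Python strings are indexed by code point, so the two disagree as soon as the text contains characters outside the Basic Multilingual Plane (most emoji). A small sketch of the conversion, with helper names chosen for this example:

```python
def utf16_len(text):
    # Each UTF-16 code unit is two bytes in the "utf-16-le" encoding.
    return len(text.encode("utf-16-le")) // 2


def entity_span(text, start, end):
    """Convert a Python slice [start:end] into (offset, length) in UTF-16 units."""
    return utf16_len(text[:start]), utf16_len(text[start:end])
```

For pure ASCII text both measures agree, which is why code that uses `len()` directly often appears to work until an emoji shows up.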
I'll make a PoC
My plan was to have proper commonmark support by default while keeping the old one for a bit longer for old code, along with making the Message.text not depend on the client.parse_mode.
dataclasses are Python 3.7 onward, and Python 3.6 is not yet EOL (and even once it is, I will probably keep support for it around for longer).
I can't really see how a pseudo-DSL to format messages is any better than proper markdown. The only real motivation I can see for removing markdown is "less bloat", but even then, markdown support can hardly be considered bloat on a library whose main purpose is primarily receiving and sending messages to Telegram… You will need a far better argument to convince me to break so much existing code unnecessarily.
Markdown is incredibly easy to get wrong and incredibly hard to get right. A simple DSL (just over 2kiB) is almost impossible to get wrong, and very easy to get right.
import operator
import typing

import telethon
import telethon.extensions


def tl_copy(self, **kwargs):
    # Round-trip through the serialized form to get a copy, then override fields.
    r = telethon.extensions.BinaryReader(bytes(self)).tgread_object()
    for k, v in kwargs.items():
        setattr(r, k, v)
    return r


# Monkey-patch the copy helper onto every TL object.
telethon.tl.TLObject.copy_ = tl_copy

OFFSET_KEY = operator.attrgetter("offset")


class Message:
    # TODO: Telegram counts offsets and lengths in UTF-16 code units, not Python code points, so len() needs replacing
    text: str
    entities: typing.List[telethon.types.TypeMessageEntity]

    @typing.overload
    def __init__(self, message: str, entities: typing.List[telethon.types.TypeMessageEntity] = None):
        ...

    @typing.overload
    def __init__(self, message: "Message", entities: typing.List[telethon.types.TypeMessageEntity] = None):
        ...

    def __init__(self, message, entities=None):
        if entities is None:
            entities = []
        if isinstance(message, Message):
            # Wrapping an existing Message keeps its entities and adds the new ones.
            self.text = message.text
            self.entities = message.entities + entities
            self.entities.sort(key=OFFSET_KEY)
        else:
            self.text = message
            self.entities = sorted(entities, key=OFFSET_KEY)

    def __add__(self, other: "MessageLike"):
        offset = len(self.text)
        if isinstance(other, Message):
            # Shift the right-hand entities past the end of our own text.
            shifted = [entity.copy_(offset=entity.offset + offset) for entity in other.entities]
            return Message(self.text + other.text, self.entities + shifted)
        return Message(self.text + other, self.entities)

    def __radd__(self, other: "MessageLike"):
        # Called for `str + Message`: our entities move right by len(other).
        offset = len(other)
        return Message(other + self.text, [entity.copy_(offset=entity.offset + offset) for entity in self.entities])

    def __repr__(self):
        return telethon.utils.html.unparse(self.text, self.entities)

    def __len__(self):
        return len(self.text)


MessageLike = typing.Union[Message, str]


def text(s: str):
    return Message(s, [])


def mono(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityCode(0, len(s))])


def link(s: MessageLike, t: str):
    return Message(s, [telethon.types.MessageEntityTextUrl(0, len(s), t)])


def bold(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityBold(0, len(s))])


def italics(s: MessageLike):
    return Message(s, [telethon.types.MessageEntityItalic(0, len(s))])
Usage:
print("hello " + mono("world") + " " + link("duck", "duck.com") + " " + italics("italics " + bold("italics bold")))
Output:
hello <code>world</code> <a href="duck.com">duck</a> <em>italics <strong>italics bold</strong></em>
> Markdown is incredibly easy to get wrong and incredibly hard to get right.
This argument is not strong enough to justify getting rid of markdown support entirely. Commonmark is well-defined (more than the original markdown specification, anyway), and the expected output is generally what one would expect.
This isn't to say we couldn't add your DSL as an alternative. But markdown support isn't going anywhere for the time being.
> This argument is not strong enough to justify getting rid of markdown support entirely. Commonmark is well-defined (more than the original markdown specification, anyway), and the expected output is generally what one would expect.
> This isn't to say we couldn't add your DSL as an alternative. But markdown support isn't going anywhere for the time being.
That's reasonable. Maybe it could go away in 2.0 after a migration period, idk
Btw, can we have classes like SessionState made into dataclasses please?
how should testmode be enabled on v2?
Telethon currently has some weird things in it that need changing, but would be breaking changes. Therefore, a new major release should be made. We should aim for making a single release with the biggest amount of breaking changes, instead of making breaking changes across many releases.
Please post in this issue any gripe you have with the library and that you would like it to change.
Of course, a last release 1.X will be made with deprecation warning on all these methods, so people know how to upgrade.
(2021-09 update: probably no "last 1.X" with deprecation; instead a document will be prepared, along with some helper code to ease the migration.)