Open elenakrittik opened 8 months ago
This is definitely an interesting proposal, thanks!
The cost of using IDs in incorrect places can range from an unhandled exception up to damaging users' servers and, in very rare, but still real cases, introducing security holes (if the ID comes from user input).
I can't think of situations where a mistake like that could cause any damage, given that IDs are unique (barring a few exceptions), but it's rather annoying and difficult to debug nonetheless, yeah.
Change all raw
int
IDs to appropriate newtypes. This incurs zero (or close to zero) overhead, but allows to prevent incorrect usage of IDs.
Sounds reasonable.
I'm somewhat worried about the additional overhead in very common expressions/hot paths like self.id = MessageId(int(data["id"]))
. It's less in >=3.11 (https://github.com/python/cpython/commit/96c4cbd96c769e92869c62ba898dd9eb670baa81), but not fully negligible especially in earlier versions.
Compatibility is still to be estimated.
Compatibility is my main concern here, even if it's just in the type-checking realm, and not at runtime.
As far as I understand, only simple expressions like fetch_channel(channel.id)
would continue to work as-is, anything more advanced would likely require replacing annotations with the appropriate *Id
types everywhere; while reasonable in many cases, this can be a pretty big ask in larger projects.
The ideal compatible solution would be making methods accept something like ChannelId | int
instead of just ChannelId
. With the current typing.NewType
semantics however, that would still allow passing mismatching ID types.
I don't have any data on how often mix-ups between different ID "types" happen - I imagine it's probably not that frequent to warrant a fundamental change like this, personally. If this library was still in very early development, adopting typed IDs like this would be really cool. By now, preserving compatibility (when possible) is pretty essential, and there doesn't seem to be an obvious way to accomplish that here (wouldn't mind to be proven wrong about this, though).
I'm somewhat worried about the additional overhead in very common expressions/hot paths like
self.id = MessageId(int(data["id"]))
. It's less in >=3.11 (python/cpython@96c4cbd), but not fully negligible especially in earlier versions.
I think we can make an exception for those and # type: ignore
them, given that these IDs are coming from Discord and are likely unlikely to be invalid/wrong.
Wouldn't the identity function overhead be dwarfed by the int cast anyway?
Change all ID types from
int
to*Id
newtypes for greater type safety.
This is cool and all, but if they were to do this it would bring some really weird code. For example: everytime you want to search someone up you would have to ApplicationID(applicationid)
which could really flood the code. And if the ID doesn't exist it would already flag.
And you can make an error_handler using disnake.ext.commands.UserNotFound for example.
This proposal's intention is to allow optional, type-check-time guarantees about validness of ID usage. Whether to use it or not is up to you, and if you prefer catching errors at runtime instead, you will be able to do that.
(if we find a solution for the compatibility problem..)
There could be something like bot.get_member()
which could be member_id: int | UserID
There could be something like
bot.get_member()
which could bemember_id: int | UserID
The union would defeat the purpose of this issue, since NewType(..., int)
is a subtype of int
.
I'm somewhat worried about the additional overhead in very common expressions/hot paths like
self.id = MessageId(int(data["id"]))
.
int
call is redundant here, since MessageId
is its subclass. self.id = MessageId(data["id"])
is enough.
int
call is redundant here, sinceMessageId
is its subclass.self.id = MessageId(data["id"])
is enough.
It is not a subclass. At runtime it behaves like identity function.
An idea could be to use what PEP-702 propose, allowing for Union[*Id, int]
and marking the overload that accepts int
as deprecated
e.g:
from typing import overload
from warnings import deprecated
@overload
async def fetch_member(self, id: UserId) -> disnake.Member:
...
@overload
@deprecated("Only UserId will be supported in future versions")
async def fetch_member(self, id: int) -> disnake.Member:
...
async def fetch_member(self, id: Union[UserId, int], /) -> disnake.Member:
...
this will return a deprecation warning message by the type checker if the user invoke the method with an int
id parameter. The deprecated
decorator will be made available in the typing_extensions
too for backward compatibility, still i know that this is a big change and as such needs some more bikeshedding, for this reason if we will ever decide to implement this, it should be done in the 3.0 version. The depreacation warning should remain for some versions to give the users the time to migrate.
@Snipy7374 This is an interesting idea, thanks! Sadly it still has the same problem about *Id
s being automatically cast to int
, so if we were to go this path we would need to phrase the deprecation message in a way that accounts for both regular int literal usage and autocasted newtypes of other ids. The amount of overloads required is also daunting, unless it can be codemodded (which would pollute the codebase even more, unless we figure out how to separate "source source" code from "after-codemod source" code, which i imagine will be especially hard around IDE support)
As far as the overloads go, I think we could make a decorator for this. Regardless, from what I gather, we don't actually want to depreciate passing in a raw int? And if so, it'd be a stretch to use depreciated here.
I've written some example implementation of how the forementioned decorator could look like.
from collections import abc
from typing import Concatenate, Never, NewType, ParamSpec, Protocol, overload
from typing_extensions import TypeVar, deprecated
ObjectID = NewType("ObjectID", int)
IdT = TypeVar("IdT", bound=ObjectID, infer_variance=True)
P = ParamSpec("P")
RetT = TypeVar("RetT", infer_variance=True)
class AcceptsID(Protocol[IdT, P, RetT]):
@overload
def __call__(self, id: IdT, /, *args: P.args, **kwargs: P.kwargs) -> RetT: ...
@overload
def __call__(self, id: ObjectID, /, *args: P.args, **kwargs: P.kwargs) -> Never: ...
@overload
@deprecated("Using raw ints is not value-safe", category=None)
def __call__(self, id: int, /, *args: P.args, **kwargs: P.kwargs) -> RetT: ...
def overload_id(func: abc.Callable[Concatenate[IdT | int, P], RetT], /) -> AcceptsID[IdT, P, RetT]:
return func # pyright: ignore[reportReturnType]
I've added the intermediary ObjectID
to catch invalid ID types in the overload resolution. However, this makes wrapping raw ints even more annoying.
We could circumvent that by replacing ObjectID
with a Union
of all ID types, though it might lead to some circular imports/references.
AppID = NewType("AppID", ObjectID)
ChannelID = NewType("ChannelID", ObjectID)
@overload_id
def get_channel(id: ChannelID | int, /) -> object: ...
Passing in an int shows the depreciation message. Passing in AppID
resolves the return type to Never
, which marks any following code unreachable.
To get an instance of ChannelID
, one must do id = ChannelID(ObjectID(123))
.
Summary
Change all ID types from
int
to*Id
newtypes for greater type safety.What is the feature request for?
The core library
The Problem
Currently, this (~pseudo) code passes type checks, but in reality this will almost certainly raise an exception:
While the above code is obviously wrong, command bodies can get large enough in practice that the error will not be as easy to spot. Using IDs in incorrect places usually results in an unhandled exception, but in very extreme cases can lead to damage (f.e. as thread ids are the same as the original message ids, one can delete thread instead of the original message or vice versa). The more developer passes IDs around, the more is the chance of making mistakes like this.
The Ideal Solution
Change all raw
int
IDs to appropriate newtypes. This incurs zero (or close to zero) overhead, but allows to prevent incorrect usage of IDs.The implementation is very straightforward:
The rest is only about updating the library to use
Id
s instead of rawint
s, and maybe updating converters.The Current Solution
Carefully reviewing, testing or otherwise ensuring that IDs for e.g. channels are not used where e.g. a role Id was expected.
Additional Context
I am willing to work on this.
This was originally inspired by twilight-model's
Id
.Compatibility question is not resolved yet.