DisnakeDev / disnake

An API wrapper for Discord written in Python.
https://docs.disnake.dev
MIT License
719 stars 137 forks source link

Typed IDs. #1149

Open elenakrittik opened 8 months ago

elenakrittik commented 8 months ago

Summary

Change all ID types from int to *Id newtypes for greater type safety.

What is the feature request for?

The core library

The Problem

Currently, this (~pseudo) code passes type checks, but in reality this will almost certainly raise an exception:

id = inter.channel.id
inter.guild.fetch_member(id)

While the above code is obviously wrong, command bodies can get large enough in practice that the error will not be as easy to spot. Using IDs in incorrect places usually results in an unhandled exception, but in very extreme cases can lead to damage (f.e. as thread ids are the same as the original message ids, one can delete thread instead of the original message or vice versa). The more developer passes IDs around, the more is the chance of making mistakes like this.

The Ideal Solution

Change all raw int IDs to appropriate newtypes. This incurs zero (or close to zero) overhead, but allows to prevent incorrect usage of IDs.

The implementation is very straightforward:


ApplicationId = NewType("ApplicationId", int)
ChannelId = NewType("ChannelId", int)
# and so on for every resource type

The rest is only about updating the library to use Ids instead of raw ints, and maybe updating converters.

The Current Solution

Carefully reviewing, testing or otherwise ensuring that IDs for e.g. channels are not used where e.g. a role Id was expected.

Additional Context

I am willing to work on this.

This was originally inspired by twilight-model's Id.

Compatibility question is not resolved yet.

shiftinv commented 7 months ago

This is definitely an interesting proposal, thanks!

The cost of using IDs in incorrect places can range from an unhandled exception up to damaging users' servers and, in very rare, but still real cases, introducing security holes (if the ID comes from user input).

I can't think of situations where a mistake like that could cause any damage, given that IDs are unique (barring a few exceptions), but it's rather annoying and difficult to debug nonetheless, yeah.

Change all raw int IDs to appropriate newtypes. This incurs zero (or close to zero) overhead, but allows to prevent incorrect usage of IDs.

Sounds reasonable. I'm somewhat worried about the additional overhead in very common expressions/hot paths like self.id = MessageId(int(data["id"])). It's less in >=3.11 (https://github.com/python/cpython/commit/96c4cbd96c769e92869c62ba898dd9eb670baa81), but not fully negligible especially in earlier versions.

Compatibility is still to be estimated.

Compatibility is my main concern here, even if it's just in the type-checking realm, and not at runtime. As far as I understand, only simple expressions like fetch_channel(channel.id) would continue to work as-is, anything more advanced would likely require replacing annotations with the appropriate *Id types everywhere; while reasonable in many cases, this can be a pretty big ask in larger projects.

The ideal compatible solution would be making methods accept something like ChannelId | int instead of just ChannelId. With the current typing.NewType semantics however, that would still allow passing mismatching ID types.

I don't have any data on how often mix-ups between different ID "types" happen - I imagine it's probably not that frequent to warrant a fundamental change like this, personally. If this library was still in very early development, adopting typed IDs like this would be really cool. By now, preserving compatibility (when possible) is pretty essential, and there doesn't seem to be an obvious way to accomplish that here (wouldn't mind to be proven wrong about this, though).

elenakrittik commented 7 months ago

I'm somewhat worried about the additional overhead in very common expressions/hot paths like self.id = MessageId(int(data["id"])). It's less in >=3.11 (python/cpython@96c4cbd), but not fully negligible especially in earlier versions.

I think we can make an exception for those and # type: ignore them, given that these IDs are coming from Discord and are likely unlikely to be invalid/wrong.

Enegg commented 7 months ago

Wouldn't the identity function overhead be dwarfed by the int cast anyway?

OseSem commented 6 months ago

Change all ID types from int to *Id newtypes for greater type safety.

This is cool and all, but if they were to do this it would bring some really weird code. For example: everytime you want to search someone up you would have to ApplicationID(applicationid) which could really flood the code. And if the ID doesn't exist it would already flag.

OseSem commented 6 months ago

And you can make an error_handler using disnake.ext.commands.UserNotFound for example.

elenakrittik commented 6 months ago

This proposal's intention is to allow optional, type-check-time guarantees about validness of ID usage. Whether to use it or not is up to you, and if you prefer catching errors at runtime instead, you will be able to do that.

(if we find a solution for the compatibility problem..)

OseSem commented 6 months ago

There could be something like bot.get_member() which could be member_id: int | UserID

Enegg commented 6 months ago

There could be something like bot.get_member() which could be member_id: int | UserID

The union would defeat the purpose of this issue, since NewType(..., int) is a subtype of int.

tooruu commented 1 month ago

I'm somewhat worried about the additional overhead in very common expressions/hot paths like self.id = MessageId(int(data["id"])).

int call is redundant here, since MessageId is its subclass. self.id = MessageId(data["id"]) is enough.

Enegg commented 1 month ago

int call is redundant here, since MessageId is its subclass. self.id = MessageId(data["id"]) is enough.

It is not a subclass. At runtime it behaves like identity function.

Snipy7374 commented 3 weeks ago

An idea could be to use what PEP-702 propose, allowing for Union[*Id, int] and marking the overload that accepts int as deprecated e.g:

from typing import overload
from warnings import deprecated

@overload
async def fetch_member(self, id: UserId) -> disnake.Member:
    ...

@overload
@deprecated("Only UserId will be supported in future versions")
async def fetch_member(self, id: int) -> disnake.Member:
    ...

async def fetch_member(self, id: Union[UserId, int], /) -> disnake.Member:
    ...

this will return a deprecation warning message by the type checker if the user invoke the method with an int id parameter. The deprecated decorator will be made available in the typing_extensions too for backward compatibility, still i know that this is a big change and as such needs some more bikeshedding, for this reason if we will ever decide to implement this, it should be done in the 3.0 version. The depreacation warning should remain for some versions to give the users the time to migrate.

elenakrittik commented 3 weeks ago

@Snipy7374 This is an interesting idea, thanks! Sadly it still has the same problem about *Ids being automatically cast to int, so if we were to go this path we would need to phrase the deprecation message in a way that accounts for both regular int literal usage and autocasted newtypes of other ids. The amount of overloads required is also daunting, unless it can be codemodded (which would pollute the codebase even more, unless we figure out how to separate "source source" code from "after-codemod source" code, which i imagine will be especially hard around IDE support)

Enegg commented 3 weeks ago

As far as the overloads go, I think we could make a decorator for this. Regardless, from what I gather, we don't actually want to depreciate passing in a raw int? And if so, it'd be a stretch to use depreciated here.

Enegg commented 2 weeks ago

I've written some example implementation of how the forementioned decorator could look like.

from collections import abc
from typing import Concatenate, Never, NewType, ParamSpec, Protocol, overload
from typing_extensions import TypeVar, deprecated

ObjectID = NewType("ObjectID", int)
IdT = TypeVar("IdT", bound=ObjectID, infer_variance=True)
P = ParamSpec("P")
RetT = TypeVar("RetT", infer_variance=True)

class AcceptsID(Protocol[IdT, P, RetT]):
    @overload
    def __call__(self, id: IdT, /, *args: P.args, **kwargs: P.kwargs) -> RetT: ...

    @overload
    def __call__(self, id: ObjectID, /, *args: P.args, **kwargs: P.kwargs) -> Never: ...

    @overload
    @deprecated("Using raw ints is not value-safe", category=None)
    def __call__(self, id: int, /, *args: P.args, **kwargs: P.kwargs) -> RetT: ...

def overload_id(func: abc.Callable[Concatenate[IdT | int, P], RetT], /) -> AcceptsID[IdT, P, RetT]:
    return func  # pyright: ignore[reportReturnType]

I've added the intermediary ObjectID to catch invalid ID types in the overload resolution. However, this makes wrapping raw ints even more annoying. We could circumvent that by replacing ObjectID with a Union of all ID types, though it might lead to some circular imports/references.

AppID = NewType("AppID", ObjectID)
ChannelID = NewType("ChannelID", ObjectID)

@overload_id
def get_channel(id: ChannelID | int, /) -> object: ...

Passing in an int shows the depreciation message. Passing in AppID resolves the return type to Never, which marks any following code unreachable. To get an instance of ChannelID, one must do id = ChannelID(ObjectID(123)).