DetachHead / basedpyright

pyright fork with various type checking improvements, improved vscode support and pylance features built into the language server
https://docs.basedpyright.com
Other
1.2k stars 22 forks

Apply different typing rules to particular (external) packages #701

Open · jamestrousdale opened 1 month ago

jamestrousdale commented 1 month ago

First, thanks for the work on basedpyright. We are adopting it in our organization.

This is more of a question than a request: is it going to be possible for basedpyright to apply a different ruleset to particular third-party packages?

We would like to lock down to very strict typing, but we also use a lot of pandas, for example. This creates a conflict as soon as pandas enters the picture. Right now we are going to try just applying the pandas ruleset, but I'm sure we'll hit edge cases in other third-party packages where we won't be able to make that work, and if usage of such a package is extensive, we'd rather not litter our code with too many # pyright: ignore[***] directives.
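
For readers unfamiliar with the directives in question, here is a minimal sketch of why they pile up. `untyped_read` is a hypothetical stand-in for a third-party call annotated as returning `Any`; once that `Any` enters your code it flows into every expression derived from it, so under strict Any-related rules each of those lines can end up needing its own suppression. Exactly which lines get flagged depends on the configuration, so treat the placement of the comments as illustrative.

```python
from typing import Any


def untyped_read(path: str) -> Any:
    """Hypothetical stand-in for a third-party call annotated as returning Any."""
    return {"path": path, "rows": 3}


def row_count(path: str) -> int:
    # The Any from untyped_read propagates to `data` and then to `rows`,
    # so with strict Any rules enabled each line may need its own directive.
    data = untyped_read(path)  # pyright: ignore[reportAny]
    rows = data["rows"]  # pyright: ignore[reportAny]
    return rows  # pyright: ignore[reportAny]
```

The comments have no effect at runtime; they only tell the type checker to skip the named rule on that line.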

My guess is that this is something specific to the way pyright parses the types, so this may be out of scope for this project, but I wanted to check, and I didn't see any related issues in my search.

DetachHead commented 1 month ago

execution environments allow you to use different diagnostic settings on different parts of your project. however i'm not quite sure what you mean by applying different rules to third party packages.

are you type checking 3rd party code or your own stubs for third party code? or are you saying that you want for example the reportAny rule to not be reported on anything that comes from the pandas module?

or is the problem just that you want the strict type checking but don't want to have to fix thousands of errors in old code? if so, you may be interested in the baseline feature that i just released

jamestrousdale commented 1 month ago

Sorry for not being clear in the original post.

> execution environments allow you to use different diagnostic settings on different parts of your project. however i'm not quite sure what you mean by applying different rules to third party packages.

Execution environments are a great feature that we'll definitely make use of (for example, to relax typing on our test code, where I've seen some aspects of MagicMock run up against Any-related rules), but they're not what I mean here.
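
For context on the MagicMock remark, a small sketch. As we understand typeshed's stubs for `unittest.mock`, attribute access on a mock is typed as `Any` (and the mock classes themselves derive from `Any`), so simply configuring a mock in a test produces Any-typed expressions that strict Any rules will report. The `Repository` class and `total` function are hypothetical.

```python
from unittest.mock import MagicMock


class Repository:
    """Hypothetical dependency that the test replaces with a mock."""

    def fetch(self, key: str) -> int:
        raise NotImplementedError


def total(repo: Repository, keys: list[str]) -> int:
    return sum(repo.fetch(k) for k in keys)


def test_total() -> None:
    repo = MagicMock(spec=Repository)
    # `repo.fetch` is typed as Any by the stubs, so Any-related rules can fire here.
    repo.fetch.return_value = 2
    assert total(repo, ["a", "b"]) == 4
```

Because this kind of Any is confined to test code, relaxing the relevant rules there through an execution environment is a natural fit.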

> or are you saying that you want for example the reportAny rule to not be reported on anything that comes from the pandas module?

Yeah, that's kind of what I'm getting at. I'd like to be able to apply strict rules to my own code (and compatible 3rd party libraries), but apply a different set of rules to pandas, for example.

In most cases this isn't necessary: if the inline typing or stubs for a third-party package are insufficient, we can write our own. pandas is different, though, because of the complexity of its stubs (complex enough that the pandas team has said they'll never support fully strict pyright typing). So I'd like to be able to, I guess, ignore certain rules for symbols that originate from a particular module.
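
Short of writing full stubs, one common way to contain a loosely typed API (a general technique, not something proposed in this thread) is a thin typed wrapper that converts `Any` to a concrete type in exactly one place. `loose_mean` is a hypothetical stand-in for a library function annotated with `Any`. Note that `typing.cast` performs no runtime check, so the wrapper is only as correct as the assumption it encodes.

```python
from typing import Any, cast


def loose_mean(values: Any) -> Any:
    """Hypothetical stand-in for a library function annotated with Any."""
    return sum(values) / len(values)


def typed_mean(values: list[float]) -> float:
    # The single cast marks the one place where Any becomes a concrete type;
    # callers only ever see `float`, so strict rules stay quiet at call sites.
    return cast(float, loose_mean(values))
```

The trade-off is maintenance: every library function you use needs a wrapper like this, which is exactly what becomes impractical for an API as large as pandas'.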

DetachHead commented 1 month ago

interesting idea. i'm not familiar with pandas though, so i'm curious to see some examples of why it can't be fully typed

KotlinIsland commented 1 month ago

i've seen similar scenarios: some things are Any but are either untypable or have no bearing on the semantics of the program

the ability to ignore certain Anys would be good, i think

DetachHead commented 1 month ago

i guess reportUnknown* vs reportAny covers that
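
As we read the maintainer's point: pyright distinguishes an implicit Any, which it calls "Unknown" (for example, from a missing annotation), from an `Any` that is written out in an annotation. basedpyright's `reportAny` covers the explicit kind, while the `reportUnknown*` rules cover the implicit kind, so a project could keep one family enabled while relaxing the other. The functions below are hypothetical, and the rule names in the comments are our assumption about what would fire.

```python
import json


def port_from(config):  # no annotation: inside the body, pyright treats `config` as Unknown
    # Unknown (implicit Any) is what the reportUnknown* family targets,
    # e.g. reportUnknownParameterType / reportUnknownVariableType.
    port = config["port"]
    return port


def port_from_json(text: str) -> int:
    # typeshed annotates json.loads as returning Any, so `data` is an explicit Any,
    # which basedpyright's reportAny targets rather than the reportUnknown* rules.
    data = json.loads(text)
    return data["port"]
```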

jamestrousdale commented 1 month ago

> interesting idea. i'm not familiar with pandas though, so i'm curious to see some examples of why it can't be fully typed

I don't completely understand the ins and outs of it, but I think the way data types can be changed or defined in Series/DataFrames would be difficult to type out explicitly. For example, if I have a dataframe with str-, int- and datetime-valued columns, I'd need to encode that in the type in a very verbose way so that when I dereference a column, I get back the right Series[?] generic subtype. This is probably just a surface-level explanation of the problems pandas would have with pyright's strict typing.
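
To make the verbosity concrete, here is a hypothetical sketch (not pandas' API) of precise column typing for a frame with str, int and datetime columns. `Series`, `TradesFrame` and the column names are made up; the point is that the schema has to be spelled out as one `__getitem__` overload per column, keyed on a `Literal` column name.

```python
from __future__ import annotations

from datetime import datetime
from typing import Generic, Literal, TypeVar, overload

T = TypeVar("T")


class Series(Generic[T]):
    """Hypothetical minimal Series; not pandas' real class."""

    def __init__(self, values: list[T]) -> None:
        self.values = values


class TradesFrame:
    """Hypothetical frame with a fixed schema: symbol (str), qty (int), ts (datetime)."""

    def __init__(self, symbol: list[str], qty: list[int], ts: list[datetime]) -> None:
        self._symbol = Series(symbol)
        self._qty = Series(qty)
        self._ts = Series(ts)

    # One overload per column maps each literal column name to its element type.
    @overload
    def __getitem__(self, key: Literal["symbol"]) -> Series[str]: ...
    @overload
    def __getitem__(self, key: Literal["qty"]) -> Series[int]: ...
    @overload
    def __getitem__(self, key: Literal["ts"]) -> Series[datetime]: ...
    def __getitem__(self, key: str) -> Series[str] | Series[int] | Series[datetime]:
        if key == "symbol":
            return self._symbol
        if key == "qty":
            return self._qty
        if key == "ts":
            return self._ts
        raise KeyError(key)
```

A checker can then infer `frame["qty"]` as `Series[int]`, but every new schema needs another class like this, and operations that add, drop or retype columns would each need their own precisely typed signatures.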

pandas skirts around this by simply ignoring the type parameters of its generics, so that, e.g., every Series ends up as a Series[Unknown] (effectively a Series[Any]). For their own type checking they use basic mode and then enable a number of extra rules to go partway toward strict. Without package- or module-specific rulesets, any project importing pandas is bound to that ruleset as an upper bound. Maybe not quite, since you may never touch the parts of their interface that break certain rules, but that's hard to tell in advance, and you'll definitely hit some of them no matter what.
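
A minimal sketch of what Series[Unknown] means in practice, using a hypothetical generic `Series` class rather than pandas' own: when a generic class is used without its type argument, pyright fills the argument in as Unknown (an implicit Any), and that propagates to everything read out of it.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Series(Generic[T]):
    """Hypothetical generic container standing in for pandas' Series."""

    def __init__(self, values: list[T]) -> None:
        self.values = values


def precise() -> Series[int]:
    # Fully parameterized: elements read from this Series are known to be int.
    return Series([1, 2, 3])


def loose() -> Series:
    # Bare generic: pyright treats the type argument as Unknown.
    return Series([1, 2, 3])


# Inferred as Unknown by pyright, so strict rules report uses like this.
first = loose().values[0]
```

At runtime both functions behave identically; the difference exists only for the type checker, which is why strict rules keep tripping over values that come out of such APIs.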

Now imagine you depend on multiple packages, each with its own subset of strict compliance: you're currently bound either to the intersection of their compliance or to a lot of # pyright: ignore[...] directives.