PD rules trigger on non-Pandas DataFrames

astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.

https://docs.astral.sh/ruff

MIT License


Issue #6432 (open), opened by beskep 1 year ago

beskep commented 1 year ago

command: ruff check test.py
ruff version: ruff 0.0.282
settings: select = ['ALL']

example:

import polars as pl

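pldf = pl.DataFrame({"id": [1, 1], "key": ["a", "b"], "val": [10, 20]})
# Polars >= 1.0 argument names
pldf.pivot(on="key", index="id", values="val")  # PD010 `.pivot_table` is preferred to `.pivot` or `.unstack`; provides same functionality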

Unlike pandas, a Polars DataFrame provides a .pivot() method but no .pivot_table(), so the suggested replacement is not available.

charliermarsh commented 1 year ago

Difficult for us to fully resolve this without a full type inference engine. We could use heuristics, like not flagging these rules if polars is imported, but that comes with other problems: you don't have to import Polars in order to access a Polars DataFrame, and importing Polars doesn't mean you aren't also working with Pandas DataFrames elsewhere. Likely won't be fixed in the near term.
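To illustrate the trade-off, here is a minimal sketch of a "skip PD rules if polars is imported" heuristic, written with Python's standard ast module. It is not Ruff's implementation (Ruff is written in Rust); the function name imports_polars and the module my_helpers are hypothetical. The two sample modules show both failure modes: one receives a Polars DataFrame without importing polars, the other imports polars but pivots a pandas DataFrame.

```python
import ast


def imports_polars(source: str) -> bool:
    """Hypothetical heuristic: does this module import polars anywhere?"""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. `import polars as pl` or `import polars.selectors as cs`
            if any(alias.name.split(".")[0] == "polars" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # e.g. `from polars import DataFrame`; relative imports have module=None
            if node.module and node.module.split(".")[0] == "polars":
                return True
    return False


# A Polars DataFrame arrives through a helper; polars itself is never imported.
no_import = """
from my_helpers import load_frame  # hypothetical helper returning a pl.DataFrame
df = load_frame()
df.pivot(on="key", index="id", values="val")
"""

# Polars is imported, but the pivot is on a pandas DataFrame.
mixed = """
import pandas as pd
import polars as pl
df = pd.DataFrame({"a": [1], "b": [2]})
df.pivot(columns="a", values="b")
"""

print(imports_polars(no_import))  # False: the rule would still fire (false positive)
print(imports_polars(mixed))      # True: the rule would be skipped (false negative)
```

The sample sources are only parsed, never executed, so neither pandas nor polars needs to be installed to run the sketch.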

(I'd recommend against using these rules if you're working with Polars.)

MarcoGorelli commented 1 year ago

For a simpler heuristic, would it be possible to check the alias used to instantiate the DataFrame? pl.DataFrame rather than pd.DataFrame gives a pretty strong clue that it's not pandas.
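The alias-based heuristic could be sketched as follows, again with the standard ast module and under loose assumptions: the hypothetical function pd010_candidates treats a variable as a Polars frame only if it is assigned directly from pl.DataFrame(...) (with pl assumed to be the Polars alias), and reports the line numbers of the remaining .pivot()/.unstack() calls. This is an illustration of the idea, not how Ruff works.

```python
import ast

PIVOT_METHODS = {"pivot", "unstack"}


def pd010_candidates(source: str, polars_aliases: tuple = ("pl",)) -> list:
    """Line numbers of .pivot()/.unstack() calls a PD010-style rule would flag,
    skipping receivers built directly with `<polars alias>.DataFrame(...)`."""
    tree = ast.parse(source)

    # Pass 1: collect names assigned directly from `<alias>.DataFrame(...)`.
    polars_names = set()
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Attribute)
            and node.value.func.attr == "DataFrame"
            and isinstance(node.value.func.value, ast.Name)
            and node.value.func.value.id in polars_aliases
        ):
            polars_names.update(t.id for t in node.targets if isinstance(t, ast.Name))

    # Pass 2: report pivot/unstack calls whose receiver is not a known Polars frame.
    flagged = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in PIVOT_METHODS
        ):
            receiver = node.func.value
            if isinstance(receiver, ast.Name) and receiver.id in polars_names:
                continue  # built with pl.DataFrame(...): skip
            flagged.append(node.lineno)
    return sorted(flagged)  # ast.walk is breadth-first, so sort by line


example = """
import pandas as pd
import polars as pl

pldf = pl.DataFrame()
pddf = pd.DataFrame()
pldf.pivot()
pddf.pivot()
"""

print(pd010_candidates(example))  # [8]: only the pandas pivot is reported
```

The tracking is flow-insensitive and only sees direct constructor assignments, so frames returned from functions, produced by pl.read_csv(...), or reassigned later would still be misclassified. Those are the gaps a full type inference engine would close.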

kleinicke commented 7 months ago

Currently the pandas rules are applied to many non-pandas objects. For example, PD011 flags every use of .values, even when you use a library where .values is the intended API. Some kind of check for whether the object actually belongs to pandas would therefore be very useful.

bje- commented 4 months ago

The same thing happens with the Python DEAP package, which has class members named values.

ItsDrike commented 4 months ago

Ruff is really trigger-happy here. Here's another quick example that triggers it while just messing around with Python builtins:

# ruff: noqa: F841
# pyright: reportUnusedVariable=false

x = {}
values_dict_func = x.values  # PD011
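A rule keyed only on the attribute name cannot tell these receivers apart. The simplified sketch below (a hypothetical stand-in for a PD011-style check, not Ruff's code) reports every `.values` read; it flags both the dict example above and a made-up non-pandas class with a values member, similar to the DEAP case.

```python
import ast


def naive_values_hits(source: str) -> list:
    """Return line numbers of every `.values` read, whatever the receiver is.

    Hypothetical simplification of a PD011-style rule: it looks only at the
    attribute name, so it cannot distinguish pandas objects from anything else.
    """
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Attribute)
        and node.attr == "values"
        and isinstance(node.ctx, ast.Load)  # reads only, not `self.values = ...`
    )


snippet = """
x = {}
values_dict_func = x.values

class Fitness:  # made-up non-pandas class with a `values` member
    def __init__(self):
        self.values = (1.0, 2.0)

score = Fitness().values
"""

print(naive_values_hits(snippet))  # [3, 9]: the dict method and the custom attribute
```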

bje- commented 4 months ago

In reply to charliermarsh's comment above ("Difficult for us to fully resolve this without a full type inference engine ... Likely won't be fixed in the near-term."):

I think the false-positive rate of this warning is so high that the rule should be abandoned. Could pandas be modified to emit a deprecation warning instead?

charliermarsh commented 4 months ago

Why not just turn it off in your project? By definition you've opted into it.

bje- commented 4 months ago

A good lint tool should be one that doesn't require littering your source files with pragmas to disable false positives. Isn't one of the purposes of a linter to improve code readability?

(I just used a noqa pragma to disable NPY002, but in that case Ruff is correct; I just can't change the code.)

ncooder commented 1 week ago

This problem persists with PySpark: PD010 suggests replacing pivot with pivot_table there too.