Open beskep opened 1 year ago
Difficult for us to fully resolve this without a full type inference engine (we could use heuristics, like avoid flagging these rules if polars is imported, but that comes with other problems: you don't have to import Polars in order to access a Polars DataFrame, and just because you import Polars doesn't mean you aren't working with Pandas DataFrames anywhere). Likely won't be fixed in the near-term.
(I'd recommend against using these rules if you're working with Polars.)
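To make the trade-off above concrete, here is a rough Python sketch of the import-based heuristic and the two cases where it goes wrong. This is not how Ruff is implemented; the helper name `imported_top_level_modules` and both sample snippets are made up for illustration.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


# Case 1: no polars import, yet `df` may be a Polars DataFrame passed in
# by a caller, so the heuristic would still flag the .pivot() call.
NO_IMPORT = '''
def reshape(df):
    return df.pivot("a", "b", "c")
'''

# Case 2: polars is imported, but the flagged object is a pandas DataFrame,
# so suppressing the rules file-wide would hide a true positive.
BOTH = '''
import pandas as pd
import polars as pl

def load(path):
    return pd.read_csv(path).values
'''

print("polars" in imported_top_level_modules(NO_IMPORT))  # False
print("polars" in imported_top_level_modules(BOTH))       # True
```

In both cases the import check gives an answer that does not match what the code is actually operating on, which is the problem described above.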
For a simpler heuristic, would it be possible to check the alias used to instantiate the DataFrame? pl.DataFrame rather than pd.DataFrame gives a pretty strong clue that it's not pandas.
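A minimal sketch of what such an alias check might look like, written in Python with the standard `ast` module. It only handles direct assignments of the form `name = alias.DataFrame(...)`; the function name `dataframe_origins` and the sample source are hypothetical, and this says nothing about how Ruff itself would implement it.

```python
import ast


def dataframe_origins(source: str) -> dict[str, str]:
    """Map each variable assigned from `<alias>.DataFrame(...)` to that alias."""
    origins = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        call = node.value
        is_dataframe_call = (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and call.func.attr == "DataFrame"
            and isinstance(call.func.value, ast.Name)
        )
        if not is_dataframe_call:
            continue
        for target in node.targets:
            if isinstance(target, ast.Name):
                origins[target.id] = call.func.value.id
    return origins


SAMPLE = '''
import pandas as pd
import polars as pl

a = pd.DataFrame({"x": [1]})
b = pl.DataFrame({"x": [1]})
'''

origins = dataframe_origins(SAMPLE)
print(origins)  # {'a': 'pd', 'b': 'pl'}
```

A rule could then skip names whose alias is not `pd`. The sketch does not follow DataFrames returned from functions, read from files, or passed in as arguments, so it would only cover part of the false positives.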
Currently the pandas rules are applied to many non-pandas objects. For example, PD011 tries to stop you from using .values anywhere, even with a library where you are supposed to use it. Some kind of check for whether the object even belongs to pandas would therefore be pretty useful.
The same thing happens with the Python DEAP package, which has class members named values.
Ruff is actually really trigger-happy here. Here is another quick example that triggers Ruff while just messing around with Python builtins:
# ruff: noqa: F841
# pyright: reportUnusedVariable=false
x = {}
values_dict_func = x.values # PD011
I think the false positive rate on this warning is so high it should be abandoned. Could Pandas be modified to emit a deprecation warning instead?
Why not just turn it off in your project? By definition you've opted into it.
A good lint tool should be one that doesn't require littering your source files with pragmas to disable false positives. Isn't one of the purposes of a linter to improve code readability?
(I just used a noqa pragma to disable NPY002. In that case Ruff is correct, but I can't change it.)
This problem persists with PySpark: Ruff tries to replace pivot with pivot_table.
command: ruff check test.py
ruff version: ruff 0.0.282
settings: select = ['ALL']
example:
Polars DataFrame provides a .pivot() function but no .pivot_table(), unlike pandas.