Open zanieb opened 1 year ago
I just spent a while tracking down this exact issue, an error introduced by ruff and reduced to the almost identical:
import numpy as np
arr = np.array([False, True, False, True])
print(repr(arr == False))
# array([ True, False, True, False])
print(repr(arr is False))
# False
Reading the other thread, it sounds like this autofix can't be made safe. I would suggest disabling it completely then, because using truthiness in numpy comparison isn't a rare operation, and expecting everyone to "know" not to do this seems to defeat the point of an autocorrecting linter.
Reading the other thread, it sounds like this autofix can't be made safe. I would suggest disabling it completely then, because using truthiness in numpy comparison isn't a rare operation, and expecting everyone to "know" not to do this seems to defeat the point of an autocorrecting linter.
I think making this a suggested fix (as it is now) will have the same effect, once we introduce --fix
and --fix-unsafe
(the former of which will only make automatic fixes, while the latter will include suggested fixes, which may include changes in behavior).
The problem with removing the autofix entirely is that it doesn't really reduce the burden or expectation on the user, because this diagnostic will still be raised, and so users will still be required to look at the code and understand whether or not to change it. Performing the fix automatically has the downside of silently breaking the code, but requiring users to opt-in to the change explicitly seems (to me) as safe as pointing them to the diagnostic without including any possible fix.
Any conclusion on this one? This issue broke a bunch of our pyspark
code by converting code similar to:
df = df.where(F.col("colName") == True)
to
df = df.where(F.col("colName") is True)
which leads to TypeError: condition should be string or Column
Thank you
Adding to @nicornk: PEP-8 actually strongly discourages using is
with boolean constants:
Don’t compare boolean values to True or False using ==:
# Correct:
if greeting:
# Wrong:
if greeting == True:
Worse:
# Wrong:
if greeting is True:
So while the autofix may be unsafe, I would argue the diagnostics itself is harmful. The correct suggestion would be to drop == True
entirely (though PySpark is certainly another special case, as == False
would need a conversion to the unary not ~
operator ...)
@nicornk you can use the unambiguous alternate syntax e.g. df = df.where(F.col("colName").is_(True))
or disable the rule. Additionally, once Ruff has type inference, we will avoid suggesting a change to col is True
.
@dstoeckel while I agree that if foo
is definitely better than if foo == True
, there are entirely valid uses for checking if something is True
or False
without having a __bool__
cast occur and there are cases outside of if
statments where the user must perform the cast. Unfortunately the diagnostic must try to guess the intent of the user when suggesting a fix which puts us in a tough spot. I'm not sure this issue is the correct place to discuss the validity of the rule as a whole though, I'd like to keep this scoped to a discussion of false positives based on type inference.
Additionally, once Ruff has type inference
Is there an existing ongoing effort for this that's public?
Note as of v0.1.0 we do not apply unsafe fixes by default — so this fix will not be applied by default.
there are entirely valid uses for checking if something is
True
orFalse
without having a__bool__
cast occur
Of course this is a matter of opinion, but if you want to check this, I think the Pythonic way to do so is to use:
if isinstance(x, bool) and x:
# or
match x:
case True:
is True
is far more often misused in my opinion.
Please let's not make this issue a debate about how if _ == True
, if _ is True
, and if _
should be used or whether E712
is valid in general.
The libraries that are the focus of this issue have designed APIs where specific comparisons are necessary. This issue is intended to track support for patterns in those APIs. I'd recommend creating a new discussion if you want to discuss broader concerns.
Just to add an example, of an error that took me a while to fix.
import pandas as pd
# Example dataframe
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "col_2": [True, False, True, False, True]})
# This works, but ruff raises: Comparison to `False` should be `cond is False`
a = df[df["col_2"] == False]
print(a)
# This does not work, pandas will raise 'KeyError: False'
b = df[df["col_2"] is False]
print(b)
Another example with sqlalchemy2 ( where syntax has changed a lot comparing with versions 1.x):
query = request.dbsession.execute(
select(Station, func.min(Check.result)) # pylint: disable=E1102
.join(Check.channel)
.join(Channel.station)
.where(
Station.triggered == False,
)
).all()
Here the Station.triggered == False
raises the E712
If replaced by Station.triggered is False
the result is not what is expected
Same thing with E711.
Would be nice to have a brief mention in the rule's docs calling out common situations where this is unsafe and why
Thanks @psychedelicious. Would you be willing to open a pull request?
But why do we change ==
/!=
to is
/is not
at all?? PEP8 says it is even worse.
oh, checking #8164 that seems to deal with the same question…
Now that Ruff is moving towards having type information, this issue may eventually warrant some refinement? If x
has Boolean type, then if x
is appropriate and if x is True
is inappropriate. If x
has a broader type, then either could be fine.
If x has a broader type, then either could be fine.
I would argue that if x is not Boolean type, "is True" is always wrong - for example if x is numpy.bool_
, comparing it with is
will be wrong.
I ran into this when writing this example for the next version of altair
- based on upstream example
Would be applicable to:
altair.vegalite.v5.api
I would argue that if x is not Boolean type, "is True" is always wrong - for example if x is numpy.bool_, comparing it with is will be wrong.
By broader, I mean something like Any
or bool | int
or np.bool | bool
, etc.
Note about SQLAlchemy:
The .is_
and .is_not
methods may not be safe alternatives to the equality (==
/!=
) operators, because IS
and =
are different SQL operators. Consider the following example in MySQL:
SELECT 10 IS TRUE;
/* 1 */
SELECT 10 = TRUE;
/* 0 */
SELECT NULL IS NOT FALSE;
/* 1 */
SELECT NULL != FALSE;
/* NULL */
Thus, the safe alternative for comparing to booleans is:
c == True
# <sqlalchemy.sql.elements.BinaryExpression object at 0x1026203e0>
c == sqlalchemy.true()
# <sqlalchemy.sql.elements.BinaryExpression object at 0x1026222a0>
Summary
Generally, comparisons to
True
,False
, andNone
singletons should useobj is True
instead ofobj == True
.However, it is common for libraries to override the
==
/__eq__
operator to create simple APIs for filtering data. In these cases, correcting==
tois
changes the meaning of the program and breaks the user's code. The same applies for!=
andis not
.This is a tracking issue for all invalid corrections from this rule.
Types with the issue
pandas.DataFrame
often used withDataFrame.mask
,DataFrame.where
pandas.Series
often used withSeries.mask
,Series.where
numpy.Array
often used withArray.where
sqlalchemy.Column
often used withQuery.having
,Query.filter
,Query.where
If an issue with an unlisted type is encountered please reply and I will edit to add it here.
Resolution
Eventually, ruff is likely to detect these cases by inferring the datatype involved and exclude it from the suggested fix.
In the meantime, you may:
pandas.Series.eq
)Examples
Related issues