Add Universal.CodeAnalysis.MixedBooleanOperator

TimWolla commented 1 year ago

As previously discussed on Fediverse. I've copied over the logic and tests as-is, just making the necessary adjustments to fit the code style of PHPCSExtra, to make it easy to include the Sniff in CodeSniffer itself in the future.

This is a clone of the proposed sniff in squizlabs/PHP_CodeSniffer#3205. Adding it to PHPCSExtra to make it generally available quicker than by including it in core CodeSniffer.

TimWolla commented 1 year ago

Handled the obvious stuff already. Will look into the stuff that needs some additional thinking later. Thank you for the review so far!

TimWolla commented 1 year ago

Naming of the sniff

Question: should the sniff name be MixedBooleanOperator or MixedBooleanOperators (note the plural) ? Personally, I'm leaning towards the plural form as if there is only one operator, the issue doesn't exist and there will never be a mix.

This makes sense. Should this also include the “Binary” part within the name? i.e. MixedBinaryBooleanOperators?

Premise of the sniff

As things are, the sniff only concerns itself with the boolean and/or operators. Questions:
* Should it be documented (in the class docblock) why the boolean not `!` operator is not included in this sniff ?

With including the Binary in the name this should be clear. But I can add a short blurb (or you add some suggestion using GitHub's feature.

* Should the sniff also apply to the logical `and`/`or`/`xor` operators ?
  If not, maybe we should open an issue to create a sister-sniff for the logical operators ?

Included now.

Review notes:

Auto-fixing could be possible for this sniff, but would involve an assumption about the correctness of the code as written, so I'm fine with this sniff not having a fixer and letting a human decide where the parentheses should be. ✔️

Yes, this Sniff is intentionally written for detecting possibly incorrect code and we found several instances in our code base with the initial version I proposed. That's why autofixing must not happen.

I'm also trying to think if there is a situation in which the sniff should stop scanning or behave differently when an operator with a higher precedence is encountered ? Note: the $phpcsFile->findStartOfStatement($stackPtr) call does not take this into account, so the sniff would have to.

The Sniff tripping over such a situation would likely indicate that the human would also be confused. I am fine with some false-positives should there be such a situation, you can't have enough parentheses to make precedence clear. And if you mix and and && or others, you are really asking for trouble if you don't use parentheses.

Other

The build failures related to the code coverage checks are a known issue for PRs from forks and I'm working with the Coveralls team on a solution. You don't need to worry about this (though checking code coverage locally for the sniff would be good and will show that the sniff as-is is not completely covered by tests).

I wasn't able to get coverage running locally, due to the ancient PHPUnit requirement that is not trivially compatible with PHP 8.2. But with the updated code there isn't really much logic left in the sniff itself. If you would check that it's up to 100% coverage now, I'd appreciate it.

TimWolla commented 1 year ago

The commit history is a little messy due to the number of changes required due to my lack of knowledge of CodeSniffer. I can squash it for you once you are happy with the diff.

jrfnl commented 1 year ago

The commit history is a little messy due to the number of changes required due to my lack of knowledge of CodeSniffer. I can squash it for you once you are happy with the diff.

Yes please. Would be great if you could do that once the PR is ready for merge.

TimWolla commented 1 year ago

Some preliminary answers, will look into the code stuff later.

This makes sense. Should this also include the "Binary" part within the name? i.e. MixedBinaryBooleanOperators?

I'm not sure where the "Binary" terminology comes from ? The PHP manual calls the complete set of these operators "Logical operators", so I think using that term would prevent confusion.

It's “Binary Operator” as in “Operator that takes two operands”. ! would be a unary operator, as it only operates on one operand. And ? : is a ternary operator, because it operates on three operands. That's why it's called that.

see: https://en.wikipedia.org/wiki/Binary_operation

Regarding the name of the sniff, what with the functionality of the sniff having changed, I'm not sure about the name.

The sniff has changed from its original state to now be more comprehensive with the addition of checking for the logical and, or and xor operators.

It is still about mixing operators within a single expression, but more operators are now considered.

So let's turn it around: what is the aim of the sniff ? If I would paraphrase, it is about being explicit about the precedence order of (part of) statements by using parentheses. Do you agree ? So maybe the name of the sniff should reflect that instead ?

Yes, that description is accurate. Specifically explicit precedence for binary boolean operators, because for + vs * the precedence rules are fairly well-known, where-as for && and || it's less obvious. ! is not included because the way it's commonly used indicates that it binds tightly (less tight than instanceof, though. !$foo instanceof Bar works correctly in PHP, not in JavaScript though :grin:)

Maybe something along the lines of RequireExplicitLogicalOperatorPrecedence ? (Note: I'm definitely not saying that this is the name to go with, I'm just throwing it out here to trigger your imagination)

Naming, one of the hard things in computer science. I don't particularly care about the name and your suggestion would be fine with me (possibly including the “binary” part). You probably know the naming of existing Sniffs better than me.

With including the Binary in the name this should be clear. But I can add a short blurb (or you add some suggestion using GitHub's feature.

A short phrase in the class docblock would be good, as the ! operator is also considered a logical operator.

That's where the “Binary” would come in, but I'll add something.

I've only found one false positive - see the below code.

return match (true) {
    \defined($value) => \constant($value),
    'yes' === $lowercaseValue,
    'on' === $lowercaseValue => true,
    'no' === $lowercaseValue,
    'off' === $lowercaseValue,
    'none' === $lowercaseValue => false,
    isset($value[1]) && (
        ("'" === $value[0] && "'" === $value[\strlen($value) - 1]) || // <- Here the `||` gets flagged.
        ('"' === $value[0] && '"' === $value[\strlen($value) - 1]) || // <- Here the `&&` gets flagged.
    ) => substr($value, 1, -1), // quoted string
    default => doSomething($value),
};

This code is not syntactically valid, as the || is directly followed by ) => substr($value, 1, -1), // quoted string. I did not yet test the sniff with syntactically invalid code. Did you accidentally remove some bits while copying the code into the issue?

... and while this could also be considered a false positive, I'm inclined to say it should be flagged anyway.

if (
    [] !== $fields &&
    $constraint &&
    $constraint->payload &&
    // If some or all payload fields are whitelisted, add them
    $payloadFields = null === $fields || true === $fields ? $constraint->payload : array_intersect_key($constraint->payload, $fields)
) {}

This is exactly the type of code I'm concerned about, so I wouldn't even call it false positive. Looking at this, it's not immediately clear to me whether it is parsed as ($payloadFields = null === $fields) || (true === $fields ? $constraint->payload : array_intersect_key($constraint->payload, $fields)) or as $payloadFields = ((null === $fields || true === $fields) ? $constraint->payload : array_intersect_key($constraint->payload, $fields)) or as ($payloadFields = (null === $fields || true === $fields)) ? $constraint->payload : array_intersect_key($constraint->payload, $fields). Yes, three possible interpretations that would be syntactically valid.

Just thinking, if we can come up with a relevant code sample, it might also be good to add a test with an arrow function ?

Arrow functions are not 100% supported in the tokenizer layer (best effort, but the fact they can share their "scope closer" with other structures make things very difficult), so I think it may be a good idea to have some tests involving arrow functions just to be on the safe side.

Okay, will have a look.

jrfnl commented 1 year ago

This code is not syntactically valid, as the || is directly followed by ) => substr($value, 1, -1), // quoted string. I did not yet test the sniff with syntactically invalid code. Did you accidentally remove some bits while copying the code into the issue?

Sorry, my bad. This is the correct code sample for the false positive (I copied a bit too much when I added the comments):

        return match (true) {
            \defined($value) => \constant($value),
            'yes' === $lowercaseValue,
            'on' === $lowercaseValue => true,
            'no' === $lowercaseValue,
            'off' === $lowercaseValue,
            'none' === $lowercaseValue => false,
            isset($value[1]) && (
                ("'" === $value[0] && "'" === $value[\strlen($value) - 1]) ||
                ('"' === $value[0] && '"' === $value[\strlen($value) - 1])
            ) => substr($value, 1, -1), // quoted string
            default => XmlUtils::phpize($value),
        };

jrfnl commented 1 year ago

This is exactly the type of code I'm concerned about, so I wouldn't even call it false positive. Looking at this, it's not immediately clear to me whether it is parsed as ($payloadFields = null === $fields) || (true === $fields ? $constraint->payload : array_intersect_key($constraint->payload, $fields)) or as $payloadFields = ((null === $fields || true === $fields) ? $constraint->payload : array_intersect_key($constraint->payload, $fields)) or as ($payloadFields = (null === $fields || true === $fields)) ? $constraint->payload : array_intersect_key($constraint->payload, $fields). Yes, three possible interpretations that would be syntactically valid.

Except there isn't really room for interpretation due to the $payloadFields = and = having higher precedence than any of the operators this sniff is looking for.

Still I agree that this kind of code can be confusing.

TimWolla commented 1 year ago

You already agreed, so all is fine, but I wanted to note this nonetheless:

Except there isn't really room for interpretation due to the $payloadFields = and = having higher precedence than any of the operators this sniff is looking for.

I actually needed to print the AST using https://github.com/nikic/PHP-Parser to be sure how it was interpreted.

In fact the fact that

$a && $d = null === $b || true === $c ? true : false;

is

$a && ($d = ((null === $b || true === $c) ? true : false));

but

$a && null === $b || true === $c ? true : false;

is

(($a && null === $b) || true === $c) ? true : false;

is surprising to me.

jrfnl commented 1 year ago

You already agreed, so all is fine, but I wanted to note this nonetheless:

Except there isn't really room for interpretation due to the $payloadFields = and = having higher precedence than any of the operators this sniff is looking for.

I actually needed to print the AST using https://github.com/nikic/PHP-Parser to be sure how it was interpreted.

In fact the fact that
$a && $d = null === $b || true === $c ? true : false;
is
$a && ($d = ((null === $b || true === $c) ? true : false));
but
$a && null === $b || true === $c ? true : false;
is
(($a && null === $b) || true === $c) ? true : false;
is surprising to me.

Guess, you never had the chance to see my "The big "Why equal doesn't equal" Quiz" ? 😂

But yes, operator precedence can be a bitch and I see way too many issues created because people don't understand/know it well enough (which is all the more reason why I'm happy to accept this sniff in this package).

You already saw PR #277 and yes, that was fun to write what with the operator precedence issues which could be caused when auto-fixing ;-)

TimWolla commented 1 year ago

Sorry, my bad. This is the correct code sample for the false positive (I copied a bit too much when I added the comments):

Simplified:

<?php

match (true) {
    $a || ($b && $c) => true,
};

This is somehow caused by the match() behaving differently than e.g. an array:

<?php

[
    $a || ($b && $c) => true,
];

Specifically findStartOfStatement returns the $a for both operators in the match() and $a for the || and $b for the && in the array.

While I don't want to blame the upstream library, I must say that this is very surprising behavior for constructs that are so similar. Is this a bug in CodeSniffer itself or is this intentional? If this is intentional, do you have some specific advice / code snippet on how to fix this in this Sniff, ideally without replicating the entire logic?

TimWolla commented 1 year ago

I've given the sniff another look over and left some comments, but it looks like a number of my remarks from my previous comment

Yes, I plan to do the renaming and related changes once the functionality is clear and correct to avoid unnecessary churn. I was waiting for advice regarding the match statement, before proceeding. All the stuff that is not yet collapsed still is on my list.

TimWolla commented 1 year ago

Okay, I managed to fix the handling for the match case and added arrow function tests.

I believe this should now be everything with regard to the implementation. I'll think about the naming adjustments once you confirm the implementation is correct and complete.

Do I still need “syntax error” tests or did that become irrelevant with the simplified implementation?

jrfnl commented 1 year ago

Okay, I managed to fix the handling for the match case and added arrow function tests.

@TimWolla Thank you for making those changes and updates. I've tried to come up with cases which could break the current implementation, but my imagination struck out, so I think we're good. Well done! 🎉

Might be worth adding the following extra tests still to safeguard things further, but these are already handled correctly:

// Not ok.
match (true) {
    $a || ($b && $c) && $d => true,
    $b && $c['a'] || $d => true,
    $b && ${$var} || $d => true,
};

I believe this should now be everything with regard to the implementation. I'll think about the naming adjustments once you confirm the implementation is correct and complete.

I'm happy with the implementation as it is now. Let's get the name sorted & get this merged!

Do I still need “syntax error” tests or did that become irrelevant with the simplified implementation?

With the current implementation, I don't see a case where a parse error could cause the sniff to throw a PHP notice, so as far as I'm concerned it's not needed for the sniff as-it-is now.

TimWolla commented 1 year ago

Okay, I:

Added the additional test case.
Explained why ! is not handled.
Adjusted the copyright line.
Added the requested references to other sniffs.
Renamed the Sniff to RequireExplicitBooleanOperatorPrecedence. I've opted for “Boolean” instead of “Logical”, because:
- The internal token array is called $booleanOperators.
- It's mainly about && and || (because no one should use and, or, xor in the first place) and the internal tokens are T_BOOLEAN_AND, T_BOOLEAN_OR vs T_LOGICAL_AND, T_LOGICAL_OR, T_LOGICAL_XOR for the others.
- PSR-12 uses “boolean operator”.
- It matches the name of PSR12.ControlStructures.BooleanOperatorPlacementSniff, which is reasonably related, because operator placement and precedence go hand in hand.

TimWolla commented 1 year ago

Will squash the entire commit history once you approve that everything is good.

jrfnl commented 11 months ago

FYI: I've moved this PR out of the 1.2.0 milestone as, what with the PHPCS repo move, the sniff can go into the main PHPCS repo as soon as its been pulled there (as it's been extensively reviewed here already), which is what the Op indicated originally as the preferred option. I've discussed this with @TimWolla off-GH.

PHPCSStandards / PHPCSExtra

Add Universal.CodeAnalysis.MixedBooleanOperator #271

Naming of the sniff

Premise of the sniff

Review notes:

Other