PHPCheckstyle / phpcheckstyle

PHPCheckstyle is an open-source tool that helps PHP programmers adhere to certain coding conventions.
https://github.com/PHPCheckstyle/phpcheckstyle
GNU Lesser General Public License v3.0
164 stars 31 forks source link

Warning about "unused code after RETURN." #40

Open rr- opened 10 years ago

rr- commented 10 years ago

Consider this example:

<?php
class Test
{
    public function test($params)
    {
        return $this->commit(function() use ($params)
        {
            $whatever = 5;
            return $whatever;
        });
    }

    private function commit($callback)
    {
        //complex logic using $callback can go here
        $callback();
    }
}

PHPCheckstyle warns me about it:

File "/home/rr-/test/test.php" warning, line 6 - Function test has unused code after RETURN.

even though it isn't true. To make the warning go away, I need to refactor the code like this:

<?php
class Test
{
    public function test($params)
    {
        $callback = function() use ($params)
        {
            $whatever = 5;
            return $whatever;
        };
        return $this->commit($callback);
    }

    private function commit($callback)
    {
        //complex logic using $callback can go here
        $callback();
    }
}
tchule commented 10 years ago

Yes, I've already had some false positives with this rule. The analysis done is too simple.

This is where a real AST could help a lot.

jbrooksuk commented 10 years ago

I'd say that #32 is related to this.

rr- commented 10 years ago

https://github.com/nikic/PHP-Parser

I found decent AST, but it ignores whitespace. Apparently, there is an issue for this but it hasn't been worked on for two years.

jbrooksuk commented 10 years ago

@rr- yeah, @nikic's PHP-Parser is something I've looked at. I've not had any time to actually look at implementing it - it'll definitely fix a lot of our problems.

jbrooksuk commented 10 years ago

@rr- I'm going to try and spend a bit of time looking at PHP-Parser if I can. The introduction says:

There are other ways of processing source code. One that PHP supports natively is using the token stream generated by token_get_all. The token stream is much more low level than the AST and thus has different applications: It allows to also analyze the exact formatting of a file. On the other hand the token stream is much harder to deal with for more complex analysis. For example an AST abstracts away the fact that in PHP variables can be written as $foo, but also as $$bar, ${'foobar'} or even ${!${''}=barfoo()}. You don't have to worry about recognizing all the different syntaxes from a stream of tokens.

The bit I'm interested is in bold.

So token_get_all (which is what we use) is great but so is using an AST. I've got a feeling we're going to have to mix and match here.

jbrooksuk commented 10 years ago

Another way to go may be to look at how Scrutinzer works.

rr- commented 10 years ago

Yes, using tokenizer would be cool to do checks like whitespace around operators, etc. AST should be used for more complex checks, like unused variables and code. Trying to figure them out from flat token list would be too complicated.

In each use case, I'd go with whatever makes the code most maintainable.

Note that token_get_all ignores a lot of tokens, AFAIR including whitespace. The best option here would be to write a tokenizer yourself. An example tokenizer could look like this. It is based on regular expressions and longest match rule. As you see, it's not that complex. I tested it only on itself, so it is by no means complete.

If implemented carefully, it probably would decrease size of existing codebase twofold without breaking current functionality, which is a good thing.

jbrooksuk commented 10 years ago

PHPCheckstyle does add some missing tokens in to account for those which aren't included.

I've started a new branch which you're more than welcome to contribute to.

nikic commented 10 years ago

As an additional consideration, it should be added that the php-parser is much slower than token_get_all. I don't know how much slower off the top of my head, but at least 10 times. So including it is probably only worth the slowdown if you're going to do multiple complex inspections.

It's possible to switch from the syntax tree to the underlying tokens with a bit of extra code. By using a custom token offset lexer you can get the start and end offsets for a particular AST node in the token array. I'm using this approach to inspect the exact formatting of a node or do changes to code without breaking formatting. Sadly I don't have open-source examples for this right now.

tchule commented 10 years ago

I confirm that currently PHPCheckstyle use token_get_all() with some modifications to take into account spaces, tabs and carriage return and to count the line returns in the inlined HTML.

I've been very interested in PHP-Parser for a long time (thanks nikic, it's a nice job you've done) but no time to try to implement something.

We should also followx what will happend with PHP 7 and its AST.

jbrooksuk commented 10 years ago

https://github.com/nikic/php-ast