PHPCSStandards / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.
BSD 3-Clause "New" or "Revised" License
805 stars 47 forks source link

4.0 | Proposal: improvements to DocBlock tokenization #484

Open jrfnl opened 2 months ago

jrfnl commented 2 months ago

Current Situation

As things are, tokens within DocBlocks are tokenized as T_DOC_COMMENT_* tokens.

Additionally, there are two tokens in a docblock which receive additional information:

The problem

Sometimes a sniff may want to listen for a specific Docblock token, like T_DOC_COMMENT_TAG, and depending on the findings may then want to jump to the DocBlock opener/closer to check some additional info.

This is currently not possible.

The work-around is to always listen for a T_DOC_COMMENT_OPEN_TAG token, loop through the comment_tags to find the target(s), and jump to the tag from there. The comment_closer info is then available to the sniff on the T_DOC_COMMENT_OPEN_TAG token.

Alternatively, a sniff can walk to the start/end of the DocBlock from any token within, but depending on how extensive the available documentation is (including possible annotations for Doctrine, Psalm, PHPUnit), walking the tokens to the start/end of the DocBlock can easily mean having to walk hundreds of tokens.

Proposal

I'd like to propose adding the comment_opener and comment_closer indexes to all tokens within a DocBlock, so the information about the opener/closer is always available.

For those sniffs affected by the above outlined problem, this should allow for improving the performance.

Alternatives considered

Other token types with an *_opener/*_closer often have a token array index on the tokens between the "opener" and "closer" in array format with information about all applicable (potentially nested) openers/closers (or "owners") (also see the details in the fold-out below), however DocBlocks can not be nested, so putting the information about the opener/closer in an array does not make sense to me for DocBlocks.

Other considerations

For all other opener/closer type of indexes, the *_opener/*_closer indexes are consistently both added to both the opener as well as the closer.

At this time, the way the indexes are set for DocBlocks is not consistent with that as the indexes for DocBlocks are set asymmetrically. This proposal fixes that too.

Details on all opener/closer types and what info is available in the token array | Bracket type | Opener + closer has these indexes | And may have | Tokens between have | Notes | |-----------------------------------------------|--------------------------------------------------------|---------------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------| | Parentheses | `parenthesis_opener` + `parenthesis_closer` | `parenthesis_owner` | `nested_parenthesis` (`array int closer>`) | | | Curly braces (all) | `bracket_opener` + `bracket_closer` | -- | -- | See #12 for a proposal to add a `nested_brackets` key | | Square brackets (all, including short arrays) | `bracket_opener` + `bracket_closer` | -- | -- | See #12 for a proposal to add a `nested_brackets` key | | Curly braces for control structures | `scope_opener` + `scope_closer` | `scope_condition` | `conditions` (`array int\|string owner code>`) | | | Alternative syntax for control structures | `scope_opener` + `scope_closer` | `scope_condition` | `conditions` (`array`) | | | Heredoc/Nowdoc | `scope_opener` + `scope_closer` | `scope_condition` | `conditions` (`array`) | | | Attributes | `attribute_opener` + `attribute_closer` | -- | `attribute_opener`, `attribute_closer`, `nested_attributes` (`array int closer>`) | Yeah, not all that consistent, but never mind that for now. | | **Comments** | **Opener: `comment_closer`, Closer: `comment_opener`** | -- | -- | | _Unless otherwise annotated, these indexes will all be integer stack pointers._

Planning

While this isn't necessarily a breaking change, I still propose to include the change in the 4.0 release for safety reasons, as there may be sniffs which explicitly only expect a comment_opener/comment_closer index on the T_DOC_COMMENT_CLOSE_TAG/T_DOC_COMMENT_OPEN_TAG tokens respectively and those sniffs may need to be adjusted.

Opinions ?

Please let me know if you have any concerns about this proposal or any suggestions for further enhancements to the DocBlock tokenization.

/cc @asispts @dingo-d @fredden @GaryJones @greg0ire @kukulich @michalbundyra @Ocramius @sirbrillig @stronk7 @weierophinney @wimg