checkstyle / checkstyle

Checkstyle is a development tool to help programmers write Java code that adheres to a coding standard. By default it supports the Google Java Style Guide and Sun Code Conventions, but is highly configurable. It can be invoked with an ANT task and a command line program.
https://checkstyle.org
GNU Lesser General Public License v2.1
8.28k stars 3.65k forks source link

InputEqualsHashCodeEqualsParameter.java has wrong location of violation comments #15256

Open romani opened 1 month ago

romani commented 1 month ago

detected at https://github.com/checkstyle/checkstyle/pull/15021

InputEqualsHashCodeEqualsParameter.java has wrong location of violation comments.

https://github.com/checkstyle/checkstyle/blob/6d8f8f2f9064ab7706fd5c1dd5c715fe0b4ac599/src/test/resources/com/puppycrawl/tools/checkstyle/checks/coding/equalshashcode/InputEqualsHashCodeEqualsParameter.java#L15-L16

violation is actually only on method, not on class.

Reason is wrong implementation of regexp to match comment wihtout message but with some words as reasons or explanation example: // violation, no correctequals` vs comment with violation message // violation 'without .* of 'equals()'.'`

Sugested regexp: ".*//\\s*violation,\\s+(?:.*)?$" from https://github.com/checkstyle/checkstyle/pull/15021#discussion_r1676942913

Expected: update remove bdd/InlineConfigParser.java to match to // violations, bla bla bla. Remove extra comments from this InputEqualsHashCodeEqualsParameter file.

If there is "violation" word and this line does not match any pattern - throw exception.

rnveach commented 1 month ago

We need some examination of why we aren't catching this and how we can avoid these type of things in the future. I tried a // violation in Google's IT, and it picked it up as a missing violation.

romani commented 1 month ago

Reason is at https://github.com/checkstyle/checkstyle/pull/15021#discussion_r1657178202

There are multiple matchers to trailing comments, we should match to VIOLATION_WITH_EXPLANATION_PATTERN But content of such regexp is wrong, not matching to name of field , just a defect.

As there was no usage of it, it wasn't detected.

After fix in referenced PR, it will be in use, so if any other regression happens, build failure will happen in test.

rnveach commented 1 month ago

Reason is at https://github.com/checkstyle/checkstyle/pull/15021#discussion_r1657178202

I disagree. Link is why current expression doesn't match the violation text, but it is not why we missed the violation comment completely.

We should identify anything written as // violation as a violation and then fail with an error if we can't match it with a specific message like VIOLATION_WITH_EXPLANATION_PATTERN. We shouldn't have violation comments go completely unnoticed as it did to create this issue.

romani commented 1 month ago

Ok, this is slightly different issue with bigger scope .

@SteLeo1602 , please extend this sequence of matchers with one more matcher at the end that match any comment with //\s*violation .* it will be default, it should be at the end .

rnveach commented 1 month ago

//\s*violation .*

It may be better if we make this catch as much as possible like //.*violation.*. Some of our violation messages don't start with "violation" and are more in the middle.

romani commented 1 month ago

@nrmancuso , are you ok to do broad catch of 'violation' word in Input comments ?

nrmancuso commented 1 month ago

@nrmancuso , are you ok to do broad catch of 'violation' word in Input comments ?

If you mean https://github.com/checkstyle/checkstyle/issues/15256#issuecomment-2231354613, yes. Ideally we match all things violation, then figure out which it is, or throw an exception if it doesn’t match.

romani commented 1 month ago

Ok, we should catch all lines that has 'violation' in it.

Zopsss commented 1 month ago

Some of our violation messages don't start with "violation" and are more in the middle.

can you point out some of this types of messages? I'm not able to find it. And if we're gonna use this regex then we won't be able to use violation word in any single line comment unless the comment is actually about violation, we can face problem like this: https://github.com/checkstyle/checkstyle/pull/15327#discussion_r1692501410 @rnveach

romani commented 1 month ago

Workaround is multiline comment /* bla bla violation bla*/

rnveach commented 1 month ago

Some of our violation messages don't start with "violation" and are more in the middle. can you point out some of this types of messages?

All trigger regular expressions are in InlineConfigParser. We have triggers like \d+ violations above and filtered violation below.

Zopsss commented 1 month ago

We have triggers like \d+ violations above

We have regex for such types of violations:

https://github.com/checkstyle/checkstyle/blob/e4c87fc705e81747f634fcbb52278549e208422e/src/test/java/com/puppycrawl/tools/checkstyle/bdd/InlineConfigParser.java#L103-L113

and filtered violation below.

we have regex for these type violations too:

https://github.com/checkstyle/checkstyle/blob/e4c87fc705e81747f634fcbb52278549e208422e/src/test/java/com/puppycrawl/tools/checkstyle/bdd/InlineConfigParser.java#L115-L125

I went through all the comments, I'm not able to understand why we need an extra regex at the end of the matcher as @romani suggested here: https://github.com/checkstyle/checkstyle/issues/15256#issuecomment-2231750552.

We just need to detect violation comments like: // violation , some explanation, we can do this by modifying existing regex or add a new regex for it. This is all we need right? or am I missing something?

are you ok to do broad catch of 'violation' word in Input comments ?

I'm not able to understand why there's a need to broad catch of violation word in every single line comment. We have regex for every kind of violations messages that we use, if any new type of message is introduced then it regex should be created, no need to broad catch the violation word like this.

rnveach commented 1 month ago

We have regex for such types of violations We have regex for every kind of violations messages that we use

Yes, but look at how we process these. We do blind matching.

What if something doesn't match when it should? Then we get this issue. Basically someone wrote a violation in their own format which we didn't support. We just decided to change an existing format as well to fix the issue.

We need double validation to ensure we catch everything we are suppose to.

nrmancuso commented 1 month ago

We have regex for such types of violations We have regex for every kind of violations messages that we use

Yes, but look at how we process these. We do blind matching.

What if something doesn't match when it should? Then we get this issue. Basically someone wrote a violation in their own format which we didn't support. We just decided to change an existing format as well to fix the issue.

We need double validation to ensure we catch everything we are suppose to.

I second this, but also I remember trying to write/remember this syntax and failed. If we throw an exception and point to where the formats we expect live, this is super helpful to contributors.

romani commented 1 month ago

Yes, we kind of make word so special , that users should not use in single line comments // . We are ready for this. It is better than missing violation or confusion.

There is a workaround with miltlelines comment /* so, in is not this bad.

If we throw an exception and point to where the formats we expect live, this is super helpful to contributors.

Let's postpone this to another issue, it needs more details. This issue is growing in scope to quick, I already had to help to resolve it, as it is complicated, let's merge small scope first.