codeclimate / codeclimate-duplication

Code Climate engine for code duplication analysis
http://codeclimate.com
MIT License
113 stars 24 forks source link

Similar code detection overly aggresive #49

Closed tomato42 closed 8 years ago

tomato42 commented 8 years ago

(I'm not sure if this is the right place to file this bug, please redirect me if it isn't)

So I have this analysis here: https://codeclimate.com/github/tomato42/tlslite-ng/compare/extensions-remove-similar#issues it says that there is similar code in two other places (definition of ClientCertTypeExtension, SupportedGroupsExtension and ECPointFormatsExtension classes).

I have had them previously defined as more regular classes, with just initializers, as you can see here: https://codeclimate.com/github/tomato42/tlslite-ng/compare/softer-deduplication-in-extensions (just a previous commit to the one above)

Now: while the code is similar, the values, documentation and behaviour is definitely not. So, either the similarity code detection is a little bit too trigger happy, or I'm missing some obvious way to deduplicate this code without loosing the documentation (which in this case is worth more than the code is).

And while I can always just ignore this, it makes the "grades" assessment rather useless, as code similarity, even between such simple parts, makes the rating drop significantly always.

pbrisbin commented 8 years ago

Hi @tomato42 thanks for reporting this, it's a fine place to do so.

We've seen this behavior ourselves and are continuing to try and get the default duplication thresholds right across the variety of languages we analyze. Understanding that's never going to be perfect, we also allow you to configure what the engine considers duplication via tuning the "mass threshold" in your own codeclimate.yml: https://github.com/codeclimate/codeclimate-duplication#threshold

We also have some additional configuration in the works, which you can see in this PR: https://github.com/codeclimate/codeclimate-duplication/pull/47.

I'd suggest trying a higher threshold value and seeing if that produces results more in line with what you'd want to enforce.

tomato42 commented 8 years ago

doesn't that mean I have to migrate to the new codeclimate platform?

pbrisbin commented 8 years ago

Oh, I'm sorry. I thought that, since this repo is the Platform duplication engine, that's what we were discussing.

You're right in that, to make use of the new configuration points I was talking about, you'd have to switch over to the new Platform. In our Classic analysis, we don't have any ability to configure how aggressive that duplication analysis is.

I'm going to close this issue, since this repo is specific to the new Platform. If you'd like to discuss further, and see if there's anything we can do, please reach out at codeclimate.com/help.

tomato42 commented 8 years ago

well, it looks like the set of engines available in the Platform is more or less on feature-parity for Python with the Classic, so I just bit the bullet.

The default works fine.