AlDanial / cloc

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
GNU General Public License v2.0

Multiple --not-match-f override each other, instead of being applied additively #724

Closed: includesec-erik closed this issue 1 month ago

includesec-erik commented 1 year ago

Describe the bug When running the following command (or any similar command with more than one --not-match-f), I expect every file whose basename ends with _test.go or _proto.go to be excluded from cloc's count. Instead, cloc appears to honor only one --not-match-f filter (I believe the last one given) rather than all of them.

cloc . --not-match-f=".*\_proto.go" --not-match-f=".*\_test.go"

cloc; OS; OS version

To Reproduce See comment on this youtube video for repro: https://www.youtube.com/watch?v=eRLTkDMsCqs

Expected result All not-match-f filters are applied within cloc for filtering consideration instead of only one.

Thanks for considering this, Al. Perhaps cloc's default behavior could change so that repeated uses of this command line option are applied additively, rather than only a single filter being respected? Apparently this unexpected behavior has been around a while!

BTW, does this also apply to the match-f, match-d, and not-match-d command line options?

AlDanial commented 1 year ago

None of the --match-* or --not-match-* switches may be repeated. I didn't see the need since a single regex can handle multiple cases. Your two --not-match-f cases can be condensed to

cloc . --not-match-f=".*\_(proto|test).go"
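The equivalence claimed here can be checked outside cloc. The following is a minimal Python sketch, not cloc's own code (cloc is written in Perl). It assumes the patterns are applied as unanchored searches against each file's basename, drops the backslash before the underscore, and uses made-up basenames purely for illustration.

```python
import re

# Hypothetical basenames, chosen only for illustration.
BASENAMES = ["server.go", "server_test.go", "api_proto.go", "main.go", "protest.go"]

# The two separate patterns from the original command.
SEPARATE = [r".*_proto.go", r".*_test.go"]
# The single combined pattern suggested in the reply.
COMBINED = r".*_(proto|test).go"


def kept_additive(names, patterns):
    """Keep names that match none of the patterns (additive exclusion)."""
    return [n for n in names if not any(re.search(p, n) for p in patterns)]


def kept_single(names, pattern):
    """Keep names that do not match the one pattern given."""
    return [n for n in names if not re.search(pattern, n)]


if __name__ == "__main__":
    # Additive use of the two patterns and the combined regex agree.
    print(kept_additive(BASENAMES, SEPARATE))
    print(kept_single(BASENAMES, COMBINED))
    # If only the last pattern were honored, api_proto.go would slip through.
    print(kept_single(BASENAMES, SEPARATE[-1]))
```

Under those assumptions, the combined regex excludes the same files as the two patterns applied together, while honoring only the last pattern leaves api_proto.go in the count.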

I'm sure I'm overlooking situations where multiple copies of --not-match-f really are necessary. If you can describe such a use case I'll update the code to accommodate it.

includesec-erik commented 1 year ago

Hi @AlDanial, thanks for the reply! Given your info, I'd categorize this as an enhancement request issue, not a bug.

You're correct in stating that all possible matches can be thought of and specified in a single regex, thanks for pointing that out.

I would say, though, that for users who are less experienced with regexes, or when I'm explaining to another party how to use cloc over email or a phone call, it is tremendously simpler to build a list of filters from multiple parameters. From what I've seen working with other tech professionals who use other command line tools, an additive list of filters is a commonly expected pattern that works in other tools (Tokei, for instance).

I totally understand if implementing this behavior change is a big ask and you want to decline this enhancement request, but if it is a smaller ask, please consider it! Thank you.

AlDanial commented 1 year ago

It's not a big ask and I'm familiar with additive options (cloc's --force-lang and --script-lang can be specified multiple times). Still, the request will need to go on the back burner until I finish #722 (which will take me some time to implement cleanly).
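For readers unfamiliar with the distinction, here is a rough analogy in Python's argparse. It is not how cloc parses its options (cloc is a Perl program); it only contrasts an option that stores a single value, where a repeated flag overwrites the earlier one, with an option that appends each occurrence to a list. The parser objects and the argument list are hypothetical.

```python
import argparse

# Hypothetical command line with the same flag given twice.
ARGV = ["--not-match-f", ".*_proto.go", "--not-match-f", ".*_test.go"]


def parse_single(argv):
    """Default 'store' action: a repeated flag overwrites the earlier value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--not-match-f")
    return parser.parse_args(argv).not_match_f


def parse_additive(argv):
    """'append' action: every occurrence of the flag is collected in a list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--not-match-f", action="append", default=[])
    return parser.parse_args(argv).not_match_f


if __name__ == "__main__":
    print(parse_single(ARGV))
    print(parse_additive(ARGV))
```

With the single-value parser only the last pattern survives, which mirrors the behavior reported at the top of this issue; the appending parser keeps both.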

includesec-erik commented 1 year ago

Sounds good, @AlDanial. Fight the good fight against Text::Glob!

AlDanial commented 1 year ago

I've begun work on this; try the latest commit to kick the tires on additive --not-match-f and --not-match-d

includesec-erik commented 1 month ago

@AlDanial I think you implemented this and released it in 2023, right? Should we close the issue since --not-match-f and --not-match-d are now additive?

AlDanial commented 1 month ago

An oversight! Yes, the fix was made more than a year ago. Always happy to close an issue.