AlDanial / cloc

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
GNU General Public License v2.0

--not-match-d= includes file at root of pattern #833

Closed: njajones closed this issue 5 months ago

njajones commented 6 months ago

Describe the bug: --not-match-d= excludes a folder's subfolders, but not a file directly inside that folder.

cloc; OS; OS version

To Reproduce

~ % mkdir Developer/clocTest
~ % mkdir Developer/clocTest/foo
~ % mkdir Developer/clocTest/foo/bar
~ % echo "struct foo {\n}" > Developer/clocTest/foo/foo.swift
~ % echo "struct bar { }" > Developer/clocTest/foo/bar/bar.swift
~ % cd Developer/clocTest
clocTest % cloc . --fullpath --not-match-d="/foo/"

Expected result: I'd expect both foo/foo.swift and foo/bar/bar.swift to be filtered out, giving a "0 text files" response.

But instead I get:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                            1              0              0              2
-------------------------------------------------------------------------------

i.e. the results for foo/foo.swift with only foo/bar/bar.swift filtered out

Additional context: The actual result is the same as running cloc . --fullpath --not-match-d="bar". The expected result can be achieved with cloc . --fullpath --not-match-d="foo", but for my real-world use case I need a list of excludes, i.e. cloc . --fullpath --not-match-d="/(root/example1|root/example2|root/example3)/", such that root/example1/file.swift is excluded as well as root/example1/subfolder/file.swift.

This could just be a misunderstanding of how the regex works, as I've seen a number of similar issues. However, I haven't found an answer that covers this case (AFAIK).

njajones commented 6 months ago

--fullpath is included above because (I believe) it is needed for my real-world use case, but in the example above it makes no difference.

njajones commented 6 months ago

The workaround in #732 also works in this case, i.e. cloc . --fullpath --not-match-d="/foo(/|$)"
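To see why the workaround behaves differently, the two patterns can be tried against the directory paths from the example. This is a minimal sketch in Python rather than cloc's own Perl. It assumes each directory path is tested without a trailing slash, and it relies on Python's re.search treating these simple patterns the same way Perl does. The helper build_exclude_pattern is hypothetical and only shows one way the (/|$) suffix might be extended to a list of directories.

```python
import re

# Directory paths from the example, written without a trailing slash.
dirs = ["./clocTest", "./clocTest/foo", "./clocTest/foo/bar"]

def rejected(pattern, paths):
    """Return the paths the regex matches, i.e. those that would be excluded."""
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(p)]

def build_exclude_pattern(names):
    """Hypothetical helper: join directory names into one pattern with a (/|$) suffix."""
    alternation = "|".join(re.escape(n) for n in names)
    return "/(" + alternation + ")(/|$)"

# "/foo/" needs a character after "foo", so "./clocTest/foo" itself is not matched.
original = rejected("/foo/", dirs)

# "/foo(/|$)" also accepts a path that ends in "/foo".
workaround = rejected("/foo(/|$)", dirs)

# Same idea applied to a list of directories (names taken from the report).
multi = build_exclude_pattern(["root/example1", "root/example2", "root/example3"])
```

With the original pattern only ./clocTest/foo/bar is rejected, while the workaround rejects both ./clocTest/foo and ./clocTest/foo/bar. The (/|$) suffix also keeps a directory such as root/example10 from being caught by the root/example1 entry.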

AlDanial commented 5 months ago

Yes, this is a bug. If you run find in the Developer directory you'll see the same text that cloc will work on:

~/Developer » find .
.
./clocTest
./clocTest/foo
./clocTest/foo/foo.swift
./clocTest/foo/bar
./clocTest/foo/bar/bar.swift

The ./clocTest/foo directory doesn't satisfy the /foo/ pattern because it has no trailing slash, so the foo directory is accepted, and subsequently--incorrectly--its child file foo.swift is also accepted.

I need to add another check when examining files to make sure the parent directory doesn't match --not-match-d. (The #732 workaround allows ./clocTest/foo to be rejected since it matches /foo$.)
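The extra check described above could look roughly like the following. This is only a sketch of the idea in Python, not the code in cloc. The function names dir_rejected and file_rejected are hypothetical, and appending a trailing slash to the directory before matching is an assumption made here so that /foo/ can match the foo directory itself. The file ./clocTest/main.swift is also made up, to show a file that should survive the filter.

```python
import posixpath
import re

def dir_rejected(directory, not_match_d):
    """True if the directory path, with a trailing slash appended, matches the pattern."""
    return re.search(not_match_d, directory.rstrip("/") + "/") is not None

def file_rejected(path, not_match_d):
    """True if the file's parent directory is rejected by the pattern."""
    return dir_rejected(posixpath.dirname(path), not_match_d)

files = [
    "./clocTest/foo/foo.swift",
    "./clocTest/foo/bar/bar.swift",
    "./clocTest/main.swift",  # hypothetical file outside foo
]

# Keep only the files whose parent directory does not match the pattern.
kept = [f for f in files if not file_rejected(f, "/foo/")]
```

Under these assumptions both foo/foo.swift and foo/bar/bar.swift are dropped, which is the result the report expected.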

AlDanial commented 5 months ago

Give ba6c1aa a try.

njajones commented 5 months ago

Works with my sample above. It's harder to test on my full project (as the use of cloc is quite deep in the code) but I'll give it a go

njajones commented 5 months ago

Fix works great. Thanks for the help

AlDanial commented 5 months ago

...and thanks for the contribution!