AlDanial / cloc

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
GNU General Public License v2.0

Improve --fullpath and --match-d root path #731

Open PMExtra opened 1 year ago

PMExtra commented 1 year ago

Currently, using the --fullpath with --match-d options causes cloc to evaluate the regex based on the physical full path, but this might not align with what users expect.

For instance, to match only ./foo/src/ while excluding ./src/ and ./next/foo/src/, I must set --match-d="^$(pwd)/foo/src/".
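
One further wrinkle with the `$(pwd)` workaround: the working directory is pasted into a regex, so any metacharacter in it (for example a `.` in a directory name) is read as regex syntax. Below is a minimal Python sketch of the anchoring logic, assuming the pattern is compared against the absolute directory path as described above. The root `/home/user/my.project` and the helper name `anchored_pattern` are hypothetical, and the code only illustrates the matching; it is not cloc's own implementation.

```python
import re

def anchored_pattern(root, subdir):
    """Build a regex that matches only directories under root/subdir.

    re.escape keeps metacharacters in the paths from acting as regex syntax.
    """
    prefix = re.escape(root.rstrip("/")) + "/" + re.escape(subdir.strip("/")) + "/"
    return re.compile("^" + prefix)

root = "/home/user/my.project"  # hypothetical stand-in for $(pwd)
pattern = anchored_pattern(root, "foo/src")

candidates = [
    root + "/foo/src/",
    root + "/src/",
    root + "/next/foo/src/",
]
# Only the first candidate starts with the escaped, anchored prefix.
matches = [d for d in candidates if pattern.search(d)]
print(matches)
```

Without the escaping step, the `.` in `my.project` would match any character, so the pattern could accept a sibling directory that merely resembles the project root.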

Moreover, when working with git diff, it becomes quite a hassle. The diff generates temporary directories with names that are hard to retrieve.

I propose aligning cloc's working path as a virtual root to address these issues, much like git does.

Then we could use --match-d=^/foo/src/ to match paths relative to the project.
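
The idea can be sketched roughly as follows: map each directory to a path rooted at the directory cloc was invoked from, then apply the regex to that virtual path. This Python snippet only illustrates the proposed behavior and is not how cloc is implemented. The function name `virtual_path` and the example root are hypothetical.

```python
import posixpath
import re

def virtual_path(abs_dir, root):
    """Map an absolute directory to a path rooted at `root`, e.g. /foo/src/."""
    rel = posixpath.relpath(abs_dir, root)
    if rel == ".":
        return "/"
    return "/" + rel + "/"

# Hypothetical root, e.g. a temporary directory created while diffing.
root = "/tmp/checkout-1234"
pattern = re.compile(r"^/foo/src/")

dirs = ["foo/src", "src", "next/foo/src", "foo/src/lib"]
selected = [
    d for d in dirs
    if pattern.search(virtual_path(posixpath.join(root, d), root))
]
print(selected)
```

Because the pattern never mentions the real root, the same regex would work whether the root is a normal checkout or a temporary directory whose name is hard to retrieve.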

martinvonwittich commented 11 months ago

I also just stumbled over this and was totally confused how to anchor the regex - I wanted to exclude ./path/to/foo/, but had no success with --not-match-d ^\./path/to/foo, ^/path/to/foo and ^path/to/foo. I was also confused by --fullpath; the manpage says:

       --not-match-d=REGEX
           Count all files except in directories matching the Perl regex.
           Only the trailing directory name is compared, for example, when
           counting in "/usr/local/lib", only "lib" is compared to the regex.
           Add --fullpath to compare parent directories to the regex.  Do not
           include file path separators at the beginning or end of the regex.

But --fullpath claims only to influence match-f, and doesn't say anything about match-d:

       --fullpath
           Modifies the behavior of --match-f or --not-match-f to include the
           file's path in the regex, not just the file's basename.  (This does
           not expand each file to include its absolute path, instead it uses
           as much of the path as is passed in to cloc.)

And funnily enough, --match-d='path/to/foo' without --fullpath excludes everything outside of /path/to/foo, so apparently it does work without --fullpath?

To figure out how to anchor my regex, I had to put some debug code into cloc to see what's actually being matched here:

    if ($opt_match_d    ) { use DDP; p $Dir; return unless $Dir =~ m{$opt_match_d}     }

and I was dismayed to learn that it matches the absolute path, including the path to my home directory.
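
That would explain why none of the anchored attempts matched while the unanchored pattern did. Here is a small Python sketch of the comparison, assuming the regex is tested against an absolute directory path as the debug output showed. The path `/home/user/project` is a made-up stand-in for the real home directory.

```python
import re

# Hypothetical absolute path of the kind the debug output revealed.
abs_dir = "/home/user/project/path/to/foo"

attempts = [
    r"^\./path/to/foo",  # anchored, relative with leading ./
    r"^/path/to/foo",    # anchored, leading slash
    r"^path/to/foo",     # anchored, bare relative path
    r"path/to/foo",      # unanchored
]

# An anchored pattern must match at position 0, where the home directory sits.
results = {pat: bool(re.search(pat, abs_dir)) for pat in attempts}
for pat, hit in results.items():
    print(f"{pat!r}: {'match' if hit else 'no match'}")
```

Only the unanchored pattern can match, because every `^` variant has to line up with the start of the absolute path rather than the start of the project-relative part.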

AlDanial commented 11 months ago

I agree this needs work (both the code and the documentation). Basing --fullpath off the directory from which cloc was invoked sounds reasonable. Many switches will be affected so a fix will be a while in coming.

AlDanial commented 11 months ago

@PMExtra --fullpath was always meant to be relative to the directory from which cloc was run. In that sense it is poorly named; --filepath would have been a better choice. I still haven't finished testing the --match-d branches, so the current commit may not interest you.

@martinvonwittich There was a problem with how --not-match-d was implemented. Your combination --not-match-d ^\./path/to/foo should have given the behavior you desired. Please try 18062b5 to see if --not-match-d works as you expect.

AlDanial commented 11 months ago

@PMExtra try 39f3b9e. With that fix --match-d="^\./foo/src/" should give you the behavior you want (without --fullpath).