AlDanial / cloc

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
GNU General Public License v2.0

file filter (--match-f) has a strange "memory effect" #853

Closed. aladur closed this issue 2 months ago.

aladur commented 2 months ago

Describe the bug

Sorry to bother you again, but this bug interferes with testing the other bug, #851. After running cloc with a file filter (--match-f), a later run without --match-f can still behave as if a file filter were applied. This is not 100% reproducible, and I have not found where or when the "memory effect" is stored; I tried --sdir without success.

cloc; OS; OS version

Step1: apply an "all inclusive" file filter

$ ~/temp/bin/cloc --match-f='^.*$' .

Output:

     597 text files.
     378 unique files.                                          
     479 files ignored.

github.com/AlDanial/cloc v 2.03  T=0.25 s (1492.1 files/s, 560194.5 lines/s)
-----------------------------------------------------------------------------------
Language                         files          blank        comment           code
-----------------------------------------------------------------------------------
SVG                                 53              0              0          46239
C++                                118           6795           3983          45231
C/C++ Header                       138           3245           2849          10564
make                                20           1018            983           8290
Text                                13            235              0           4569
XML                                 17              1              0           3386
Qt                                   8              0              0           3359
C                                    7            119             95            838
Windows Resource File                2              6              0             72
JSON                                 1              0              0             26
Python                               1              3              1              6
-----------------------------------------------------------------------------------
SUM:                               378          11422           7911         122580
-----------------------------------------------------------------------------------

OK => All files listed, as requested.

Step2: apply no file filter

$ ~/temp/bin/cloc .

Output:

     278 text files.
     269 unique files.                                          
      27 files ignored.

github.com/AlDanial/cloc v 2.03  T=0.12 s (2297.4 files/s, 563472.6 lines/s)
-----------------------------------------------------------------------------------
Language                         files          blank        comment           code
-----------------------------------------------------------------------------------
C++                                102           6537           3640          30824
C/C++ Header                       130           2658           2793           8567
Text                                10            234              0           3898
Qt                                   8              0              0           3359
XML                                  8              1              0           2071
C                                    7            119             95            838
SVG                                  1              0              0            239
Windows Resource File                2              6              0             72
JSON                                 1              0              0             26
-----------------------------------------------------------------------------------
SUM:                               269           9555           6528          49894
-----------------------------------------------------------------------------------

FAIL => Without a file filter, all files should be counted (right?), giving the same result as Step1. This output clearly depends on previous cloc runs. I have often used the file filter --match-f='^./[a-z][a-z0-9]*.[a-z]+$', which seems to be the filter applied under the hood for this output. So the question is: how can this "memory effect" be cleared or, even better, avoided?

Step3: apply a different file filter

$ ~/temp/bin/cloc --match-f='^f.*$' .

Output:

     229 text files.
     136 unique files.                                          
     252 files ignored.

github.com/AlDanial/cloc v 2.03  T=0.15 s (915.2 files/s, 552605.9 lines/s)
-----------------------------------------------------------------------------------
Language                         files          blank        comment           code
-----------------------------------------------------------------------------------
SVG                                 53              0              0          46239
C++                                 35           2200           1034          23914
C/C++ Header                        31            870            583           3312
XML                                  9              1              0           1932
Qt                                   4              0              0           1876
Windows Resource File                2              6              0             72
C                                    1             13             22             35
Python                               1              3              1              6
-----------------------------------------------------------------------------------
SUM:                               136           3093           1640          77386
-----------------------------------------------------------------------------------

OK => File filter applied as requested.

Step4: apply no file filter

$ ~/temp/bin/cloc .

Output:

     278 text files.
     269 unique files.                                          
      27 files ignored.

github.com/AlDanial/cloc v 2.03  T=0.12 s (2327.6 files/s, 570883.1 lines/s)
-----------------------------------------------------------------------------------
Language                         files          blank        comment           code
-----------------------------------------------------------------------------------
C++                                102           6537           3640          30824
C/C++ Header                       130           2658           2793           8567
Text                                10            234              0           3898
Qt                                   8              0              0           3359
XML                                  8              1              0           2071
C                                    7            119             95            838
SVG                                  1              0              0            239
Windows Resource File                2              6              0             72
JSON                                 1              0              0             26
-----------------------------------------------------------------------------------
SUM:                               269           9555           6528          49894
-----------------------------------------------------------------------------------

FAIL => These are exactly the same numbers as in Step2, although no file filter was requested. Applying a different file filter does not clear the under-the-hood memory.

AlDanial commented 2 months ago

I'm unable to reproduce a "memory effect" resulting in count variations. Can you find a public git repo, or post a tar of files, so we can work with the same inputs?

Also try running Steps 1 & 2 with the extra switches --found=Found.txt --counted=Count.txt --ignored=Ignored.txt to see exactly what was found, counted, and ignored. Then diff the results from the two runs; perhaps that will shed light.
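
The comparison suggested here could be sketched roughly as follows. This is only an illustration: the helper names only_in_first and compare_steps and the output file names step1.* and step2.* are made up for this example, and the listings are compared as whole lines without assuming anything about their internal format. Separate file names are used per run so the second run does not overwrite the first.

```shell
#!/bin/sh
# Print lines that occur in the first listing but not in the second.
# comm requires sorted input, so both files are sorted into temporaries first.
only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort "$1" > "$a"
    sort "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Run Step 1 and Step 2 with the listing switches, then compare what was counted.
# The output file names (step1.*, step2.*) are hypothetical.
compare_steps() {
    cloc --match-f='^.*$' --found=step1.found --counted=step1.counted \
         --ignored=step1.ignored . > /dev/null
    cloc --found=step2.found --counted=step2.counted \
         --ignored=step2.ignored . > /dev/null
    echo "Counted in Step 1 but not in Step 2:"
    only_in_first step1.counted step2.counted
}
```

Calling compare_steps in the directory under test would list the entries that the unfiltered run dropped, which should hint at the pattern of whatever filter is still in effect.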

aladur commented 2 months ago

... Sorry, this was my fault. With all the testing, I forgot that a file $HOME/.config/cloc/options.txt still existed and was applying a default file filter.
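
For anyone who runs into the same thing, a quick check like the sketch below could be done before suspecting cloc itself. The function name show_cloc_defaults is made up for this example; the only path it assumes is the one mentioned above, and a different path can be passed as the first argument.

```shell
#!/bin/sh
# Report whether a default cloc options file exists and show its contents.
# With no argument, the path from this thread is checked.
show_cloc_defaults() {
    options_file="${1:-$HOME/.config/cloc/options.txt}"
    if [ -f "$options_file" ]; then
        echo "Default options file present: $options_file"
        cat "$options_file"
        return 0
    fi
    echo "No default options file at: $options_file"
    return 1
}
```

If the file is present and contains a --match-f line, renaming or removing it should make a plain `cloc .` count all files again.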