AlDanial / cloc

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
GNU General Public License v2.0

Support for git commits #205

Closed SeanCurtis-TRI closed 6 years ago

SeanCurtis-TRI commented 7 years ago

I'm liking this tool, and the --diff feature is nice as well. However, the diff I'd really like to operate on is the diff between two git commits. Is there some secret way to pipe a diff directly into cloc, or to provide two git commit SHAs as the basis for the computation?

AlDanial commented 7 years ago

First thing that came to mind: "now why didn't I think of that?" I implemented this capability with commit 9c0c296. Let me know if it doesn't work as you expected.

SeanCurtis-TRI commented 7 years ago

Thanks for the amazing service. I love this work and am eager to use it regularly. I hit a number of roadblocks; you can skip all of this post and go to the bottom to see the summary of issues. Otherwise, put on your seat belt and enjoy the ride.

First try cloc --diff master HEAD

1 error:
Unable to read:  HEAD

Second try cloc --diff master PR_geometry_system <== the name of a branch

1 error:
Unable to read: PR_geometry_system

Third try cloc -by-file --diff master b9409db3a584effacc

git diff --name-only master | tar cf /tmp/e_G88PXhEu.tar -T -
git diff --name-only b9409db3a584effacc | tar cf /tmp/q5ub4evYss.tar -T -
      11 text files.
      11 text files.
       2 files ignored.                             

github.com/AlDanial/cloc v 1.73  T=0.22 s (4.5 files/s, 4.5 lines/s)
--------------------------------------------------------------------------------------------------------------
File                                                                     blank        comment           code
--------------------------------------------------------------------------------------------------------------
drake/geometry/geometry_system.h
 same                                                                        0            198             47
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/query_handle.h
 same                                                                        0             23             21
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/geometry_instance.h
 same                                                                        0              7             12
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/geometry_system.cc
 same                                                                        0             17            101
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/test/geometry_system_test.cc
 same                                                                        0             50            167
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/test/expect_error_message.h
 same                                                                        0              6             35
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/geometry_frame.h
 same                                                                        0             17             13
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/geometry_instance.cc
 same                                                                        0              1              6
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/geometry_query_results.h
 same                                                                        0             11             16
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
drake/geometry/geometry_ids.h
 same                                                                        0              4              9
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
--------------------------------------------------------------------------------------------------------------
SUM:
 same                                                                        0            334            427
 modified                                                                    0              0              0
 added                                                                       0              0              0
 removed                                                                     0              0              0
--------------------------------------------------------------------------------------------------------------

However, then I execute this:

git diff --stat master PR_geometry_system

 drake/geometry/BUILD                        |  65 +++++
 drake/geometry/geometry_frame.h             |  39 +++
 drake/geometry/geometry_ids.h               |  19 ++
 drake/geometry/geometry_instance.cc         |  10 +
 drake/geometry/geometry_instance.h          |  24 ++
 drake/geometry/geometry_query_results.h     |  31 +++
 drake/geometry/geometry_system.cc           | 248 +++++++++++++++++
 drake/geometry/geometry_system.h            | 473 +++++++++++++++++++++++++++++++++
 drake/geometry/query_handle.h               |  56 ++++
 drake/geometry/test/expect_error_message.h  |  51 ++++
 drake/geometry/test/geometry_system_test.cc | 253 ++++++++++++++++++
 11 files changed, 1269 insertions(+)

So, this points out several issues:

  1. So, it recognizes "master" but doesn't recognize other tags/branch names. So, you would probably want to document that you depend on actual SHAs.
  2. When I finally found the concatenation that works:
    • It reports 11 found but two files ignored. But in the per-file results, it provides data for 10 files. So, the number of files ignored seems to be in error.
    • In this particular diff, all of the changes are adding new files. So, I would have expected the cloc numbers to match the git diff --stat values, and you can see they are off by a significant margin.
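
To put a number on "a significant margin", the two reports above can be totalled side by side. This is only an arithmetic sketch over the figures pasted in this post; the dictionary names are illustrative, and it says nothing about why the totals differ.

```python
# Insertions per file, copied from the `git diff --stat` output above.
git_stat = {
    "drake/geometry/BUILD": 65,
    "drake/geometry/geometry_frame.h": 39,
    "drake/geometry/geometry_ids.h": 19,
    "drake/geometry/geometry_instance.cc": 10,
    "drake/geometry/geometry_instance.h": 24,
    "drake/geometry/geometry_query_results.h": 31,
    "drake/geometry/geometry_system.cc": 248,
    "drake/geometry/geometry_system.h": 473,
    "drake/geometry/query_handle.h": 56,
    "drake/geometry/test/expect_error_message.h": 51,
    "drake/geometry/test/geometry_system_test.cc": 253,
}

# (comment, code) per file, copied from the "same" rows of the cloc output above.
# cloc reported 0 blank lines for every file.
cloc_same = {
    "drake/geometry/geometry_system.h": (198, 47),
    "drake/geometry/query_handle.h": (23, 21),
    "drake/geometry/geometry_instance.h": (7, 12),
    "drake/geometry/geometry_system.cc": (17, 101),
    "drake/geometry/test/geometry_system_test.cc": (50, 167),
    "drake/geometry/test/expect_error_message.h": (6, 35),
    "drake/geometry/geometry_frame.h": (17, 13),
    "drake/geometry/geometry_instance.cc": (1, 6),
    "drake/geometry/geometry_query_results.h": (11, 16),
    "drake/geometry/geometry_ids.h": (4, 9),
}

git_total = sum(git_stat.values())
# BUILD does not appear in the cloc per-file table, so leave it out of the comparison.
git_counted = git_total - git_stat["drake/geometry/BUILD"]
cloc_total = sum(comment + code for comment, code in cloc_same.values())

print(git_total)                 # 1269
print(git_counted)               # 1204
print(cloc_total)                # 761
print(git_counted - cloc_total)  # 443 lines unaccounted for
```
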
AlDanial commented 7 years ago

Actually, I did update the usage block to say that it accepts git commit hashes (as opposed to any description that git understands) as inputs. "master" was accepted as these six characters satisfy the regex cloc uses to determine if it is a git hash (kind of funny; accidentally does the right thing). My initial concern was distinguishing between file/directory names and git descriptors. I suppose that isn't an issue since I only pass git anything that isn't readable on the file system. I'm not sure that doing the opposite--passing anything that isn't readable as a file/dir on to git--is right either though. The most common cause for that is user error where the input is misspelled or doesn't exist.
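
The "master" accident can be illustrated with a small sketch. The loose pattern below is an assumption chosen to reproduce the behavior seen in this thread; it is not cloc's actual regex, and the function name is hypothetical. The stricter hexadecimal pattern is shown only for contrast.

```python
import re

# Hypothetical pattern, NOT taken from cloc's source: 6 to 40 alphanumerics.
LOOSE_HASH = re.compile(r"^[0-9A-Za-z]{6,40}$")
# Stricter alternative: hexadecimal digits only (abbreviated or full SHA-1).
STRICT_HASH = re.compile(r"^[0-9a-fA-F]{6,40}$")


def looks_like_hash(arg, pattern):
    """Return True if the command-line argument matches the given pattern."""
    return pattern.match(arg) is not None


for arg in ["master", "HEAD", "PR_geometry_system", "b9409db3a584effacc"]:
    print(arg, looks_like_hash(arg, LOOSE_HASH), looks_like_hash(arg, STRICT_HASH))
```

Under the loose pattern, "master" passes (six alphanumeric characters), "HEAD" is too short, and "PR_geometry_system" fails on the underscores, which matches the three attempts reported above. The strict pattern accepts only the real hash.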

I need to think about a clean solution a bit more.

Of your 11 files, I can see right away that the file BUILD will be ignored as this does not resemble any programming language that cloc knows about. To see what the other ignored file is, rerun with cloc --ignored ign.txt --by-file --diff master b9409db3a584effacc then look in the ign.txt file.

AlDanial commented 7 years ago

Hmmm, since you ran with --by-file it is pretty easy to see which files were counted. I see that only BUILD isn't in the output, so cloc is seeing a file git isn't reporting; maybe a dot file or something like that.

SeanCurtis-TRI commented 7 years ago

I had seen the updated documentation. I felt it was ambiguous because git allows commits to be referred to in so many ways. But you are correct, it does explicitly refer to "hashes".

As for the "ignored" file. I know there are only 11 files. And that only one of them should be ignored. However, I followed your advice and this is the contents of the ignore file:

/tmp/61EvP3NzGP/drake/geometry/BUILD: language unknown (#3)
/tmp/sFzQBtW15d/drake/geometry/BUILD: language unknown (#3)

It seems that the BUILD file is counted twice, once in each source.

Any thoughts on the counts being so different?

AlDanial commented 7 years ago

Yes, there is a counting error. BUILD is ignored once for each batch of input. My take is the fault is in the reported number of files found; it says it found 11 but really it found 22, 11 in each batch. The ignored count of 2 is right.

SeanCurtis-TRI commented 7 years ago

"The ignored count of 2 is right."

For a given value of "right". :) It's certainly confusing in light of the other count (11). The 11 would make intuitive sense; the 22 would be implementation correct. Based on that, I'd hope for 2 becoming 1.

As for the counts being different, I also mean the counts in the per-file data. It reports zeros for all blank lines. And where it reports non-zero values (comment and code), it puts them in the "same" category, whereas they should all be "added".

AlDanial commented 7 years ago

If BUILD were removed from the git branch but kept in the master, cloc would say 1 file was ignored. If BUILD and BUILD2 were in the branch, cloc would say 3 files were ignored. Any other way of looking at it doesn't make sense to me.
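
The per-batch counting described here can be sketched as follows. This is a toy model, not cloc's code: the file lists are shortened, and `is_recognized` is a hypothetical stand-in for cloc's language detection that simply checks for a `.h` or `.cc` extension.

```python
import os


def is_recognized(path):
    """Hypothetical stand-in for language detection: known extensions only."""
    return os.path.splitext(path)[1] in {".h", ".cc"}


def summarize(batches):
    """Count files found and files ignored across all input batches."""
    found = sum(len(files) for files in batches)
    ignored = sum(1 for files in batches for f in files if not is_recognized(f))
    return found, ignored


master = ["BUILD", "geometry_system.h", "geometry_system.cc"]

# BUILD present on both sides: ignored once per batch.
print(summarize([master, ["BUILD", "geometry_system.h", "geometry_system.cc"]]))  # (6, 2)
# BUILD removed from the branch but kept in master.
print(summarize([master, ["geometry_system.h", "geometry_system.cc"]]))  # (5, 1)
# BUILD and BUILD2 in the branch, BUILD in master.
print(summarize([master, ["BUILD", "BUILD2", "geometry_system.h"]]))  # (6, 3)
```

With this model, the "found" figure that lines up with an ignored count of 2 is the total across both batches, which is the 22-versus-11 point made earlier.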

Anyway. Yes, the real counts of code/comments/blank are incorrect. Dang. I can duplicate the failure with my own repos so will start debugging.

AlDanial commented 7 years ago

With cb146d6 you should see code/comment/blank numbers that make more sense. The earlier code was making two identical tar files by pulling code from the currently active head (hence everything showing as "same"). The fix actually pulls code from the specified branch.

However, the fix relies on bash/sh/ksh command substitution (the $(...) syntax) and won't work on csh/tcsh. Still looking for a general solution.
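
One shell-independent way to get the same effect is to run the two git commands separately and pass the file list as plain arguments, so no shell expansion is involved. The sketch below only illustrates that idea and is not how cloc (a Perl program) implements it; the function names are hypothetical, while `git diff --name-only` and `git archive -o` are the same commands that cloc echoes in its output.

```python
import subprocess


def name_only_command(commit):
    """Argument list for listing files that differ from the given commit."""
    return ["git", "diff", "--name-only", commit]


def archive_command(commit, changed_files, tar_path):
    """Argument list for archiving the given files as they exist at `commit`."""
    return ["git", "archive", "-o", tar_path, commit] + list(changed_files)


def make_archive(commit, tar_path):
    """Run both steps without a shell; must be called inside a git work tree."""
    listing = subprocess.run(
        name_only_command(commit), capture_output=True, text=True, check=True
    )
    changed = [line for line in listing.stdout.splitlines() if line]
    subprocess.run(archive_command(commit, changed, tar_path), check=True)
    return changed


# Building the argument list does not require a repository:
print(archive_command("cb146d", ["cloc", "README.md"], "/tmp/example.tar"))
```

Because each command is passed as a list rather than a single string, the user's login shell (csh, tcsh, bash) never interprets it.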

SeanCurtis-TRI commented 7 years ago

I pulled the newest version. In running:

cloc --diff d846f2a cb146d (on the cloc repository),

It produced the following output:

git archive -o /tmp/0at2VBL65i.tar cb146d $(git diff --name-only cb146d)
git archive -o /tmp/GW6Vbg1YiV.tar d846f2a $(git diff --name-only d846f2a)
     288 text files.
       1 text file.s
      18 files ignored.                                         

github.com/AlDanial/cloc v 1.73  T=0.54 s (1.8 files/s, 1.8 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
PO File
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              9             18             33
XQuery
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              1              1
Drools
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              7             16             28
TTCN
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             11             16             19
Elixir
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              3             10              7
Visual Basic
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              4              2              6
Fortran 90
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              1              5              7
Windows Module Definition
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              1             18
Clean
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             10             30             58
Freemarker Template
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              2             27
Nim
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              5             13             43
Assembly
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2             40            110            197
Mustache
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              5              7             31
Fortran 77
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              1              8              7
Ruby
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             11             30            111
Stata
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              7              7             22
make
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 4             85            157            242
Pascal
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 4              4             15             18
F#
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              3              6             14
Logtalk
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             59             57            368
Antlr
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             48             19            257
Python
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              7             18              4
COBOL
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 3              5              8             35
PL/I
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              7              5
Haml
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              5             16             66
TypeScript
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 3             52             39            410
Qt Linguist
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              4             57
Glade
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0             22            232
Mathematica
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2             24             17             22
Slim
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              3             10
Swift
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             23             13             65
Lua
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              3              9              2
Markdown
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1            220             26           2136
RobotFramework
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              9              5             35
C
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 4            105             59            339
Dockerfile
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              4              1             53
R
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 3             95            312            698
C/C++ Header
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1            191            780            617
YAML
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                               137              1            137           2807
C++
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 4            132            173            570
XSLT
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              0              4             19
JavaScript
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 4              0              0              4
PHP
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2             11             13             26
Tcl/Tk
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              2              3
Pig Latin
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             19             40             15
Haxe
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             26             99             24
Windows Resource File
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             42             45            218
Windows Message File
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2             89              9            348
Puppet
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              2              2             27
Verilog-SystemVerilog
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              4             20             62
Julia
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              3             11              4
Blade
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             10              5             22
DOS Batch
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              2              2
Forth
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2             17             84            529
Specman e
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              4             12             31
Smalltalk
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2             19              5             85
Racket
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             32            159            247
IDL
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              2              1
ECPP
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             26             34            116
xBase
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              9              1
MUMPS
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              2              1
Kotlin
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              3              9
MATLAB
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              0              1             50
Lisp
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              5             26             24
Razor
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              4              4
F# Script
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              2              8
GraphQL
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              2             14
Solidity
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              2             19
Haskell
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 4             23             26             35
MXML
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             23              5             74
TeX
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             29             21            155
Vuejs Component
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             10              2             85
Brainfuck
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              3             24
Bourne Shell
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 2              0              0              2
XML
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              2              3
Objective C
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             11             11             25
C#
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 3              8              7             23
JSON
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              0             22
Cucumber
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              3              2             28
GLSL
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             10             14             32
Java
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              6             15              9
Focus
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              2              1
BrightScript
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              3             19
INI
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              2              3              7
Groovy
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              0              2             17
ColdFusion
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              1              2              2
Mako
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1              3              8              9
LFE
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             15             21             25
Perl
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   1            778           1246           9946
 removed                                 5           1324           2301          19250
Sass
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   0              0              0              0
 removed                                 1             14              0             43
---------------------------------------------------------------------------------------
SUM:
 same                                    0              0              0              0
 modified                                0              0              0              0
 added                                   1            778           1246           9946
 removed                               270           2987           5228          31480
---------------------------------------------------------------------------------------

You should be able to reproduce this easily on your side by operating on this repository. (FTR, I'm running bash in a generic gnome terminal in Ubuntu.)

SeanCurtis-TRI commented 7 years ago

BTW I had a random thought this morning. Right now, you've overloaded the parameters so that my git hashes could be directories or files, etc. If it helps, I would not be averse to having a specific flag that provides explicit semantics. I.e., if I provided a flag that says "interpret these as git commits", I would imagine that would give us full support for branch names and tags. Yes?

AlDanial commented 7 years ago

Yeah, this is getting trickier the deeper I get. The output of cloc --diff d846f2a cb146d you showed above is clearly wrong--what I was missing was that we just want the diffs of the files which were changed/added/removed relative to the two commits. What I was originally doing was comparing the file list of what changed in the first one (d846f2a) against the file list of what changed in the second (cb146d).

Anyway, the latest commit, cb146d, applies this relative diff but also introduces a new problem: git archive produces a zero-sized tar file if asked to save a file which isn't in its inventory at a particular commit (i.e., the file was added by the other commit).

The solution to this, as well as the csh problem, is to make Perl handle the intermediate steps, that is, break apart the git archive command into a bunch of smaller pieces. A drag.

SeanCurtis-TRI commented 7 years ago

This is representative of why I went looking for someone else who'd solved this problem for me. :) I didn't realize how ugly it was; I just felt it was uglier than I wanted to tackle. I'd pitch in, but I'm largely ignorant of shell scripting and Perl. So, other than being a cheerleader on the sideline and a willing guinea pig, there's not a great deal I can do to help. But, again, thanks for doing this.

AlDanial commented 7 years ago

You're underestimating the value you've already provided. First off, it was a genius idea. Second, your volunteered role as guinea pig may get annoying as it will likely take a few more iterations before this works correctly.

During the week I generally have little time to work on cloc. This is one feature I want to get right though so things are going to move slowly for a while.

SeanCurtis-TRI commented 7 years ago

I'm glad I can serve.

I can certainly appreciate the whole work week issue. I also have repositories that don't get much love until the weekend.

AlDanial commented 7 years ago

6942a1e adds shell independence and smarter logic, but your example of cloc --diff d846f2a cb146d still doesn't give the correct result. The baffling thing is the underlying git archive command doesn't produce the output I'm expecting. cloc tells git to make two tar files, one for each commit, something like

  git archive -o A.tar d846f2a cloc
  git archive -o B.tar cb146d cloc

My expectation is that A.tar will contain a copy of cloc after commit d846f2a and B.tar will have cloc after commit cb146d. However the content of both tar files is identical! On the other hand, if I do

  git diff d846f2a cb146d

git clearly shows the file cloc is different. Driving me nuts. What am I missing?
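One way to check whether two archives really hold identical content is to hash each member and compare. The following is a minimal sketch, assuming Python's standard tarfile module; the names tar_digests and make_tar are hypothetical helpers for illustration, and the in-memory archives stand in for the A.tar and B.tar files that git archive would write.

```python
import hashlib
import io
import tarfile


def tar_digests(fileobj):
    """Return {member name: SHA-256 hex digest} for regular files in a tar."""
    digests = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def make_tar(files):
    """Demo helper (hypothetical): build an in-memory tar from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    buf.seek(0)
    return buf


# Stand-ins for the two archives; the contents here are made up.
left = tar_digests(make_tar({"cloc": b"version one\n"}))
right = tar_digests(make_tar({"cloc": b"version two\n"}))

# Names whose content differs, or which exist on only one side.
changed = sorted(name for name in left.keys() | right.keys()
                 if left.get(name) != right.get(name))
print(changed)
```

With real archives, the same function could be called as tar_digests(open("A.tar", "rb")); an empty changed list would mean the two tar files carry the same file contents.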

SeanCurtis-TRI commented 7 years ago

Two issues: the first section deals with your assumption about what git archive should give you; the second covers what is being reported on diffs in general.

Git archive

I did a simple test:

  1. Created a new repository.
  2. Added a hello world C++ file.
  3. Committed it to master.
  4. Branched off of master -> add_comment.
  5. Added a comment and an add'l print statement (committing to add_comment).
  6. git archive -o master.tar master
  7. git archive -o branch.tar add_comment
  8. cloc --diff master.tar branch.tar
http://cloc.sourceforge.net v 1.60  T=0.01 s (80.9 files/s, 80.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++
 same                            0              0              0              5
 modified                        1              0              0              0
 added                           0              0              1              1
 removed                         0              0              0              0
-------------------------------------------------------------------------------
SUM:
 same                            0              0              0              5
 modified                        1              0              0              0
 added                           0              0              1              1
 removed                         0              0              0              0
-------------------------------------------------------------------------------

Correct results! One comment and one code line added. So, that clearly works.

If I do the same thing in the cloc repository:

  1. git archive -o A.tar d846f2a cloc
  2. git archive -o B.tar cb146d cloc
  3. cloc --diff A.tar B.tar
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Perl
    same                            0              0           1246           9943
    modified                        1              0              0              1
    added                           0              0              2              2
    removed                         0              0              0              2
    -------------------------------------------------------------------------------
    SUM:
    same                            0              0           1246           9943
    modified                        1              0              0              1
    added                           0              0              2              2
    removed                         0              0              0              2
    -------------------------------------------------------------------------------
  4. git diff --stat d846f cb146
    cloc | 4 +++-
    1 file changed, 3 insertions(+), 1 deletion(-)

Then I am getting results that are at least on the same order of magnitude (so, it's not counting all of the lines). So, I conclude one of two things:

1) If you're not getting the tarballs you expect, then you're not invoking what you believe, or 2) You're using the wrong criteria to judge the tarballs. They may be right, but what you're doing with them may be wrong. (And on that note...)

Diff evaluations

Given the results listed just above, I'm not entirely sure what the mapping is, however. This is the actual diff:

--- a/cloc
+++ b/cloc
@@ -3669,7 +3669,9 @@ sub replace_git_hash_with_tarfile {          # {{{1
                 next;
             }
             my ($Tarfh, $Tarfile) = tempfile(UNLINK => 1, SUFFIX => '.tar');  # dele
-            my $cmd = "git diff --name-only $file_or_dir | tar cf $Tarfile -T -";
+#           my $cmd = "git diff --name-only $file_or_dir | tar cf $Tarfile -T -";
+            # next line won't work on csh/tcsh
+            my $cmd = "git archive -o $Tarfile $file_or_dir \$(git diff --name-only 
             print  $cmd, "\n";
             system $cmd;
             push @replacement_arg_list, $Tarfile;

We can see we modified one line (by adding the comment), and added two lines. So, I'm not entirely sure where the 2 "removed" lines being reported come from.

AlDanial commented 7 years ago

Thanks for the clear breakdown; it helped me see that cloc is actually doing the right thing now, without any more code changes:

> cloc --diff d846f2a cb146d
git archive -o /tmp/6QAQU3XzBc.tar d846f2a  cloc
git archive -o /tmp/npWCGuARU7.tar cb146d cloc
       1 text file.
       1 text file.
       0 files ignored.                             

github.com/AlDanial/cloc v 1.73  T=1.52 s (0.7 files/s, 0.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl
 same                            0              0           1246           9945
 modified                        1              0              0              1
 added                           0              0              2              0
 removed                         0              0              0              0
-------------------------------------------------------------------------------
SUM:
 same                            0              0           1246           9945
 modified                        1              0              0              1
 added                           0              0              2              0
 removed                         0              0              0              0
-------------------------------------------------------------------------------

Probably too many hours staring at the screen yesterday made me partially blind to what I was trying to see. Today, with fresh eyes, all looks good.

At this point I should describe (and add to the README.md after this feature goes into production), the logic cloc employs if it gets what looks like a commit hash as an input:

There's a lot going on so I'd be grateful for more tire-kicking.

SeanCurtis-TRI commented 7 years ago

I was able to confirm that I'm getting the same results as you on the example above.

However, when I try it on my own repository, I get something completely different. My git diff --stat reports 8 files changed. When I run cloc on those same shas, it counts up thousands of files. The final report counts hundreds of thousands of lines of code and comments that are the same. But when it comes to counting modified and added, the numbers are on the right scale:

cloc (blank + comment + code = total)

git diff:

The difference is largely due to a) one file included in git excluded in cloc (an unrecognized type), and b) some accounting issues. Other than that, the file counts are correct.

I'd understood that it wouldn't consider/count/crunch through files that weren't different between the two commits. Is that mistaken?

AlDanial commented 7 years ago

Details on what happens with --diff on two hashes: First it makes file listings from each hash, for both the full and the diff versions, via (this is pseudocode rather than Perl):

Left_Full_List  = git ls-tree --name-only -r LEFT_HASH
Right_Full_List = git ls-tree --name-only -r RIGHT_HASH
Left_Diff_List  = git diff-tree --no-commit-id --name-only LEFT_HASH
Right_Diff_List = git diff-tree --no-commit-id --name-only RIGHT_HASH

Next it makes a list which is a union of both diffs

Both_List = union(Left_Diff_List, Right_Diff_List)

If files have been added or deleted between the two commits, it is possible that Both_List will include one or more file names that don't exist in one of the two commits. cloc then makes trimmed-down versions of Both_List that contain only files that actually exist at that commit level.

Both_List_Left  = intersection(Both_List, Left_Full_List)
Both_List_Right = intersection(Both_List, Right_Full_List)

Next, it makes tar files of the left and right hashes using the files in the unioned and trimmed lists:

git archive -o Left_tar_file  LEFT_HASH  Both_List_Left 
git archive -o Right_tar_file  RIGHT_HASH  Both_List_Right

Finally, it does

cloc --diff Left_tar_file Right_tar_file

If you manually walk through the git commands with your hashes you should be able to duplicate the cloc results. Of course, if the logic is flawed I want to know.
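The list handling described above can be sketched with plain set operations. This is an illustration in Python rather than cloc's Perl, and the function name select_paths and the sample file names are made up for the example.

```python
def select_paths(left_full, right_full, left_diff, right_diff):
    """Union the two diff lists, then keep only names present at each commit."""
    both = set(left_diff) | set(right_diff)         # Both_List
    both_left = sorted(both & set(left_full))       # Both_List_Left
    both_right = sorted(both & set(right_full))     # Both_List_Right
    return both_left, both_right


# Hypothetical listings: b.c exists only on the left, c.c only on the right.
both_left, both_right = select_paths(
    left_full=["a.c", "b.c", "unchanged.c"],
    right_full=["a.c", "c.c", "unchanged.c"],
    left_diff=["a.c", "b.c"],
    right_diff=["a.c", "c.c"],
)
print(both_left)    # ['a.c', 'b.c']
print(both_right)   # ['a.c', 'c.c']
```

Each trimmed list holds only names that exist at its own commit, which is what keeps git archive from being asked for a file it does not have.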

SeanCurtis-TRI commented 7 years ago

Sorry for my slow response.

I have some concerns about git diff-tree.

  1. It seems you've mistakenly omitted the -r flag. Without the recursion flag, I'm not getting any files that aren't in the root directory of the repository. That said, I tried adding it to cloc on line 3704 and it made no difference to the final output. Hmmm....
    1. What it does include without the -r flag is the name of a directory in the project root that contains changes.
    2. I suspect that when we take the intersection of *_Full_List and Both_List we end up with an empty list. So, when we provide that empty list as the PATH argument to git archive we are, in fact, not specifying individual files; we're implicitly saying grab it all.
  2. By specifying only a single hash, you are creating a diff of that hash with its immediate parent. I.e., git diff-tree HEAD and git diff-tree HEAD~1 HEAD should report the same changes. I think what you really want is git diff-tree LEFT_HASH RIGHT_HASH to get the differences between the two SHAs. This should eliminate the need for the union (as this should include additions, modifications, and removals).

SeanCurtis-TRI commented 7 years ago

FTR I put in print statements at the calls to git_archive to see the list of files being passed into the argument; both printed as empty lists.
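Since git archive with no path arguments archives the whole tree, an empty file list silently turns into "grab everything". A minimal sketch of a guard, in Python rather than cloc's Perl; git_archive_command is a hypothetical helper, not a function in cloc.

```python
def git_archive_command(tar_path, commit, paths):
    """Build a git archive argument list, refusing an empty path list.

    Returns None when there is nothing to archive, so the caller can skip
    the commit instead of archiving the entire tree by accident.
    """
    if not paths:
        return None
    return ["git", "archive", "-o", tar_path, commit, *paths]


print(git_archive_command("A.tar", "d846f2a", ["cloc"]))
print(git_archive_command("A.tar", "d846f2a", []))
```

The first call yields the argument list for a single-file archive; the second yields None rather than a command that would pull in every file.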

AlDanial commented 6 years ago

I found the cause of the empty file lists--I'd neglected to trim newlines from one of the git inputs. In any event, your suggestion on diff-tree LHASH RHASH is good and I implemented that in my latest commit. The old logic is commented out.
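A small sketch of why an untrimmed newline empties the lists: a name read as "cloc\n" never matches "cloc" in the full listing, so every intersection comes back empty. The Python below is illustrative only; parse_name_list and changed_files_command are hypothetical names, and the -r flag follows the suggestion made earlier in this thread.

```python
def changed_files_command(left, right):
    """Argument list for listing files that differ between two commits."""
    return ["git", "diff-tree", "-r", "--no-commit-id", "--name-only",
            left, right]


def parse_name_list(raw_output):
    """Split command output into file names, dropping line endings and blanks."""
    return [line for line in raw_output.splitlines() if line]


full_list = ["cloc", "README.md"]

# An untrimmed name fails the membership test.
print("cloc\n" in full_list)                        # False

# After trimming, the names match as expected.
names = parse_name_list("cloc\nREADME.md\n")
print([name for name in names if name in full_list])
```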

Additionally, if you run with -v it will now print out all the git commands it issues under the hood.

Let me know if these changes bring any improvement to your runs.

SeanCurtis-TRI commented 6 years ago

Every day leads to an improvement. I have a couple notes:

  1. You're still missing a flag on the git diff-tree invocation. You need to pass the -r flag. I modified line 3735 to: my $git_list_cmd = "git diff-tree -r --no-commit-id --name-only $Left $Right"; and my output improved immensely.

For example, for my own particular HEAD/master comparison:

The final output of cloc is even more compelling:

  2. Without the -v flag, the git archive command still appears to print its verbose output (line 3799).

AlDanial commented 6 years ago

d2b57f7 corrects both issues. I'm pleased the new capability is coming together. I'll release the next stable version, 1.74, once this looks solid.

SeanCurtis-TRI commented 6 years ago

So, I'm hammering on this and I have to come back to a fleeting comment.

You're very admirably trying to infer git operands on the --diff processing option. It works well with something like:

cloc --diff master HEAD

However, if I want to compare arbitrary branches, it does not work so well, e.g.,

cloc --diff branch1 branch2

Which would produce an error like:

       0 text files.
       0 text files.
       0 files ignored.                             

2 errors:
Unable to read:  branch1
Unable to read:  branch2
Nothing to count.

Life may be a lot easier if you go from inference to declaration and actually add a --force-git flag that allows me to say, indisputably, that I expect these arguments to be interpreted as git hashes.

SeanCurtis-TRI commented 6 years ago

I've submitted a pull request (#215) illustrating how I envision the flag. Without looking at the big picture, I know that:

cloc --diff --git branchA branchB

now does what I want.
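The difference between inferring and declaring can be sketched as follows. This is not cloc's actual logic, which is not shown in this thread; classify_argument, the hex-pattern heuristic, and the returned labels are all assumptions made for illustration.

```python
import os
import re

# Assumed heuristic: an abbreviated or full hexadecimal commit hash.
HEX_HASH = re.compile(r"^[0-9a-f]{4,40}$")


def classify_argument(arg, force_git=False, path_exists=os.path.exists):
    """Decide how to treat a command-line argument.

    With force_git the caller declares the argument to be a git reference,
    so branch names and tags are accepted without any guessing.
    """
    if force_git:
        return "git"
    if path_exists(arg):
        return "path"
    if HEX_HASH.match(arg):
        return "git"
    return "unreadable"


def nothing_exists(path):
    """Stand-in for a filesystem check in which no path exists."""
    return False


print(classify_argument("cb146d", path_exists=nothing_exists))
print(classify_argument("branch1", path_exists=nothing_exists))
print(classify_argument("branch1", force_git=True, path_exists=nothing_exists))
```

Under these assumptions a branch name falls through the guesswork unless the flag is given, which is the gap an explicit flag closes.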

AlDanial commented 6 years ago

Thanks for the PR; makes sense. The modified subroutine happens early on so --git is acted upon before cloc decides to do a straight count or a diff, which is good. I'm ready to close this issue if you think it is fully baked.

SeanCurtis-TRI commented 6 years ago

My only hesitation in the PR was whether the documentation was sufficient. I hadn't tested it w.r.t. other contexts in which the input could/should be interpreted as a git hash. At the very least, the documentation could be tweaked to indicate that it should only be used in conjunction with --diff.

AlDanial commented 6 years ago

I plan to release 1.74 this Friday, Sept 8. If you have any more tweaks or improvements for the new git capabilities let me know.

SeanCurtis-TRI commented 6 years ago

If you're content, I'm content. I certainly have it working for me. :) Shall we close the issue?

AlDanial commented 6 years ago

Version 1.74 was released with these new git features.