Closed SeanCurtis-TRI closed 6 years ago
First thing that came to mind: "now why didn't I think of that?" I implemented this capability with commit 9c0c296. Let me know if it doesn't work as you expected.
Thanks for the amazing service. I love this work and am eager to use it regularly. I hit a number of roadblocks; you can skip all of this post and go to the bottom to see the summary of issues. Otherwise, put on your seat belt and enjoy the ride.
First try
cloc --diff master HEAD
1 error:
Unable to read: HEAD
Second try
cloc --diff master PR_geometry_system
<== the name of a branch
1 error:
Unable to read: PR_geometry_system
Third try
cloc -by-file --diff master b9409db3a584effacc
git diff --name-only master | tar cf /tmp/e_G88PXhEu.tar -T -
git diff --name-only b9409db3a584effacc | tar cf /tmp/q5ub4evYss.tar -T -
11 text files.
11 text files.
2 files ignored.
github.com/AlDanial/cloc v 1.73 T=0.22 s (4.5 files/s, 4.5 lines/s)
--------------------------------------------------------------------------------------------------------------
File blank comment code
--------------------------------------------------------------------------------------------------------------
drake/geometry/geometry_system.h
same 0 198 47
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/query_handle.h
same 0 23 21
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/geometry_instance.h
same 0 7 12
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/geometry_system.cc
same 0 17 101
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/test/geometry_system_test.cc
same 0 50 167
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/test/expect_error_message.h
same 0 6 35
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/geometry_frame.h
same 0 17 13
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/geometry_instance.cc
same 0 1 6
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/geometry_query_results.h
same 0 11 16
modified 0 0 0
added 0 0 0
removed 0 0 0
drake/geometry/geometry_ids.h
same 0 4 9
modified 0 0 0
added 0 0 0
removed 0 0 0
--------------------------------------------------------------------------------------------------------------
SUM:
same 0 334 427
modified 0 0 0
added 0 0 0
removed 0 0 0
--------------------------------------------------------------------------------------------------------------
However, then I execute this:
git diff --stat master PR_geometry_system
drake/geometry/BUILD | 65 +++++
drake/geometry/geometry_frame.h | 39 +++
drake/geometry/geometry_ids.h | 19 ++
drake/geometry/geometry_instance.cc | 10 +
drake/geometry/geometry_instance.h | 24 ++
drake/geometry/geometry_query_results.h | 31 +++
drake/geometry/geometry_system.cc | 248 +++++++++++++++++
drake/geometry/geometry_system.h | 473 +++++++++++++++++++++++++++++++++
drake/geometry/query_handle.h | 56 ++++
drake/geometry/test/expect_error_message.h | 51 ++++
drake/geometry/test/geometry_system_test.cc | 253 ++++++++++++++++++
11 files changed, 1269 insertions(+)
So, this points out several issues:
git diff --stat
values, and you can see they are off by a significant margin.
Actually, I did update the usage block to say that it accepts git commit hashes (as opposed to any description that git understands) as inputs. "master" was accepted as these six characters satisfy the regex cloc uses to determine if it is a git hash (kind of funny; accidentally does the right thing). My initial concern was distinguishing between file/directory names and git descriptors. I suppose that isn't an issue since I only pass git anything that isn't readable on the file system. I'm not sure that doing the opposite--passing anything that isn't readable as a file/dir on to git--is right either though. The most common cause for that is user error where the input is misspelled or doesn't exist.
I need to think about a clean solution a bit more.
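One possible shape for such a solution, sketched in Python rather than cloc's Perl: treat anything that exists on disk as a path, and let git itself decide whether the rest names a commit. The names `git_resolve` and `classify` are hypothetical and are not part of cloc; the only git call used is `git rev-parse --verify --quiet`.

```python
import os
import subprocess
from typing import Callable, Optional, Tuple


def git_resolve(name: str) -> Optional[str]:
    """Return the full commit hash for a branch, tag, or hash, or None."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--verify", "--quiet", name + "^{commit}"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git is not installed
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def classify(arg: str,
             resolve: Callable[[str], Optional[str]] = git_resolve) -> Tuple[str, str]:
    """Classify one command-line input as a path, a commit, or an error."""
    if os.path.exists(arg):
        return ("path", arg)           # files and directories win
    commit = resolve(arg)
    if commit is not None:
        return ("commit", commit)      # anything git can resolve
    return ("error", f"Unable to read: {arg}")
```

With this ordering a misspelled input still ends in an "Unable to read" error, because git fails to resolve it as well.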
Of your 11 files, I can see right away that the file BUILD will be ignored as this does not resemble any programming language that cloc knows about. To see what the other ignored file is, rerun with
cloc --ignored ign.txt --by-file --diff master b9409db3a584effacc
then look in the ign.txt
file.
Hmmm, since you ran with --by-file
it is pretty easy to see which files were counted. I see that only BUILD
isn't in the output, so cloc is seeing a file git isn't reporting; maybe a dot file or something like that.
I had seen the updated documentation. I felt it was ambiguous because git allows commits to be referred to in so many ways. But you are correct, it does explicitly refer to "hashes".
As for the "ignored" file. I know there are only 11 files. And that only one of them should be ignored. However, I followed your advice and this is the contents of the ignore file:
/tmp/61EvP3NzGP/drake/geometry/BUILD: language unknown (#3)
/tmp/sFzQBtW15d/drake/geometry/BUILD: language unknown (#3)
It seems that the BUILD file is counted twice, once in each source.
Any thoughts on the counts being so different?
Yes, there is a counting error. BUILD
is ignored once for each batch of input. My take is the fault is in the reported number of files found; it says it found 11 but really it found 22, 11 in each batch. The ignored count of 2 is right.
"The ignored count of 2 is right."
For a given value of "right". :) It's certainly confusing in light of the other count (11). The 11 would make intuitive sense; the 22 would be implementation correct. Based on that, I'd hope for 2 becoming 1.
As for the counts being different, I also mean the counts in the per-file data. It reports zeros for all blank lines. And where it reports non-zero values (comment and code), it puts them in the "same" category. So, they should all be "added".
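To put a number on the mismatch, here is a small tally in Python using only figures copied from the two outputs above (BUILD is left out because cloc ignores it). The comparison is rough, since `git diff --stat` counts every inserted line while cloc reported comment and code lines only.

```python
# (git insertions, cloc comment lines, cloc code lines) per file,
# copied from the git diff --stat and cloc --by-file outputs above.
counts = {
    "geometry_frame.h": (39, 17, 13),
    "geometry_ids.h": (19, 4, 9),
    "geometry_instance.cc": (10, 1, 6),
    "geometry_instance.h": (24, 7, 12),
    "geometry_query_results.h": (31, 11, 16),
    "geometry_system.cc": (248, 17, 101),
    "geometry_system.h": (473, 198, 47),
    "query_handle.h": (56, 23, 21),
    "test/expect_error_message.h": (51, 6, 35),
    "test/geometry_system_test.cc": (253, 50, 167),
}

git_total = sum(git for git, _, _ in counts.values())
cloc_total = sum(comment + code for _, comment, code in counts.values())

print(git_total, cloc_total, git_total - cloc_total)  # 1204 761 443
```

The 443-line gap would be consistent with blank lines going uncounted, though that is only a guess from the zero "blank" column.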
If BUILD
were removed from the git branch but kept in the master, cloc would say 1 file was ignored. If BUILD
and BUILD2
were in the branch, cloc would say 3 files were ignored. Any other way of looking at it doesn't make sense to me.
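The per-batch view of the ignored count can be written down as a tiny Python sketch. The function name and the way unrecognized files are identified are made up for illustration; cloc's real language detection is far more involved.

```python
from typing import Iterable, Set


def ignored_count(left: Iterable[str], right: Iterable[str],
                  unrecognized: Set[str]) -> int:
    """Count ignored files once per batch, so a file in both batches counts twice."""
    return sum(1 for batch in (left, right) for name in batch
               if name in unrecognized)


unknown = {"BUILD", "BUILD2"}

# BUILD present in both master and the branch.
both = ignored_count(["BUILD", "a.h"], ["BUILD", "a.h"], unknown)
# BUILD kept in master but removed from the branch.
master_only = ignored_count(["BUILD", "a.h"], ["a.h"], unknown)
# BUILD in master, BUILD and BUILD2 in the branch.
extra = ignored_count(["BUILD", "a.h"], ["BUILD", "BUILD2", "a.h"], unknown)

print(both, master_only, extra)  # 2 1 3
```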
Anyway. Yes, the real counts of code/comments/blank are incorrect. Dang. I can duplicate the failure with my own repos so will start debugging.
With cb146d6 you should see code/comment/blank numbers that make more sense. The earlier code was making two identical tar files by pulling code from the currently active head (hence everything showing as "same"). The fix actually pulls code from the specified branch.
However, the fix relies on bash/sh/ksh expression evaluation and won't work on csh/tcsh. Still looking for a general solution.
I pulled the newest version. In running:
cloc --diff d846f2a cb146d
(on the cloc repository),
It produced the following output:
git archive -o /tmp/0at2VBL65i.tar cb146d $(git diff --name-only cb146d)
git archive -o /tmp/GW6Vbg1YiV.tar d846f2a $(git diff --name-only d846f2a)
288 text files.
1 text file.s
18 files ignored.
github.com/AlDanial/cloc v 1.73 T=0.54 s (1.8 files/s, 1.8 lines/s)
---------------------------------------------------------------------------------------
Language files blank comment code
---------------------------------------------------------------------------------------
PO File
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 9 18 33
XQuery
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 1 1
Drools
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 7 16 28
TTCN
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 11 16 19
Elixir
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 3 10 7
Visual Basic
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 4 2 6
Fortran 90
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 1 5 7
Windows Module Definition
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 1 18
Clean
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 10 30 58
Freemarker Template
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 2 27
Nim
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 5 13 43
Assembly
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 40 110 197
Mustache
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 5 7 31
Fortran 77
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 1 8 7
Ruby
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 11 30 111
Stata
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 7 7 22
make
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 4 85 157 242
Pascal
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 4 4 15 18
F#
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 3 6 14
Logtalk
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 59 57 368
Antlr
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 48 19 257
Python
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 7 18 4
COBOL
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 3 5 8 35
PL/I
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 7 5
Haml
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 5 16 66
TypeScript
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 3 52 39 410
Qt Linguist
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 4 57
Glade
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 22 232
Mathematica
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 24 17 22
Slim
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 3 10
Swift
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 23 13 65
Lua
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 3 9 2
Markdown
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 220 26 2136
RobotFramework
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 9 5 35
C
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 4 105 59 339
Dockerfile
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 4 1 53
R
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 3 95 312 698
C/C++ Header
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 191 780 617
YAML
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 137 1 137 2807
C++
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 4 132 173 570
XSLT
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 0 4 19
JavaScript
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 4 0 0 4
PHP
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 11 13 26
Tcl/Tk
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 2 3
Pig Latin
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 19 40 15
Haxe
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 26 99 24
Windows Resource File
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 42 45 218
Windows Message File
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 89 9 348
Puppet
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 2 2 27
Verilog-SystemVerilog
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 4 20 62
Julia
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 3 11 4
Blade
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 10 5 22
DOS Batch
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 2 2
Forth
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 17 84 529
Specman e
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 4 12 31
Smalltalk
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 19 5 85
Racket
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 32 159 247
IDL
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 2 1
ECPP
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 26 34 116
xBase
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 9 1
MUMPS
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 2 1
Kotlin
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 3 9
MATLAB
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 0 1 50
Lisp
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 5 26 24
Razor
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 4 4
F# Script
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 2 8
GraphQL
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 2 14
Solidity
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 2 19
Haskell
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 4 23 26 35
MXML
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 23 5 74
TeX
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 29 21 155
Vuejs Component
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 10 2 85
Brainfuck
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 3 24
Bourne Shell
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 2 0 0 2
XML
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 2 3
Objective C
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 11 11 25
C#
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 3 8 7 23
JSON
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 0 22
Cucumber
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 3 2 28
GLSL
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 10 14 32
Java
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 6 15 9
Focus
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 2 1
BrightScript
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 3 19
INI
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 2 3 7
Groovy
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 0 2 17
ColdFusion
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 1 2 2
Mako
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 3 8 9
LFE
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 15 21 25
Perl
same 0 0 0 0
modified 0 0 0 0
added 1 778 1246 9946
removed 5 1324 2301 19250
Sass
same 0 0 0 0
modified 0 0 0 0
added 0 0 0 0
removed 1 14 0 43
---------------------------------------------------------------------------------------
SUM:
same 0 0 0 0
modified 0 0 0 0
added 1 778 1246 9946
removed 270 2987 5228 31480
---------------------------------------------------------------------------------------
You should be able to reproduce this easily on your side by operating on this repository. (FTR, I'm running bash
in a generic gnome terminal in Ubuntu.)
BTW I had a random thought this morning. Right now, you've overloaded the parameters so that my git hashes could be directories or files, etc. If it helps, I would not be averse to having a specific flag that provides explicit semantics. I.e., if I provided a flag that says "interpret these as git commits", I would imagine that would give us full support for branch names and tags. Yes?
Yeah, this is getting trickier the deeper I get. The output of cloc --diff d846f2a cb146d
you showed above is clearly wrong--what I was missing was that we just want the diffs of the files which were changed/added/removed relative to the two commits. What I was originally doing was comparing the file list of what changed in the first one (d846f2a) against the file list of what changed in the second (cb146d).
Anyway, the latest commit cb146d applies this relative diff but also introduces a new problem: git archive produces a zero-sized tar file if asked to save a file which isn't in its inventory at a particular commit (i.e., the file was added by the other commit).
The solution to this, as well as the csh problem, is to make Perl handle the intermediate steps, that is, break apart the git archive command into a bunch of smaller pieces. A drag.
This is representative of why I went looking for someone else who'd solved this problem for me. :) I didn't realize how ugly it was, I just felt it was uglier than I wanted to tackle. I'd pitch in, but I'm largely shell-script and perl ignorant. So, other than being a cheerleader on the sideline and a willing guinea pig, there's not a great deal I can do to help. But, again, thanks for doing this.
You're underestimating the value you've already provided. First off, it was a genius idea. Second, your volunteered role as guinea pig may get annoying as it will likely take a few more iterations before this works correctly.
During the week I generally have little time to work on cloc. This is one feature I want to get right though so things are going to move slowly for a while.
I'm glad I can serve.
I can certainly appreciate the whole work week issue. I also have repositories that don't get much love until the weekend.
6942a1e adds shell independence and smarter logic, but your example of cloc --diff d846f2a cb146d
still doesn't give the correct result.
The baffling thing is the underlying git archive
command doesn't produce the output I'm expecting. cloc tells git to make two tar files, one for each commit, something like
git archive -o A.tar d846f2a cloc
git archive -o B.tar cb146d cloc
My expectation is that A.tar
will contain a copy of cloc
after commit d846f2a and B.tar
will have cloc
after commit cb146d. However the content of both tar files is identical! On the other hand, if I do
git diff d846f2a cb146d
git clearly shows the file cloc
is different. Driving me nuts. What am I missing?
Two issues: first section deals with your assumption of what git archive
should give you. Second is what is being reported on diffs in general:
Git archive
I did a simple test:
master
.master
-> add_comment
.add_comment
git archive -o master.tar master
git archive -o branch.tar add_comment
cloc --diff master.tar branch.tar
http://cloc.sourceforge.net v 1.60 T=0.01 s (80.9 files/s, 80.9 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
C++
same 0 0 0 5
modified 1 0 0 0
added 0 0 1 1
removed 0 0 0 0
-------------------------------------------------------------------------------
SUM:
same 0 0 0 5
modified 1 0 0 0
added 0 0 1 1
removed 0 0 0 0
-------------------------------------------------------------------------------
Correct results! One comment and one code line added. So, that clearly works.
If I do the same thing in the cloc repository:
git archive -o A.tar d846f2a cloc
git archive -o B.tar cb146d cloc
cloc --diff A.tar B.tar
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Perl
same 0 0 1246 9943
modified 1 0 0 1
added 0 0 2 2
removed 0 0 0 2
-------------------------------------------------------------------------------
SUM:
same 0 0 1246 9943
modified 1 0 0 1
added 0 0 2 2
removed 0 0 0 2
-------------------------------------------------------------------------------
git diff --stat d846f cb146
cloc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Then I am getting results that are at least on the same order of magnitude (so, it's not counting all of the lines). So, I conclude one of two things:
1) If you're not getting the tarballs you expect, then you're not invoking what you believe, or 2) You're using the wrong criteria to judge the tarballs. They may be right, but what you're doing with them may be wrong. (And on that note...)
Diff evaluations
Given the results listed just above, I'm not entirely sure what the mapping is, however. This is the actual diff:
--- a/cloc
+++ b/cloc
@@ -3669,7 +3669,9 @@ sub replace_git_hash_with_tarfile { # {{{1
next;
}
my ($Tarfh, $Tarfile) = tempfile(UNLINK => 1, SUFFIX => '.tar'); # dele
- my $cmd = "git diff --name-only $file_or_dir | tar cf $Tarfile -T -";
+# my $cmd = "git diff --name-only $file_or_dir | tar cf $Tarfile -T -";
+ # next line won't work on csh/tcsh
+ my $cmd = "git archive -o $Tarfile $file_or_dir \$(git diff --name-only
print $cmd, "\n";
system $cmd;
push @replacement_arg_list, $Tarfile;
We can see we modified one line (by adding the comment), and added two lines. So, I'm not entirely sure where the 2 "removed" lines being reported come from.
Thanks for the clear break-down, helped me see that cloc is actually doing the right thing now--without any more code changes:
> cloc --diff d846f2a cb146d
git archive -o /tmp/6QAQU3XzBc.tar d846f2a cloc
git archive -o /tmp/npWCGuARU7.tar cb146d cloc
1 text file.
1 text file.
0 files ignored.
github.com/AlDanial/cloc v 1.73 T=1.52 s (0.7 files/s, 0.7 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Perl
same 0 0 1246 9945
modified 1 0 0 1
added 0 0 2 0
removed 0 0 0 0
-------------------------------------------------------------------------------
SUM:
same 0 0 1246 9945
modified 1 0 0 1
added 0 0 2 0
removed 0 0 0 0
-------------------------------------------------------------------------------
Probably too many hours staring at the screen yesterday made me partially blind to what I was trying to see. Today, with fresh eyes, all looks good.
At this point I should describe (and add to the README.md after this feature goes into production) the logic cloc employs if it gets what looks like a commit hash as an input:
- Without --diff, it will do a straight count of every file in the repo at that commit hash.
- With --diff and a file and a git hash, it will diff the file against the same-named file in the git repo. All other repo files are ignored.
- With --diff and a directory and a git hash, it will diff the directory contents against all the files in the git repo.
- With --diff and two git hashes, cloc will first make a list of files that changed at the respective commit hashes (via git diff-tree --no-commit-id --name-only HASH), then extract and only diff those files.

There's a lot going on so I'd be grateful for more tire-kicking.
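The cases just described amount to a small dispatch table. Below is a rough Python rendering of it; the function `plan` and the labels "file", "dir", and "hash" are invented for illustration and do not mirror cloc's Perl internals.

```python
from typing import Sequence


def plan(diff: bool, kinds: Sequence[str]) -> str:
    """Describe what would happen for inputs of the given kinds."""
    if "hash" not in kinds:
        return "ordinary cloc behavior"
    if not diff:
        return "count every file in the repo at that commit"
    pair = sorted(kinds)
    if pair == ["file", "hash"]:
        return "diff the file against the same-named file in the repo"
    if pair == ["dir", "hash"]:
        return "diff the directory against all files in the repo"
    if pair == ["hash", "hash"]:
        return "diff only the files that changed"
    return "unsupported combination"


print(plan(True, ["hash", "hash"]))  # diff only the files that changed
```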
I was able to confirm that I'm getting the same results as you on the example above.
However, when I try it on my own repository, I get something completely different. My git diff --stat
reports 8 files changed. When I run cloc on those same shas, it counts up thousands of files. The final report counts hundreds of thousands of lines of code and comments that are the same. But when it comes to counting modified and added, the numbers are on the right scale:
cloc (blank + comment + code = total)
git diff:
The difference is largely due to a) one file included in git excluded in cloc (an unrecognized type), and b) some accounting issues. Other than that, the file counts are correct.
I'd understood that it wouldn't consider/count/crunch through files that weren't different between the two commits. Is that mistaken?
Details on what happens with --diff
on two hashes:
First it makes file listings from each hash for both full and the diff versions via (this is pseudocode rather than Perl):
Left_Full_List = git ls-tree --name-only -r LEFT_HASH
Right_Full_List = git ls-tree --name-only -r RIGHT_HASH
Left_Diff_List = git diff-tree --no-commit-id --name-only LEFT_HASH
Right_Diff_List = git diff-tree --no-commit-id --name-only RIGHT_HASH
Next it makes a list which is a union of both diffs
Both_List = union(Left_Diff_List, Right_Diff_List)
If files have been added or deleted between the two commits, it is possible that Both_List
will include one or more file names that don't exist in one of the two repos. cloc then makes trimmed-down versions of Both_List
that contain only files that actually exist at that commit level.
Both_List_Left = intersection(Both_List, Left_Full_List)
Both_List_Right = intersection(Both_List, Right_Full_List)
Next, it makes tar files of the left and right hashes using the files in the unioned and trimmed lists:
git archive -o Left_tar_file LEFT_HASH Both_List_Left
git archive -o Right_tar_file RIGHT_HASH Both_List_Right
Finally, it does
cloc --diff Left_tar_file Right_tar_file
If you manually walk through the git commands with your hashes you should be able to duplicate the cloc results. Of course, if the logic is flawed I want to know.
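For anyone walking through it by hand, the union-and-trim step can be sketched in Python with plain sets. This only mirrors the pseudocode above; the function name is hypothetical, and the file lists are made-up stand-ins for real `git ls-tree` and `git diff-tree` output.

```python
from typing import Iterable, List, Tuple


def files_to_archive(left_full: Iterable[str], right_full: Iterable[str],
                     left_diff: Iterable[str], right_diff: Iterable[str]
                     ) -> Tuple[List[str], List[str]]:
    """Union both diff lists, then keep only names that exist on each side."""
    both = set(left_diff) | set(right_diff)
    both_left = sorted(both & set(left_full))
    both_right = sorted(both & set(right_full))
    return both_left, both_right


# Hypothetical example: old.pl exists only on the left, new.pl only on the right.
left, right = files_to_archive(
    left_full=["README.md", "cloc", "old.pl"],
    right_full=["README.md", "cloc", "new.pl"],
    left_diff=["cloc", "old.pl"],
    right_diff=["cloc", "new.pl"],
)
print(left)   # ['cloc', 'old.pl']
print(right)  # ['cloc', 'new.pl']
```

Trimming each list to names that exist at that commit is what keeps git archive from being asked for a file the commit does not contain.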
Sorry for my slow response.
I have some concerns about git diff-tree.
-r flag. Without the recursion flag, I'm not getting any files that aren't in the root directory of the repository. That said, I tried adding it to cloc on line 3704 and it made no difference to the final output. Hmmm....
-r flag is the name of a directory in the project root that contains changes.
*_Full_List and Both_List we end up with an empty list. So, when we provide that empty list as the PATH argument to git archive we are, in fact, not specifying individual files; we're implicitly saying grab it all.
git diff-tree HEAD and git diff-tree HEAD HEAD~1 should produce exactly the same results. I think what you really want is git diff-tree LEFT_HASH RIGHT_HASH to get the differences between the two SHAs. This should eliminate the need for the union (as this should include additions, modifications, and removals).

FTR I put in print statements at the calls to git_archive to see the list of files being passed into the argument; both printed as empty lists.
I found the cause of the empty file lists--I'd neglected to trim newlines from one of the git inputs. In any event, your suggestion on diff-tree LHASH RHASH is good and I implemented that in my latest commit. The old logic is commented out.
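The two failure modes discussed here (untrimmed newlines that empty the intersection, and an empty path list that makes git archive grab the whole tree) are easy to guard against. Here is a minimal Python sketch with hypothetical helper names; cloc itself is Perl and may handle this differently.

```python
from typing import List, Sequence


def parse_name_list(output: str) -> List[str]:
    """Split git's one-name-per-line output, dropping newlines and blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def archive_argv(tar_path: str, commit: str, paths: Sequence[str]) -> List[str]:
    """Build a git archive command, refusing an empty path list."""
    if not paths:
        # With no paths, git archive would include every file in the tree.
        raise ValueError("no paths given; refusing to archive the whole tree")
    return ["git", "archive", "-o", tar_path, commit, *paths]


# A name with a trailing newline never matches its trimmed counterpart.
print({"cloc\n"} & {"cloc"})                  # set()
print(set(parse_name_list("cloc\n")) & {"cloc"})  # {'cloc'}
```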
Additionally, if you run with -v
it will now print out all the git commands it issues under the hood.
Let me know if these changes bring any improvement to your runs.
Every day leads to an improvement. I have a couple notes:
git-diff invocation. You need to pass the -r flag. I modified line 3735 to: my $git_list_cmd = "git diff-tree -r --no-commit-id --name-only $Left $Right"; and my output improved immensely.
For example, for my own particular HEAD/master comparison:
- git diff --stat master HEAD reports 5 files, 373 insertions and 12 deletions.
- git diff-tree --no-commit-id --name-only master HEAD lists a single directory (no files).
- git diff-tree --no-commit-id --name-only -r master HEAD lists exactly five files. (The same five reported by git diff.)

The final output of cloc is even more compelling:
- Without the -r flag, the final table reported on 2862 files. (And, of course, hundreds of thousands of "same" lines.)
- With the -r flag, it reported on just five files. And reported 4 lines modified, 358 lines added, 4 modified, and 8 removed. Obviously there are some differences in what is being counted by git and cloc and I'm not too worried about that. But the -r flag makes all the difference.

git archive command is still printing verbose. line 3799

d2b57f7 corrects both issues. I'm pleased the new capability is coming together. I'll release the next stable version, 1.74, once this looks solid.
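Why the -r flag matters can be shown with a toy model in Python. Without recursion, a tree comparison stops at the first level that differs, so nested changes show up as a single top-level directory. The paths below are hypothetical, and the first function only imitates that behavior; it does not call git.

```python
from typing import Iterable, List


def non_recursive_entries(changed_files: Iterable[str]) -> List[str]:
    """Top-level names a non-recursive tree comparison would report."""
    return sorted({path.split("/", 1)[0] for path in changed_files})


def diff_tree_argv(left: str, right: str) -> List[str]:
    """The recursive invocation quoted above, as an argument list."""
    return ["git", "diff-tree", "-r", "--no-commit-id", "--name-only", left, right]


changed = [
    "drake/geometry/a.h",
    "drake/geometry/b.h",
    "drake/geometry/c.cc",
    "drake/geometry/test/d.cc",
    "drake/geometry/test/e.h",
]
print(non_recursive_entries(changed))  # ['drake']
print(len(changed))                    # 5
```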
So, I'm hammering on this and I have to come back to a fleeting comment.
You're very admirably trying to infer git operands on the --diff
processing option. It works well with something like:
cloc --diff master HEAD
However, if I want to compare arbitrary branches, it does not work so well, e.g.,
cloc --diff branch1 branch2
Which would produce an error like:
0 text files.
0 text files.
0 files ignored.
2 errors:
Unable to read: branch1
Unable to read: branch2
Nothing to count.
Life may be a lot easier if you go from inference to declaration and actually add a --force-git
flag that allows me to say, indisputably, that I expect these arguments to be interpreted as git hashes.
I've submitted a pull request (#215) illustrating how I envision the flag. Without looking at the big picture, I know that:
cloc --diff --git branchA branchB
now does what I want.
Thanks for the PR; makes sense. The modified subroutine happens early on so --git
is acted upon before cloc decides to do a straight count or a diff, which is good.
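The move from inference to declaration can be illustrated with a short argparse sketch in Python. It is a stand-in for the idea only: cloc parses its options in Perl, and the program name `cloc-sketch` and the function `interpret` are invented here.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cloc-sketch")
    parser.add_argument("--diff", action="store_true",
                        help="compare the two inputs")
    parser.add_argument("--git", action="store_true",
                        help="treat every input as a git commit, branch, or tag")
    parser.add_argument("inputs", nargs="+")
    return parser


def interpret(argv: List[str]) -> List[Tuple[str, str]]:
    """Tag each input as declared-git or left to inference."""
    args = build_parser().parse_args(argv)
    kind = "git" if args.git else "infer"
    return [(kind, name) for name in args.inputs]


print(interpret(["--diff", "--git", "branchA", "branchB"]))
# [('git', 'branchA'), ('git', 'branchB')]
```

Because the flag is read before anything else is decided, the inputs never go through the file-or-hash guessing step.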
I'm ready to close this issue if you think it is fully baked.
My only hesitation in the PR was whether the documentation was sufficient. I hadn't tested it w.r.t. other contexts in which the input could/should be interpreted as a git hash. At the very least, the documentation could be tweaked to indicate that it should only be used in conjunction with --diff
.
I plan to release 1.74 this Friday, Sept 8. If you have any more tweaks or improvements for the new git capabilities let me know.
If you're content, I'm content. I certainly have it working for me. :) Shall we close the issue?
Version 1.74 was released with these new git features.
I'm liking this tool, and the --diff feature is nice as well. However, the diff I'd really like to operate on is the diff between two git commits. Is there some secret way to pipe a diff directly into cloc or to provide two git commit SHAs as the basis for the computation?