Hi Julien.
That would be interesting; yes (and quite useful). However, I'm not quite sure
how to implement it in a general (and working) way.
Maybe the easiest way would be to simply try to track how many lines a
developer has attributed to him, during different times, in the commit history
by checking a file multiple times with git blame or perhaps by using git blame
--reverse.
I have a feeling that it will take quite a long time to analyze.
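To give a rough idea of what "checking a file multiple times with git blame" could look like, here is a minimal sketch. It is not gitinspector code; the function names `count_blamed_lines` and `blame_at` are made up for illustration. It counts attributed lines per author from `git blame --line-porcelain` output, and calling `blame_at` for a series of revisions would give the per-author counts over time.

```python
import subprocess
from collections import Counter


def count_blamed_lines(porcelain_text):
    """Count attributed lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain_text.split("\n"):
        # Every blamed line carries one "author <name>" header. File content
        # lines are prefixed with a tab, so they can never match here.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_at(revision, path):
    """Hypothetical helper: blame `path` as of `revision` (needs a git checkout)."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", revision, "--", path],
        capture_output=True, text=True, check=True,
    )
    return count_blamed_lines(result.stdout)
```

Running one blame per file per sampled revision is also why this approach is likely to be slow on large histories.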
Using commit messages to track if a commit is a fix is not really an option
considering all the different languages people use.
The most general solution would probably be to do a "Code stability" value for
each developer that simply tells us how stable their code is (how much of it
has survived). The average code stability of all developers would also be the
general code stability of the project. In any case; a statistic such as this
one also fits nicely into the original use case of gitinspector; the grading of
student projects.
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 20 Jul 2013 at 8:31
Considering this is a good idea; something like this will be implemented
eventually (as discussed in my previous post). However, there are other things
that are of higher priority right now.
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 24 Jul 2013 at 7:47
I think a first implementation could look something like this:
for each commit in history
    for each added line
        push the added line into an <author, file line, count> map with count = 1
    for each modified or removed line
        search all authors for the modified line
        increment that line's count by 1
        if the modified line was not associated with the current commit author, then add an entry to the map
Of course there are some subtleties to take into account (such as adding a new
line should bump the line index of all the following lines in the file)...
But I would be curious to know if something as simple as that brings some
useful stats.
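For illustration, the loop above could be sketched in Python roughly as follows. The input shape (a list of `(author, changes)` pairs, where each change is a `(kind, line_no)` tuple) and the name `track_changes` are assumptions made for this sketch, and it deliberately ignores the line-index shifting mentioned above.

```python
def track_changes(commits):
    """Build the (author, line number) -> count map described above.

    `commits` is an assumed input shape: a list of (author, changes) pairs,
    where each change is (kind, line_no) and kind is "add", "modify" or
    "remove". Line numbers are NOT re-indexed when lines are added.
    """
    counts = {}
    for author, changes in commits:
        for kind, line_no in changes:
            if kind == "add":
                counts[(author, line_no)] = 1
                continue
            # Modified or removed: bump the count for every author that
            # already has an entry for this line.
            for key in [k for k in counts if k[1] == line_no]:
                counts[key] += 1
            # If the current author had no entry for the line, add one.
            if (author, line_no) not in counts:
                counts[(author, line_no)] = 1
    return counts
```

A line that keeps a count of 1 was never touched again, so high counts would point at code that gets reworked often.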
Original comment by julien.f...@kitware.com
on 24 Jul 2013 at 12:30
That's probably not quite how I would do it. You don't really need to track the
line-count either.
In any case, your implementation would probably push out something, but it
wouldn't be correct, because we would be doing it manually without the
algorithm used in git blame. That means the number of attributed lines would
be out of sync with what git blame reports. An attributed line is not
(necessarily) the same thing as an inserted/added line.
I think a similar method but by using git blame on the changed lines (to check
who it really belongs to) would give us something more meaningful.
It's doable; but will be a little hairy to implement. But something like this
would probably work better:
for each file that ever existed in repo:
    get blame history for every commit of file
    for each commit of file
        for each added line
            check in blame history to which author the line should be added and do:
            attributed_lines += 1
        for each removed line
            check in blame history to which author the removed line was attributed and do:
            removed_lines += 1
In the end you get a stability value on each author that tells us how stable
their code is throughout the git history.
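As a sketch of the bookkeeping only: assume the blame lookups have already been done and reduced to a list of `("add", author)` and `("remove", author)` events, where the author is whoever blame attributes the line to. The formula used here, surviving lines over attributed lines, is an assumption; the pseudocode above does not spell out how the final value is derived.

```python
from collections import Counter


def stability(events):
    """Per-author stability in percent from blame-resolved line events.

    `events` is an assumed input shape: (kind, author) tuples, with kind
    being "add" or "remove" and author taken from the blame history.
    """
    attributed = Counter()
    removed = Counter()
    for kind, author in events:
        if kind == "add":
            attributed[author] += 1
        elif kind == "remove":
            removed[author] += 1
    # Assumed formula: share of an author's attributed lines that survived.
    return {
        author: 100.0 * (attributed[author] - removed[author]) / attributed[author]
        for author in attributed
    }
```

The expensive part, building the blame history for every commit of every file, is left out here on purpose.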
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 24 Jul 2013 at 2:11
I'm targeting this for the 0.4.0 release. It will get implemented then or in
some 0.3.x release up to 0.4.0.
Original comment by gitinspe...@ejwa.se
on 25 Jul 2013 at 11:07
This turned out to be a little more tricky than expected. Work on this issue is
slowly progressing.
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 7 Aug 2013 at 12:38
I am also considering if the code stability value should take into account not
only the rows that have survived but also *how long* they have survived before
they are removed... How this should be calculated (and how to factor this in)
is a very interesting problem.
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 7 Aug 2013 at 12:43
Great idea !
Maybe that info can't be factored out with the "how many" (number of rows
modified) stats. It might be better to keep those stats separated, and produce
different graphs to display them instead. I would keep the *how long* in
number of days.
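A possible shape for the *how long* stat, kept separate from the row counts: the sketch below assumes each removed line is available as an `(author, added_on, removed_on)` tuple (a made-up input format) and reports the average lifetime in days per author.

```python
from datetime import date
from statistics import mean


def survival_days(removed_lines):
    """Average lifetime in days of removed lines, grouped by author.

    `removed_lines` is an assumed input shape: a list of
    (author, added_on, removed_on) tuples using datetime.date values.
    """
    lifetimes = {}
    for author, added_on, removed_on in removed_lines:
        lifetimes.setdefault(author, []).append((removed_on - added_on).days)
    return {author: mean(days) for author, days in lifetimes.items()}


example = survival_days([
    ("alice", date(2013, 7, 1), date(2013, 7, 31)),
    ("alice", date(2013, 8, 1), date(2013, 8, 11)),
])
```

Lines that are still present have no removal date, so they would need separate handling (for example, measuring up to the date of the analysis).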
Original comment by julien.f...@kitware.com
on 7 Aug 2013 at 12:28
This issue was closed by revision 16154cd0ba94.
Original comment by gitinspe...@ejwa.se
on 27 Jan 2014 at 2:12
After playing around with this for a while (and trying different solutions) I
managed to implement something that does not slow down analysis in any
detectable way.
There are now two additional values in the blame output; stability and age. See
the commit referenced above for more information. While the solution is a lot
more naive than the ones previously discussed; it still gives good information
on authors in relation to each other.
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 27 Jan 2014 at 2:19
That's a very nice feature. Nice job !
I've tried it on CTK (https://github.com/Commontk/CTK) and I seem to have some
odd results (most of the results look fine otherwise): Stability is >100 and
Age (in months?) > lifetime of the project.
Author          Rows   Stability      Age   % in comments
Stability >100:
Luis Ibanez       83      8300.0    40.25            9.64
ivmartel         811       159.0     6.17           41.06
ivowolf         3344       314.9     1.83           19.47
Age > lifetime:
Marco Nolden    9107        56.5   357.58           19.49
Original comment by julien.f...@kitware.com
on 27 Jan 2014 at 1:27
Yep. I noticed this when trying it on some other repos as well.
The age value is a pseudo value (for now) and only makes any sense when
compared to other authors. It could be a good idea to redo it so that it shows
months or weeks (when the -w flag is given).
When it comes to the stability value, 8300% on 83 rows (Luis) means that Luis
only has one inserted row but has 83 rows blamed to him (probably duplicates of
some kind). I guess git is getting a little confused here. I will investigate
and update this issue when I find a good solution.
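The arithmetic behind that number, as a small sketch. It assumes stability is simply blamed rows divided by insertions, which is what the explanation above implies; the function name is made up.

```python
def stability_percent(blamed_rows, insertions):
    """Assumed formula: rows blamed to an author per row they inserted."""
    return 100.0 * blamed_rows / insertions


# Luis: 83 rows blamed to him, but only 1 counted insertion.
print(stability_percent(83, 1))  # 8300.0
```

Any value above 100 therefore means that more rows are blamed to the author than were counted as their insertions.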
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 27 Jan 2014 at 10:52
The age value has been improved with revision a1e90d0a9d46 and now shows the
age of the author's rows in months (or weeks).
I think that the strange stability value reported is due to git sometimes
losing some information (and counting too few insertions) whenever new files
are added (or old files are moved). At least that is what I suspect. I have
some ideas on how to get around it and will work on it eventually.
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 14 Feb 2014 at 4:34
Only the age value will be included in the 0.4.0 release. I'm bumping proper
support for the stability value to 0.4.1. There are a few things that need to
be changed/reworked before it can be included in a way that makes me completely
satisfied with it.
/Adam Waldenberg
Original comment by gitinspe...@ejwa.se
on 17 Mar 2014 at 7:20
I'm considering this "completed".
After some investigation it is evident that a stability value over 100% means
that someone has added code that has been attributed to somebody else.
Consequently, the author making the change gets the insertion but not the
blame, resulting in a raised stability for the original author.
Seems correct to me.
-w / --grading will result in age values being displayed in weeks, instead of
months.
Original comment by gitinspe...@ejwa.se
on 3 Nov 2014 at 10:15
Original issue reported on code.google.com by
julien.f...@kitware.com
on 19 Jul 2013 at 1:19