adamtornhill / code-maat

A command line tool to mine and analyze data from version-control systems
http://www.adamtornhill.com/code/codemaat.htm

Unable to justify results #65

Closed varunachar closed 4 years ago

varunachar commented 4 years ago

Hi,

First of all, thanks for creating this tool and putting in the effort and research to help bring some insight into analysing technical debt.

I'm trying to use code-maat to understand our codebases, and I'm seeing very weird results, which is making me question whether I should use the results of this tool at all.

I ran this tool on a large codebase within my company and the results seem conflicting:

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames --after=2019-03-01 -- . ":(exclude).staging/" ":(exclude).tmp_staging/" ":(exclude)tests/" ":(exclude)k8s/" ":(exclude)node_modules/" > logfile.log

Main Dev

Command:

docker run -v ~/Work/code-analysis/:/data -it code-maat-app -l /data/logfile.log -c git2 -a main-dev > main-dev.csv

Tagged over 2000 files to a single dev, with added=0, total-added=0 and ownership=0. Does this mean that no changes were made to these files? This dev has made no changes to these files at all; I verified this by looking at the GitHub history.

Main Dev By Rev

Command:

docker run -v ~/Work/code-analysis/:/data -it code-maat-app -l /data/logfile.log -c git2 -a main-dev-by-revs > main-dev-by-revs.csv

Tagged over 2000 files to a single dev. This dev has made no changes to these files at all; I verified this by looking at the GitHub history.

Age

Command:

docker run -v ~/Work/code-analysis/:/data -it code-maat-app -l /data/logfile.log -c git2 -a age > age.csv

Result: Of the 5k files analysed, 4.7k have been modified in the last 2 months. This conflicts with the results under Main Dev. Also, I checked the file history: some files tagged with an age of 2 months have not been modified since 2018.

Is this because it looks at the entire git history of all branches instead of only master? How do I make it run against only master?

adamtornhill commented 4 years ago

@varunachar Thanks for your kind words!

code-maat operates on whatever Git logs you provide. The git log command I have in the README indeed mines commits from all branches. To run against the current branch you just remove the --all option. That's one likely explanation for the deviation.
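As a rough sketch of what that could look like for the log command quoted above: the function below drops --all and names the branch explicitly, so the log holds only commits reachable from that branch. The function name mine_branch_log and its three arguments are made up for illustration; the git options are the ones already used in this thread, and the ":(exclude)..." pathspecs from the original command can be appended after the "." if needed.

```shell
# Hypothetical helper (not part of code-maat): write a git2-format log
# for a single branch instead of all branches.
#   $1 = branch to mine (e.g. master)
#   $2 = only include commits after this date (e.g. 2019-03-01)
#   $3 = output file for code-maat's -l option
mine_branch_log() {
  branch="$1"
  since="$2"
  out="$3"
  # No --all here: only commits reachable from "$branch" are listed.
  git log --numstat --date=short \
    --pretty=format:'--%h--%ad--%aN' --no-renames \
    --after="$since" "$branch" -- . > "$out"
}

# Example call, run from inside the repository:
#   mine_branch_log master 2019-03-01 logfile.log
```

The resulting logfile.log can then be passed to the same docker run commands as before.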

Another possible explanation is that the files themselves haven't been changed, but their containing folders have been renamed. This is recorded as a modification in Git, and code-maat doesn't do any rename tracking (the evolution of code-maat, CodeScene, performs full rename tracking and automates the analysis to resolve such biases).
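One way to look for such bulk changes is to count how many files each commit in the log touches and inspect the largest ones by hand. The sketch below assumes the git2 log format produced by the command in this thread (a header line starting with "--", followed by one numstat line per file); the function name large_commits and the threshold argument are illustrative only and not part of code-maat.

```shell
# Hypothetical helper: list commits in a git2-format log that touch at
# least a given number of files, as "<file-count> <commit header>".
#   $1 = log file (e.g. logfile.log)
#   $2 = minimum number of files per commit
large_commits() {
  awk -v min="$2" '
    # A header line starts a new commit: report the previous one first.
    /^--/ {
      if (header != "" && n >= min) print n, header
      header = $0
      n = 0
      next
    }
    # A numstat line has at least three fields: added, deleted, path.
    NF >= 3 { n++ }
    END { if (header != "" && n >= min) print n, header }
  ' "$1"
}

# Example call:
#   large_commits logfile.log 500
```

A commit that touches thousands of files at once is a candidate for a folder move or similar mass change, and its author is worth comparing against the developer named in the main-dev output.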

adamtornhill commented 4 years ago

@varunachar I just wanted to check if I managed to clarify how it works, and that you managed to track down the potential bias?

varunachar commented 4 years ago

I tried it out without the --all flag and that gave better results.

