ejwa / gitinspector

:bar_chart: The statistical analysis tool for git repositories
GNU General Public License v3.0
2.36k stars 327 forks source link

Add MATLAB/Octave extension and comment recognition #126

Open bonanza123 opened 7 years ago

bonanza123 commented 7 years ago

Adds MATLAB/Octave extension and comment recognition

adam-waldenberg commented 7 years ago

Hi. Thank you. Would you also be willing to add support for metrics part (metrics.py) ? We have support for a lot of languages, but most of them lack metrics support. I once was pretty proficient in matlab (we had it in a few math courses), but haven't really used it since.

To explain how it works;

metric_eloc = Specifies the recommended maximum number of lines for a certain file type. metric_cc_tokens = For each language, the first list is a list of regexp entry tokens (something that increases the depth of logic), the second list is a regexp list of exit tokens (something that decreases the depth).

bonanza123 commented 7 years ago

@adam-waldenberg I will have a look at it, but I guess this will take some time as I'm currently a bit busy.

One more question regarding the metrics: what does the r e.g. in

["else", r"for\s+\(.*\)", r"foreach\s+\(.*\)", r"goto\s+\w+:", r"if\s+\(.*\)", r"case\s+\w+:",
"default:", r"while\s+\(.*\)"],

mean? And how to handle cases like:

if ( 5 > 1 ) // comment

? Or are the comments stripped before the metrics are calculated?

adam-waldenberg commented 7 years ago

@bonanza123 That is the notation for "raw strings" in Python. The string becomes a raw string instead of a Python string. So you don't have to do for example "\" to do a single backslash.

r"if\s+(.*) should pick up you example (it ignores the rest).

You can use something like https://regex101.com/ to test.

Have fun ! :)

bonanza123 commented 7 years ago

@adam-waldenberg One more question: Does r"for\s+\(.*\)" also match r"parfor\s+\(.*\)" ?

It would also be nice if you could check my modifications, in particular the if and elseif part, as I'm not sure how the matching is done in the scripts.

adam-waldenberg commented 7 years ago

@bonanza123 Hi. Yes, it should see a parfor with that as well. They only immediate problem I see is that "elseif\s+(.*)" is defined as both an entry and exit token.

Otherwise it looks good.

bonanza123 commented 7 years ago

@adam-waldenberg thanks for your feedback. What do you recommend regarding the elseif ? In MATLAB it could look like:

if (x == 5 )
y=5;
elseif (x == 4)
y=34;
elseif ( x==3)
y=5;
else
y=0;
end

I assumed that if we just count from if to end, then we wouldn't consider how many ugly elseifs he/she added.

Alternatively, he/she could also write a lot of if, else, end which would maybe even worse.

adam-waldenberg commented 7 years ago

@bonanza123 The elseif's are still counted if they are included in the token list. You should probably also add "else". So in the above case gitinspector would/should count it the following way;

if (x == 5 ) % +2
    y=5;              
elseif (x == 4) % +2
    y=34;
elseif ( x==3) % +2
    y=5;
else % +2
    y=0;
end % +1

Strictly speaking, each point in the if/elseif/else list is of course an exit point, but we don't really care about that. The gitinspector way of counting complexity isn't "exact". However, it doesn't have to be, as it's mainly a way to just help everybody find badly designed files, not as an exact measurement of a metric.

adam-waldenberg commented 7 years ago

After some thought, I actually think you can remove "end" from the token list.

adam-waldenberg commented 7 years ago

@bonanza123

Looked a bit more at this... It doesn't seem complete. Does matlab require paranthesis around for loops and statements? I Seem to remember you can write similar to;

for a=1:1:10

Quite a few years since I did anything in Matlab.