chaoss / grimoirelab-graal

A Generic Repository AnALyzer
GNU General Public License v3.0

[cocom] Add repository level analysis via lizard #39

Closed inishchith closed 5 years ago

inishchith commented 5 years ago

A repository-level analysis implementation for the CoCom backend via Lizard. This implementation doesn't break incremental fetches (which was one of the issues with the implementation proposed in #38).

Evaluation:

| Repository | Number of Commits | *File Level | Repository Level |
|---|---|---|---|
| [chaoss/grimoirelab-perceval]() | 1387 | 23.65 min | 27.97 min |
| [chaoss/grimoirelab-sirmordred]() | 869 | 9.69 min | 4.27 min |
| [chaoss/grimoirelab-graal]() | 169 | 1.73 min | 0.90 min |

@valeriocos Please do have a look when you get time. Thanks :)

WIP

Closes #36

Signed-off-by: inishchith <inishchith@gmail.com>

inishchith commented 5 years ago

@valeriocos After considering only the master branch, the results are as follows.

| Repository | Number of Commits | *File Level | Repository Level |
|---|---|---|---|
| [chaoss/grimoirelab-perceval]() | 1394 | 26:22 min | 26:56 min |
| [chaoss/grimoirelab-sirmordred]() | 869 | 08:51 min | 3:51 min |
| [chaoss/grimoirelab-graal]() | 171 | 2:24 min | 1:04 min |
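
To make the comparison easier to read, the timings can be turned into speedup ratios. The sketch below assumes that values such as `26:22 min` mean minutes and seconds (the thread does not state the format), and the helper names `to_seconds` and `speedup` are made up for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string such as '08:51' to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(file_level: str, repo_level: str) -> float:
    """Ratio of file-level time to repository-level time (>1 means repo level is faster)."""
    return to_seconds(file_level) / to_seconds(repo_level)


# Timings copied from the master-branch results: (file level, repository level).
# Assumption: the values are in mm:ss.
timings = {
    "chaoss/grimoirelab-perceval": ("26:22", "26:56"),
    "chaoss/grimoirelab-sirmordred": ("08:51", "3:51"),
    "chaoss/grimoirelab-graal": ("2:24", "1:04"),
}

ratios = {repo: round(speedup(*pair), 2) for repo, pair in timings.items()}

for repo, ratio in ratios.items():
    print(f"{repo}: {ratio:.2f}x")
```

Under that reading, repository-level analysis is a little over twice as fast for sirmordred and graal, and roughly on par (slightly slower) for perceval.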

Also, please let me know what you think of the changes. Thanks :)

inishchith commented 5 years ago

@valeriocos I have a question. Currently, I haven't incorporated the cloc-related analysis data (blanks, comments). I was thinking of running cloc at file level and adding the information here. If you agree, I can measure the time after making the changes and report it here.

Let me know what you think :)

valeriocos commented 5 years ago

Good point @inishchith. Yes, please try to integrate the cloc info, thanks!

inishchith commented 5 years ago

@valeriocos Thanks. Done! Can you have a look again?

inishchith commented 5 years ago

@valeriocos Adding cloc at file level has slowed down the analysis.

Evaluation:

| Repository | Number of Commits | *File Level | Repository Level |
|---|---|---|---|
| [chaoss/grimoirelab-perceval]() | 1394 | --- min | --- min |
| [chaoss/grimoirelab-graal]() | 171 | 2:22 min | 28:35 min |

I can think of two options now:

  1. Incorporate the current implementation, then evaluate and optimize it during the later phases (3rd Coding Phase) as discussed. This gives us the entire data for metrics visualization. (preferred)
  2. (Or) Leave out comments and blanks for now (because of the execution-time issues) and incorporate them in a later phase (3rd Coding Phase).

Let me know what you think :)

valeriocos commented 5 years ago

> Incorporate the current implementation, then evaluate and optimize it during the later phases (3rd Coding Phase) as discussed. This gives us the entire data for metrics visualization. (preferred)

+1! thanks!

inishchith commented 5 years ago

@valeriocos Thanks for the review. I've worked on the changes requested above. Please have a look and let me know if any more changes are required.

Thanks!

valeriocos commented 5 years ago

We could add to this PR the code to identify the files modified in a commit when executing the analysis at repo level. What do you think?

The change shouldn't be so big: it would just be a matter of passing the commit (or the list of files in the commit) and then marking those files in the result of the analysis. See an attempt below:

slimbook@slimbook-KATANA:~/Escritorio/sources/graal$ git diff
diff --git a/graal/backends/core/analyzers/lizard.py b/graal/backends/core/analyzers/lizard.py
index d7d210e..0ee828e 100644
--- a/graal/backends/core/analyzers/lizard.py
+++ b/graal/backends/core/analyzers/lizard.py
@@ -89,7 +89,7 @@ class Lizard(Analyzer):
         result['funs'] = funs_data
         return result

-    def __analyze_repository(self, repository_path, details):
+    def __analyze_repository(self, repository_path, commit, details):
         """Add code complexity information for a given repository
         using Lizard and CLOC.

@@ -112,12 +112,14 @@ class Lizard(Analyzer):

         for analysis in repository_analysis:
             cloc_analysis = cloc.analyze(file_path=analysis.filename)
+            file_path = analysis.filename.replace(repository_path, '')
             result = {
                 'loc': analysis.nloc,
                 'ccn': analysis.CCN,
                 'tokens': analysis.token_count,
                 'num_funs': len(analysis.function_list),
-                'file_path': analysis.filename,
+                'file_path': file_path,
+                'in_commit': True, # check file in commit.files.file
                 'blanks': cloc_analysis['blanks'],
                 'comments': cloc_analysis['comments']
             }
@@ -140,7 +142,8 @@ class Lizard(Analyzer):
         details = kwargs['details']

         if kwargs.get('repository_level', False):
-            result = self.__analyze_repository(kwargs["repository_path"], details)
+            commit = kwargs['commit']
+            result = self.__analyze_repository(kwargs["repository_path"], commit, details)
         else:
             result = self.__analyze_file(kwargs['file_path'], details)

diff --git a/graal/backends/core/cocom.py b/graal/backends/core/cocom.py
index 39e1681..8a1aa68 100644
--- a/graal/backends/core/cocom.py
+++ b/graal/backends/core/cocom.py
@@ -190,7 +190,7 @@ class CoCom(Graal):
                 file_info.update({'file_path': file_path})
                 analysis.append(file_info)
         else:
-            analysis = self.analyzer.analyze(self.worktreepath)
+            analysis = self.analyzer.analyze(self.worktreepath, commit)
         return analysis

     def _post(self, commit):
@@ -261,7 +261,7 @@ class RepositoryAnalyzer:
         self.details = details
         self.lizard = Lizard()

-    def analyze(self, repository_path):
+    def analyze(self, repository_path, commit):
         """Analyze the content of a repository using CLOC and Lizard.

         :param repository_path: repository path
@@ -282,6 +282,7 @@ class RepositoryAnalyzer:
         kwargs = {
             'repository_path': repository_path,
             'repository_level': True,
+            'commit': commit,
             'details': self.details
         }
         lizard_analysis = self.lizard.analyze(**kwargs)
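
The `in_commit` placeholder in the diff above could be filled in along the following lines. This is only a sketch, not the code that was merged: the helper name `mark_files_in_commit` is hypothetical, and it assumes that `commit['files']` is a list of dicts whose `file` key holds a path relative to the repository root, as the `commit.files.file` comment in the diff suggests.

```python
import os


def mark_files_in_commit(results, commit, repository_path):
    """Return copies of `results` with repo-relative paths and an `in_commit` flag.

    Hypothetical helper: `results` is a list of dicts with a 'file_path' key,
    as built in the repository-level loop of the diff above.
    """
    # Assumption: each entry of commit['files'] has a 'file' key holding a
    # path relative to the repository root, using forward slashes.
    touched = {entry["file"] for entry in commit.get("files", []) if "file" in entry}

    marked = []
    for result in results:
        # Make the analyzed path relative to the repository root so that it
        # can be compared with the paths listed in the commit.
        rel_path = os.path.relpath(result["file_path"], repository_path)
        rel_path = rel_path.replace(os.sep, "/")
        marked.append({**result, "file_path": rel_path, "in_commit": rel_path in touched})
    return marked


# Example with made-up paths and a made-up commit.
commit = {"files": [{"file": "graal/graal.py"}, {"file": "README.md"}]}
results = [
    {"file_path": os.path.join("worktrees", "graal", "graal", "graal.py"), "loc": 10},
    {"file_path": os.path.join("worktrees", "graal", "setup.py"), "loc": 5},
]
marked = mark_files_in_commit(results, commit, os.path.join("worktrees", "graal"))
```

The sketch uses `os.path.relpath` rather than `str.replace(repository_path, '')` because the latter leaves a leading separator when `repository_path` has no trailing slash, which would make the comparison against the commit's file list fail.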

inishchith commented 5 years ago

@valeriocos The changes look good to me, and hopefully we can use this information to provide some insights at file level, as mentioned at https://github.com/inishchith/gsoc-graal/issues/11#issuecomment-505517978

I shall update the PR now. Thanks!