inishchith closed this issue 5 years ago
@valeriocos Sorry for the delayed response.
Instead of passing a flag to trigger the analysis at repo level, wouldn't it be better to introduce a new category that performs this kind of analysis? What do you think?
The variable `self.history` collects the CoCom analysis for each file. The solution seems to work well when performing the initial fetch, but I'm not sure it would work for incremental fetches. Could you explain the logic, in case I'm missing something?
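The trouble with incremental fetches can be made concrete with a small sketch. The names below are illustrative, not Graal's actual API: the idea is that the file-level approach keeps a running map of per-file metrics, updated commit by commit, and repository-level totals are a sum over that map. An incremental fetch that starts mid-history has no map to resume from unless the whole history is persisted somewhere.

```python
def update_history(history, commit_files):
    """Merge the metrics of the files touched by one commit into history."""
    for path, metrics in commit_files.items():
        if metrics is None:          # file deleted in this commit
            history.pop(path, None)
        else:
            history[path] = metrics  # keep only the latest state per file


def repo_totals(history):
    """Repository-level figures derived from the per-file history."""
    return {
        "loc": sum(m["loc"] for m in history.values()),
        "ccn": sum(m["ccn"] for m in history.values()),
    }


history = {}
update_history(history, {"a.py": {"loc": 10, "ccn": 2}})
update_history(history, {"a.py": {"loc": 12, "ccn": 3},
                         "b.py": {"loc": 5, "ccn": 1}})
update_history(history, {"b.py": None})  # b.py removed in the third commit
print(repo_totals(history))  # {'loc': 12, 'ccn': 3}
```

Note that the totals after the third commit are only correct because `history` survived across all three updates; that is exactly the state an incremental fetch would be missing.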
A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained. Maybe the param `-t` may speed up the analysis. What do you think?
There's a trade-off between time and redundancy in the calculation (it's like cutting down the `history` part and then thinking of a workable solution). But yes, agreed here: we face a trade-off between time (full repo on every commit) and memory (maintaining `self.history`). And, as pointed out by you, if we can get incremental fetch working on the current implementation then it'd be great; else we have to evaluate lizard's full-repo analysis with the `WORKING_THREADS` option. I'll update you once I've worked on the evaluation. Let me know what you think!
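The full-repo alternative can be sketched as follows. This is a hedged illustration, not lizard's or Graal's real code: `analyze_file` below is a stand-in metric (non-blank line count), and the thread pool plays the role of lizard's `-t` / working-threads option, parallelising the per-file work within one commit's checkout.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_file(path_and_source):
    """Stand-in for a real analyzer call: count non-blank lines."""
    path, source = path_and_source
    loc = sum(1 for line in source.splitlines() if line.strip())
    return {"file": path, "loc": loc}


def analyze_snapshot(files, threads=4):
    """Analyze every file of one commit's checkout and sum the results."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(analyze_file, files.items()))
    return {"num_files": len(results),
            "loc": sum(r["loc"] for r in results)}


# One commit's worth of files; a real run would do this per checkout.
snapshot = {"a.py": "x = 1\n\ny = 2\n", "b.py": "print('hi')\n"}
print(analyze_snapshot(snapshot))  # {'num_files': 2, 'loc': 3}
```

Because every commit is analyzed from scratch, no state needs to survive between runs, which is why incremental fetches are unaffected; the price is re-analyzing unchanged files at every commit.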
No worries @inishchith , thank you for answering.
We have used category as a convention to perform analysis using different analyzers as of now, and repository-level analysis does make sense for other backends as well, so I thought it would be better to provide an option (flag) instead of a category. Let me know what you think.
If the data has a different shape it's probably better to use a different category. However, we can proceed without adding a new category, and change the code afterwards if needed :)
I'm not sure about the definition of the variable `self.repository_level` in `__init__`. That information seems to be more related to the way the fetch is performed than to how the class is initialized. Could you explain why `repository_level` has been defined as an instance attribute?
Thanks
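The design question can be sketched with two minimal variants (class names and signatures here are illustrative, not Graal's real backend API): holding the flag as constructor state versus passing it per `fetch()` call, which keeps fetch-time behaviour out of the object's initialization.

```python
class CoComAttr:
    """Variant A: the flag is instance state, fixed at construction."""

    def __init__(self, repository_level=False):
        self.repository_level = repository_level

    def fetch(self):
        return "repo" if self.repository_level else "file"


class CoComParam:
    """Variant B: the flag is a parameter of each fetch call."""

    def fetch(self, repository_level=False):
        return "repo" if repository_level else "file"


backend = CoComParam()
print(backend.fetch())                       # file
print(backend.fetch(repository_level=True))  # repo
```

With variant B the same backend instance can serve both kinds of fetch, which matches the observation that the flag describes how a fetch is performed rather than what the object is.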
> If the data has a different shape it's probably better to use a different category. However, we can proceed without adding a new category, and change the code afterwards if needed :)
>
> I'm not sure about the definition of the variable `self.repository_level` in `__init__`. That information seems to be more related to the way the fetch is performed than to how the class is initialized. Could you explain why `repository_level` has been defined as an instance attribute?
Edit:
@valeriocos I had a thought about the incremental fetches issue regarding the current implementation. I figured out that we would have to execute an initial run over the entire repository every time, which would again increase the execution time for large repositories, as addressed in #36.
As pointed out by you above (about lizard's worker threads for repository-level analysis):

> A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained? Maybe the param `-t` may speed up the analysis ...
Here the incremental fetches wouldn't be affected and would work just as before. I have implemented a version locally for evaluation purposes; below are the results.
| Repository | Number of Commits | File Level | Repository Level |
|---|---|---|---|
| [chaoss/grimoirelab-perceval]() | 1387 | 23.65 min | 27.97 min |
| [chaoss/grimoirelab-sirmordred]() | 869 | 9.69 min | 4.27 min |
| [chaoss/grimoirelab-graal]() | 169 | 1.73 min | 0.90 min |
(There's a divergence because Perceval has a lot more files than the other repositories under consideration.) With the help of lizard's repository-level analysis, I was able to create two of the metric visualizations (overall LOC and CCN, among other attributes). Ref: https://github.com/chaoss/metrics/issues/139. And I'm now on to visualizing the most complex files in a repository.
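Ranking the most complex files from repository-level results reduces to a sort by cyclomatic complexity (CCN). The sketch below uses made-up data and field names, not actual CoCom output:

```python
def most_complex(files, top=3):
    """Return the top-N entries with the highest cyclomatic complexity."""
    return sorted(files, key=lambda f: f["ccn"], reverse=True)[:top]


# Illustrative per-file results; a real run would come from the analyzer.
files = [
    {"file": "git.py", "ccn": 94},
    {"file": "backend.py", "ccn": 61},
    {"file": "utils.py", "ccn": 17},
    {"file": "errors.py", "ccn": 3},
]
for entry in most_complex(files, top=2):
    print(entry["file"], entry["ccn"])
# git.py 94
# backend.py 61
```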
Let me know what you think. Thanks :)
Closing in reference to #39
@valeriocos As discussed, this adds repository-level analysis as an option to the CoCom backend.
As of now, this is being added to make repository-level analysis and visualization easier to carry out in Kibana. We can discuss the limitations that might arise in the future and some edge cases that I might have missed in the implementation.
Results of comparison can be found in #36
Edit: This is just a rough idea (implementation).
Some things that still need to be worked on: