chaoss / grimoirelab-graal

A Generic Repository AnALyzer
GNU General Public License v3.0
21 stars 62 forks source link

[backend] Add CoLang Backend and Linguist Analyzer #19

Closed inishchith closed 5 years ago

inishchith commented 5 years ago

As per the discussion in this thread, I've added CoLang Backend along with the Linguist Analyzer.

There's still some work to be done ( tests, cloc analyzer, docs.. ), will be addressed in separate issue tickets.

@valeriocos Please let me know if any changes required :)

Thanks.

valeriocos commented 5 years ago

Thank you @inishchith for working on this. Overall the code looks ok, I would suggest to keep working on this PR to add tests and update the .travis.yml and documentation. The part concerning cloc could be integrated in another PR. What do you think ?

inishchith commented 5 years ago

Sure @valeriocos, I'll ping once I've worked on the necessary updates. About integrating cloc, I'd love to work on it, but as discussed here we can keep an issue ticket open for work with description. Let me know what you think.

Thanks :)

valeriocos commented 5 years ago

Great @inishchith, it sounds perfect!

Once the colang backend is ready, cloc can be easily integrated as a new category (as similarly done for the colic backend). I would prefer to keep the initial implementation of colang separated from the addition of cloc, just to keep the development focused on a specific task (and avoid charging you with too much work at the same time :)).

inishchith commented 5 years ago

@valeriocos I've made the necessary changes.

The coverage has dropped, I feel it's due to the following lines were not covered.

Screenshot 2019-04-14 at 2 15 29 PM

Please review and let me know if any changes required :)

inishchith commented 5 years ago

@valeriocos Ping

valeriocos commented 5 years ago

Thank you @inishchith for the remainder. Overall the PR looks OK, but i cannot check it properly until next Saturday (I don't have access to a laptop).

If you want, in the meanwhile, you can add the support for the other category (cloc). WRT the decrease of the test coverage, have you run it locally and check if some new lines are not covered ?

Sorry for the inconvenience

inishchith commented 5 years ago

Thank you @inishchith for the remainder. Overall the PR looks OK, but i cannot check it properly until next Saturday (I don't have access to a laptop).

Sure, no worries

If you want, in the meanwhile, you can add the support for the other category (cloc).

Sure, I'll start working on it.

WRT the decrease of the test coverage, have you run it locally and check if some new lines are not covered?

I'd checked it, as I've mentioned in the earlier comment ( with the screenshot ). I'll try and fix the issue soon. Thanks

inishchith commented 5 years ago

As i was working on integrating cloc analyzer with the CoLang Backend, i encountered that cloc as of now is used to analyze files individually ( but the CoLang backend requires analysis over a snapshot of commit, which is possible to achieve via cloc but would require some alteration to the current implementation ). I was thinking to pass in an extra argument backend to the analyzer which would allow us to handle both the backends CoLang and CoCom and their corresponding requirements with no hindrance.

@valeriocos Let me know what you think :) Thanks.

valeriocos commented 5 years ago

Sorry for the late reply @inishchith

I would alter the current implementation to avoid coupling cloc to the colang and cocom backends. For instance, we could add a param to the analyze method or when initializing the class to process a single file or all the repo. what do you think @inishchith ?

Thanks :)

inishchith commented 5 years ago

@valeriocos Thanks for the response. Yes, that's what I was trying to suggest in the earlier comment. I'd pass in an argument backend to the analyze method in order to handle the backends separately.

Please let me know if I got something wrong.

valeriocos commented 5 years ago

Thank you for the quick reply. Instead of using the argument 'backend', i would use a boolean param (e.g., repo=False), which allows to run cloc on the full repo if True. The param 'backend' would probably couple cloc to the two backends and limit future extensions (e.g., another backend using cloc)

What do you think?

inishchith commented 5 years ago

Thank you for the quick reply. Instead of using the argument 'backend', i would use a boolean param (e.g., repo=False), which allows to run cloc on the full repo if True. The param 'backend' would probably couple cloc to the two backends and limit future extensions (e.g., another backend using cloc)

What do you think?

Yes. This would be much better. I'll work on this and update the thread once done. Thanks for the quick response @valeriocos :)

valeriocos commented 5 years ago

@inishchith please let me know when I can review the PR, thanks :)

inishchith commented 5 years ago

@valeriocos Sorry for the delayed response. Yes, Please do review and let me know if any changes required :) I've updated the PR with:

Thanks

valeriocos commented 5 years ago

@inishchith I'll check the PR today, thanks!

inishchith commented 5 years ago

@inishchith for this PR. Overall it looks OK, I left some minor comments. The main effort is probably about reorganizing the commits, for instance:

  • code + tests for linguist
  • code + tests for cloc
  • code + tests + doc for the backend

Thank you!

@valeriocos Thanks for the review. I've reordered the commits and made the requested changes. Let me know what you think ;)

inishchith commented 5 years ago

Sorry for the delayed response. @valeriocos Thanks for the suggestions. I understand that maintaining good documentation is necessary and will make sure to improve on it from next time :) I've made the necessary changes, let me know what you think

Thanks

valeriocos commented 5 years ago

No worries! :)

I made a test to compare the results of cloc and linguist, and it seems that there is a tiny error when executing colang with cloc, ext should be present only if a file is analyzed.

{
    "backend_name": "CoLang",
    "backend_version": "0.1.0",
    "category": "code_language_cloc",
    "data": {
        "Author": "Santiago Due\u00f1as <sduenas@bitergia.com>",
        "AuthorDate": "Tue Aug 18 18:54:45 2015 +0200",
        "Commit": "Santiago Due\u00f1as <sduenas@bitergia.com>",
        "CommitDate": "Tue Aug 18 18:54:45 2015 +0200",
        "analysis": {
            "Python": {
                "blanks": 19,
                "comments": 46,
                "loc": 40,
                "total_files": 2
            },
            "ext": "/tmp/worktrees/test2" <---------
        },
        "analyzer": "cloc",
        "commit": "2355d18310d8e15c8e5d44f688d757df33b0e4be",
        "message": "[perceval] Add perceval script"
    },
    "graal_version": "0.1.0",
    "origin": "https://github.com/chaoss/grimoirelab-perceval",
    "tag": "https://github.com/chaoss/grimoirelab-perceval",
    "timestamp": 1556035300.284228,
    "updated_on": 1439916885.0,
    "uuid": "903be190a192081ad8105d8e7774684e076704e3"
}

Another minor thing (that we may leave for the future) is the result of Linguist, in particular I'm not sure whether it's better to move the content of language_distribution to analysis. What do you think ?

{
    "backend_name": "CoLang",
    "backend_version": "0.1.0",
    "category": "code_language_linguist",
    "data": {
        "Author": "Santiago Due\u00f1as <sduenas@bitergia.com>",
        "AuthorDate": "Tue Aug 18 18:11:40 2015 +0200",
        "Commit": "Santiago Due\u00f1as <sduenas@bitergia.com>",
        "CommitDate": "Tue Aug 18 18:11:40 2015 +0200",
        "analysis": {
            "language_distribution": { <--------
                "Python": 100.0
            }
        },
        "analyzer": "linguist",
        "commit": "57bc204822832a6c23ac7883e5392f4da6f4ca37",
        "message": "[tests] Set up unit tests framework"
    },
    "graal_version": "0.1.0",
    "origin": "https://github.com/chaoss/grimoirelab-perceval",
    "tag": "https://github.com/chaoss/grimoirelab-perceval",
    "timestamp": 1556035274.727083,
    "updated_on": 1439914300.0,
    "uuid": "8ef1cdec050d8cd54473e1046a33c862216127cf"
}
inishchith commented 5 years ago

@valeriocos

I made a test to compare the results of cloc and linguist, and it seems that there is a tiny error when executing colang with cloc, ext should be present only if a file is analyzed.

{
    "backend_name": "CoLang",
    "backend_version": "0.1.0",
    "category": "code_language_cloc",
    "data": {
        "Author": "Santiago Due\u00f1as <sduenas@bitergia.com>",
        "AuthorDate": "Tue Aug 18 18:54:45 2015 +0200",
        "Commit": "Santiago Due\u00f1as <sduenas@bitergia.com>",
        "CommitDate": "Tue Aug 18 18:54:45 2015 +0200",
        "analysis": {
            "Python": {
                "blanks": 19,
                "comments": 46,
                "loc": 40,
                "total_files": 2
            },
            "ext": "/tmp/worktrees/test2" <---------

Apologies, I failed to notice this. Will make sure to correct this in next push.


Another minor thing (that we may leave for the future) is the result of Linguist, in particular I'm not sure whether it's better to move the content of language_distribution to analysis. What do you think ?

{
"backend_name": "CoLang",
"backend_version": "0.1.0",
"category": "code_language_linguist",
"data": {
"Author": "Santiago Due\u00f1as <sduenas@bitergia.com>",
"AuthorDate": "Tue Aug 18 18:11:40 2015 +0200",
"Commit": "Santiago Due\u00f1as <sduenas@bitergia.com>",
"CommitDate": "Tue Aug 18 18:11:40 2015 +0200",
"analysis": {
"language_distribution": { <--------
"Python": 100.0
}
},
"analyzer": "linguist",
"commit": "57bc204822832a6c23ac7883e5392f4da6f4ca37",
"message": "[tests] Set up unit tests framework"
},
"graal_version": "0.1.0",
"origin": "https://github.com/chaoss/grimoirelab-perceval",
"tag": "https://github.com/chaoss/grimoirelab-perceval",
"timestamp": 1556035274.727083,
"updated_on": 1439914300.0,
"uuid": "8ef1cdec050d8cd54473e1046a33c862216127cf"
}

I kept in the sense of making the attribute a bit more understandable. If we prefer consistency, i would go ahead with your suggestion as it is much similar to the one produced by cloc analyzer. I'd prefer keeping the current structure as it's gives more sense to the results retrievedLet me know what you think :)

(example)

 "analysis": {
                "Python": 100.0,
                "JavaScript": 60.04
                 ....
        },
valeriocos commented 5 years ago

Thank you for the quick reply @inishchith

Apologies, I failed to notice this. Will make sure to correct this in next push. No worries :)

I kept in the sense of making the attribute a bit more understandable. If we prefer consistency, i ... Let's go for consistency

Thank you!

inishchith commented 5 years ago

@valeriocos Thanks for the response. I'll update the PR :)

inishchith commented 5 years ago

@valeriocos I had separated the changes in new commits so that in case we revert in future, we have a log. I've tested this locally. Please review when you get time and let me know if any changes required :)

Thanks

valeriocos commented 5 years ago

Sorry for the late reply. Since the changes are not that big, it's probably better to keep having a commit per file.

Thanks :)

inishchith commented 5 years ago

@valeriocos Made the required changes. Please do have a look.

Thanks :)

valeriocos commented 5 years ago

Great job @inishchith , just merged, thanks!

inishchith commented 5 years ago

@valeriocos Thanks for the merge. While working on this task, i've noticed a few more improvements which i'd be adding into another issue ticket. Any pointers from your side on moving ahead?

valeriocos commented 5 years ago

You're welcome.

Yes, there is a tiny task to align what has been recently added to Perceval: https://github.com/chaoss/grimoirelab-perceval/pull/513

Feel free to open issues for the improvements you see, thanks!

inishchith commented 5 years ago

@valeriocos thanks for the suggestion. I'll open PR once i've worked on the improvement in alignment to recent addition to Perceval, in case i need some suggestion i'll open an issue ticket and we can discuss it over there :)