chaoss / grimoirelab-graal

A Generic Repository AnALyzer
GNU General Public License v3.0

Add scancli option to CoLic Backend #27

Closed inishchith closed 5 years ago

inishchith commented 5 years ago

Add support for scancli, a faster version of scancode, to the CoLic backend.

@valeriocos Please let me know if I can work on this. Thanks

valeriocos commented 5 years ago

Sure @inishchith , thanks ! You can find useful info at the following urls:

A possible implementation could add a boolean param cli here: https://github.com/chaoss/grimoirelab-graal/blob/master/graal/backends/core/analyzers/scancode.py#L41. The analyze method could then call one of two private methods, analyze_scancode or analyze_scancode_cli, depending on the value of cli. The former would contain the code of the current analyze method, and the latter some code similar to this one.

The code of the colic backend probably shouldn't need to change much (just adding new categories and the related code).

What do you think ?
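The dispatch proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ScanCodeSketch, the runner parameter, and the bare scancode command line are hypothetical and are not taken from graal or from ScanCode; only the cli flag, the two private method names, and the shape of the scancli invocation (python3, the script path, then all file paths) come from this thread.

```python
import subprocess


class ScanCodeSketch:
    """Hypothetical analyzer illustrating the proposed `cli` switch (not graal's actual class)."""

    def __init__(self, exec_path, cli=False, runner=subprocess.check_output):
        self.exec_path = exec_path
        self.cli = cli
        # `runner` is an assumption added so the sketch can be exercised without ScanCode installed.
        self.runner = runner

    def analyze(self, file_paths):
        """Dispatch to one of the two private methods, depending on `cli`."""
        if self.cli:
            return self._analyze_scancode_cli(file_paths)
        return self._analyze_scancode(file_paths)

    def _analyze_scancode(self, file_paths):
        # One process per file. The real ScanCode options are deliberately omitted here.
        outputs = []
        for path in file_paths:
            cmd = [self.exec_path, path]
            outputs.append(self.runner(cmd).decode("utf-8"))
        return outputs

    def _analyze_scancode_cli(self, file_paths):
        # A single process receives every file path at once.
        cmd = ["python3", self.exec_path] + list(file_paths)
        return [self.runner(cmd).decode("utf-8")]
```

Keeping the two code paths in separate private methods means the existing behaviour stays untouched when cli is False, which should make the change easier to review and test.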

inishchith commented 5 years ago

@valeriocos Thanks for the supporting links and insights on how to go about the task. I'll start working on the task and open a PR once done, then we can have further discussion over there.

Thanks :)

inishchith commented 5 years ago

@valeriocos can you share the version of scancode release or the setup that you used in order to run scancli successfully? I read the discussion on https://github.com/nexB/scancode-toolkit/issues/1400 but couldn't reproduce the results as I ran into multiple errors, so thought of asking before moving forward.

Thanks

valeriocos commented 5 years ago

Sorry for the late reply @inishchith

In the virtual env used by graal, I installed simplejson and execnet as reported here: https://github.com/nexB/scancode-toolkit/commit/8afa686fb71b9540029234e5a40c0572c4457c28#diff-f826f8c8f6f35f368b2a692610f05d62R18

Then I used the following branch: https://github.com/valeriocos/grimoirelab-graal/tree/test-scancli/graal, and launched the backend in the following way:

colic
https://github.com/chaoss/grimoirelab-toolkit
--git-path
/tmp/xyzw
--exec-path
/home/scancode-toolkit/scancode (v3.0.0 downloaded from here: https://github.com/nexB/scancode-toolkit/releases)
--category
code_license_scancode
--json

Note that you have to modify the method metadata to include the param filtered_classified.

Tomorrow I can push a better version of the code of my branch.

Hope it helps :)

inishchith commented 5 years ago

@valeriocos Thanks for sharing the information.

Please do correct me here if I'm wrong or have missed something out. Thanks

valeriocos commented 5 years ago

Sorry @inishchith I made a mistake. It wasn't version 3.0.0, but the checkout at https://github.com/nexB/scancode-toolkit/commit/8afa686fb71b9540029234e5a40c0572c4457c28 (as reported here: https://github.com/nexB/scancode-toolkit/issues/1400#issuecomment-469713862). The code was then merged in the develop branch (as reported here: https://github.com/nexB/scancode-toolkit/issues/1400#issuecomment-470651652).

If you clone the repo and use the current develop branch, the backend should work (https://github.com/nexB/scancode-toolkit/tree/develop).

Let me know if you have any problem, thanks :)

inishchith commented 5 years ago

@valeriocos Sorry for the delayed response.

I tried reproducing the results using your setup information and the test-scancli branch of your fork, but I couldn't. I suspect the implementation has changed since then. I've shared the error log below. Please let me know if you've encountered it before or if I have missed something. Thanks :)

[2019-05-13 16:52:39,704] - Analysis failed at 9dc821962567715e5358b1192e1b15d8868d2b6c
Traceback (most recent call last):
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/analyzers/scancode.py", line 62, in analyze
    msg = subprocess.check_output(cmd_scancli).decode("utf-8")
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['python3', '/Users/Nishchith/scancode-toolkit/etc/scripts/scancli.py', '/tmp/worktrees/tmp2/.gitignore', '/tmp/worktrees/tmp2/AUTHORS', '/tmp/worktrees/tmp2/LICENSE', '/tmp/worktrees/tmp2/grimoirelab/__init__.py', '/tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py', '/tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py', '/tmp/worktrees/tmp2/setup.cfg']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 472, in run
    for item in items:
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 589, in fetch
    raise e
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 583, in fetch
    for item in items:
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 162, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/graal.py", line 183, in fetch_items
    raise e
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/graal.py", line 176, in fetch_items
    commit['analysis'] = self._analyze(commit)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/colic.py", line 161, in _analyze
    analysis = self.analyzer.analyze(local_paths)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/colic.py", line 204, in analyze
    analysis = self.analyzer.analyze(**kwargs)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/analyzers/scancode.py", line 65, in analyze
    e.output.decode("utf-8")))
graal.graal.GraalError: Scancode failed at /tmp/worktrees/tmp2/.gitignore /tmp/worktrees/tmp2/AUTHORS /tmp/worktrees/tmp2/LICENSE /tmp/worktrees/tmp2/grimoirelab/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py /tmp/worktrees/tmp2/setup.cfg,

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/graal", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/Nishchith/GitHub/grimoirelab-graal/bin/graal", line 125, in <module>
    main()
  File "/Users/Nishchith/GitHub/grimoirelab-graal/bin/graal", line 71, in main
    cmd.run()
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 482, in run
    raise RuntimeError(str(e))
RuntimeError: Scancode failed at /tmp/worktrees/tmp2/.gitignore /tmp/worktrees/tmp2/AUTHORS /tmp/worktrees/tmp2/LICENSE /tmp/worktrees/tmp2/grimoirelab/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py /tmp/worktrees/tmp2/setup.cfg,
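The final error message above ends with a trailing comma and no tool output, which suggests that the failing command's own diagnostics were not captured: subprocess.check_output only collects stdout unless stderr is redirected. A minimal sketch of one way to surface that output when debugging is shown below. The helper name run_and_report is hypothetical and is not part of graal.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd`; on failure, raise an error that includes the command's stdout and stderr."""
    try:
        # Redirect stderr into stdout so that e.output carries the tool's error text.
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode("utf-8")
    except subprocess.CalledProcessError as e:
        details = e.output.decode("utf-8", errors="replace")
        raise RuntimeError(
            "Command failed with exit status %d: %s" % (e.returncode, details)
        ) from e


if __name__ == "__main__":
    # Stand-in for a failing tool: writes to stderr and exits with status 1.
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    try:
        run_and_report(failing)
    except RuntimeError as err:
        print(err)
```

With stderr folded into the captured output, the reported message would include whatever scancli printed before exiting, rather than only the list of files.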

valeriocos commented 5 years ago

No worries @inishchith :)

I have uploaded a branch with some improvements to the code. I can confirm what you reported: the errors you posted appear when using the develop or master branches of the original repo. However, if you perform the following steps and run the same code, no errors pop up:

git clone https://github.com/nexB/scancode-toolkit
cd scancode-toolkit
git checkout -b xxx 8afa686fb71b9540029234e5a40c0572c4457c28
colic
https://github.com/chaoss/grimoirelab-toolkit
--git-path
/tmp/cdefgh
--exec-path
/home/graal-libs/scancode-toolkit/etc/scripts/scancli.py <-- the repo just downloaded
--category
code_license_scancode_cli
--json

I'll keep investigating and let you know about any progress.

inishchith commented 5 years ago

@valeriocos Thanks for checking the issue out. After checking out that commit, I could reproduce the results 👍

I also checked out your implementation of scancode_cli here. I noticed that you're passing all the files at once as arguments instead of passing them individually, as the existing code does. Does passing them all at once improve performance? I haven't had time to test both approaches thoroughly, hence the question :)

valeriocos commented 5 years ago

Great @inishchith !

> Also I checked out your implementation of ....

Yes, this is one of the features of scancli (check the comment here: https://github.com/nexB/scancode-toolkit/issues/1400#issuecomment-469055895, and the following one).

If you test scancode and scancli against https://github.com/chaoss/grimoirelab-toolkit you should see the difference.
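A rough way to see such a difference is to time one process per file against a single process that receives all the files. The sketch below only measures process start-up overhead with a stand-in command; the helper names are hypothetical, and the real scancode and scancli command lines would need to be substituted for base_cmd.

```python
import subprocess
import sys
import time


def per_file_commands(base_cmd, paths):
    """Build one command per file (one process per file)."""
    return [base_cmd + [path] for path in paths]


def batched_command(base_cmd, paths):
    """Build a single command that receives every file at once."""
    return [base_cmd + list(paths)]


def time_invocations(commands):
    """Run each command in turn and return the total elapsed wall-clock seconds."""
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in tool: a Python process that does nothing with its arguments.
    base_cmd = [sys.executable, "-c", "pass"]
    paths = ["file_%d" % i for i in range(20)]
    print("per file:", time_invocations(per_file_commands(base_cmd, paths)))
    print("batched: ", time_invocations(batched_command(base_cmd, paths)))
```

Because each invocation pays the interpreter start-up cost again, the batched variant would be expected to finish sooner, although the actual gain with scancli depends on the repository being analyzed.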

inishchith commented 5 years ago

@valeriocos Thanks for answering. As my university exams are under way, I'll work on this when time permits. I'll probably test scancode and scancli tomorrow to check the difference, and then continue the work that is currently staged.

Sorry for the delayed response.

valeriocos commented 5 years ago

No worries @inishchith , I have just opened a PR (https://github.com/chaoss/grimoirelab-graal/pull/28) with some code to use scancli.

Feel free to work on that PR or create a new one.

inishchith commented 5 years ago

@valeriocos Sure. I checked out #28, and the work that I've done so far seems similar. Still, I'll open a PR in some time so that we can work on adding tests for it too.

Thanks

inishchith commented 5 years ago

@valeriocos I think we can close this. What do you think?

valeriocos commented 5 years ago

Sure @inishchith , feel free to close it.