fossology / atarashi

Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
http://fossology.github.io/atarashi
GNU General Public License v2.0

Parallelize the evaluator algorithm #64

Closed: hastagAB closed this issue 3 years ago

hastagAB commented 4 years ago

Description

There is a script to evaluate the algorithms for Atarashi: evaluator.py

Currently, it scans the test files sequentially (one by one). We need to parallelize the script, using multiprocessing, multithreading, or another approach, to reduce the overall scan time.

How to solve

Use multiprocessing or multithreading in the main loop of evaluator.py.
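
As a rough illustration of what parallelizing such a loop could look like, here is a minimal sketch using the standard-library `concurrent.futures` module. It is not taken from evaluator.py: `scan_file` is a hypothetical stand-in for running one Atarashi agent on one file, and `evaluate_parallel` is an assumed name. The executor class is a parameter so that processes and threads can be compared with the same code.

```python
from concurrent.futures import ProcessPoolExecutor


def scan_file(path):
    """Hypothetical stand-in for scanning one test file with one agent."""
    with open(path, "r", errors="ignore") as handle:
        text = handle.read()
    # Placeholder result: a real evaluator would return the detected license.
    return path, len(text.split())


def evaluate_parallel(paths, executor_cls=ProcessPoolExecutor, workers=None):
    """Scan all paths concurrently; results come back in input order."""
    with executor_cls(max_workers=workers) as executor:
        # executor.map distributes the calls across the pool's workers.
        return list(executor.map(scan_file, paths))
```

With the default `ProcessPoolExecutor`, each file is scanned in a separate worker process; passing `executor_cls=ThreadPoolExecutor` switches to threads without other changes. On platforms that start workers by spawning a fresh interpreter, the call to `evaluate_parallel` should sit under an `if __name__ == "__main__":` guard.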

Kaushl2208 commented 4 years ago

Hey @hastagAB, both multiprocessing and multithreading are effective in their own way. What kind of task do you think this is: I/O-bound or CPU-bound? I need a bit of clarity on that.
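
One rough way to answer the I/O-bound versus CPU-bound question is to compare CPU time with wall-clock time for a single scan. The sketch below is only an assumption about how that could be measured. `cpu_fraction` is a hypothetical helper, and `busy` and `idle` are toy workloads rather than Atarashi code. A ratio near 1 suggests CPU-bound work, where separate processes tend to help in CPython. A ratio near 0 suggests the call mostly waits, where threads are usually enough.

```python
import time


def cpu_fraction(func, *args):
    """Return (result, CPU time / wall time) for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return result, (cpu_used / wall_used if wall_used > 0 else 0.0)


def busy(n):
    """Toy CPU-bound workload: pure computation."""
    return sum(i * i for i in range(n))


def idle(seconds):
    """Toy waiting workload: sleeps, as blocking I/O would."""
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    print("busy:", cpu_fraction(busy, 1_000_000)[1])
    print("idle:", cpu_fraction(idle, 0.2)[1])
```

Wrapping a single real scan call in `cpu_fraction` in place of the toy workloads would give a first hint about which approach to try.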

Aman-Codes commented 3 years ago

@hastagAB @Kaushl2208 Can I work on this issue if no one else is working on it? Also, should I use multiprocessing or multithreading? I found that Ngram uses multithreading to evaluate a single file (#23), so using multiprocessing to process different files might be helpful in this case. What are your views on it?

Kaushl2208 commented 3 years ago

Hey @Aman-Codes, thank you for your interest in this. Yes, you can move forward with it. @hastagAB understands the actual use case and can explain that part. As far as I can see, you can go for multiprocessing! Bringing @GMishx into the loop as well.

hastagAB commented 3 years ago

Hi @Aman-Codes, the evaluator script is a separate entity and is not directly connected to Atarashi (it is used to evaluate all of Atarashi's agents). Currently it runs sequentially, and our main goal is to scan different files in parallel (basically to simulate how the FOSSology license scanner works, so that we can evaluate the algorithms in a real-world scenario). You are free to experiment with different techniques to find the most suitable one. Assigning it to you.

Aman-Codes commented 3 years ago

Thanks for assigning me this 🙂