fossology / atarashi

Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
http://fossology.github.io/atarashi
GNU General Public License v2.0

FEAT: Increasing the overall performance of Atarashi #86

Open SinghShreya05 opened 3 years ago

SinghShreya05 commented 3 years ago

We can improve the performance of Atarashi, Nirjas, and related tools by using Numba and Nvidia's RAPIDS. Regular NumPy, pandas, and similar libraries are slow here: most of the time is spent on serialization, deserialization, pre-processing, and memory transfers between the CPU and other devices. We can speed this up with Numba's parallel processing, JIT compilation, and built-in features, which work even on a CPU. Most of the programs can be made faster still with RAPIDS' cuML, cuDF, Dask, etc. by running steps such as pre-processing, vectorization, database queries, serialization, deserialization, and parallel processing on a GPU. The entire codebase can be translated without much hassle, resulting in better computational efficiency, higher accuracy, and lower memory usage. This can help ensure Atarashi's integration with FOSSology. I have already started on the work. Can I proceed with it? @hastagAB @GMishx
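To make the Numba part of the proposal concrete, here is a minimal sketch of a JIT-compiled, parallel similarity loop. It is not taken from the Atarashi codebase: the function name `cosine_scores`, the term-frequency inputs, and the choice of cosine similarity are all assumptions made for illustration. The sketch falls back to plain Python when Numba is not installed, so it runs either way.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap
    prange = range


@njit(parallel=True)
def cosine_scores(query, licenses):
    """Cosine similarity of one term-frequency vector against each matrix row.

    Hypothetical helper: `query` has shape (n_terms,), `licenses` has shape
    (n_licenses, n_terms). Rows with zero norm get a score of 0.
    """
    n_licenses, n_terms = licenses.shape
    scores = np.zeros(n_licenses)
    query_norm = np.sqrt(np.sum(query * query))
    # Each row is independent, so Numba can split this loop across CPU cores.
    for i in prange(n_licenses):
        dot = 0.0
        norm = 0.0
        for j in range(n_terms):
            dot += query[j] * licenses[i, j]
            norm += licenses[i, j] * licenses[i, j]
        denom = query_norm * np.sqrt(norm)
        if denom > 0.0:
            scores[i] = dot / denom
    return scores


# Toy data: three "licenses" described by three term counts each.
licenses = np.array([[1.0, 0.0, 1.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0]])
query = np.array([1.0, 0.0, 1.0])
best_match = int(np.argmax(cosine_scores(query, licenses)))  # index of closest row
```

The first call includes compilation time, so any speed-up only shows on repeated calls or large inputs, and it would need to be measured against the existing NumPy code before drawing conclusions.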

hastagAB commented 3 years ago

Hi @SinghShreya05, the suggestions look quite interesting and convincing to me. Please feel free to continue your work on this; it would be a great improvement. Do let us know if you need any help.