fossology / atarashi

Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
http://fossology.github.io/atarashi
GNU General Public License v2.0

feat(test): Improve scan time using import #84

Closed: GMishx closed this 3 years ago

GMishx commented 3 years ago

Import the atarashii scanner and reuse a single scanner object across runs, rather than creating a new sub-process for each run. This improves the scan time.
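The idea can be illustrated with a minimal sketch. Note that `Scanner`, `scan_all_fresh`, and `scan_all_reused` are hypothetical stand-ins, not the actual atarashi classes or the code in this PR; the sketch only shows why constructing the scanner once and reusing it avoids repeated setup cost.

```python
class Scanner:
    """Hypothetical stand-in for a license scanner with costly setup."""

    # Counts constructions so the setup cost is visible in the example.
    instances_created = 0

    def __init__(self):
        Scanner.instances_created += 1
        # Assumption: a real scanner would load its license data here,
        # which is the expensive step we want to do only once.
        self.known_phrases = {
            "GPL-2.0-only": "gnu general public license version 2",
            "MIT": "permission is hereby granted",
        }

    def scan(self, text):
        """Return the names of all licenses whose phrase occurs in text."""
        lowered = text.lower()
        return [
            name
            for name, phrase in self.known_phrases.items()
            if phrase in lowered
        ]


def scan_all_fresh(texts):
    """Old pattern: pay the setup cost again for every input."""
    return [Scanner().scan(text) for text in texts]


def scan_all_reused(texts):
    """New pattern: build the scanner once and reuse it for every input."""
    scanner = Scanner()
    return [scanner.scan(text) for text in texts]
```

Both functions return the same results; only the number of scanner constructions differs (one per input versus one in total).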

Some statistics:

| Agent | Old time | New time | Improvement |
| --- | --- | --- | --- |
| wordFrequencySimilarity | 75.46 | 17.14 | 77.28% |
| DLD | 1601.52 | 1213.56 | 24.22% |
| tfidf | 85.01 | 24.41 | 71.28% |
| Ngram | 301.4 | 217.87 | 27.71% |
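For reference, the improvement column is consistent with the relative reduction `(old - new) / old`. The helper below is a small sketch for re-checking the numbers; the function name `improvement_percent` is made up for this example. The reported values appear to be truncated rather than rounded, so the last digit may differ by one.

```python
def improvement_percent(old_time, new_time):
    """Relative reduction in scan time, as a percentage of the old time."""
    if old_time <= 0:
        raise ValueError("old_time must be positive")
    return (old_time - new_time) / old_time * 100.0


# Timings copied from the statistics above: agent -> (old time, new time).
timings = {
    "wordFrequencySimilarity": (75.46, 17.14),
    "DLD": (1601.52, 1213.56),
    "tfidf": (85.01, 24.41),
    "Ngram": (301.4, 217.87),
}

for agent, (old, new) in timings.items():
    print(f"{agent}: {improvement_percent(old, new):.2f}%")
```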

The changes were implemented for atarashii.py::main() as well.

Additional changes were made in license_merger to completely eliminate deprecated SPDX license names, e.g., GPL-2.0 and LGPL-2.1+. As a result, some test files were renamed:

GPL-2.0.c -> GPL-2.0-only.c
GPL-2.0.h -> GPL-2.0-only.h
LGPL-2.0+.c -> LGPL-2.0-or-later.c
LGPL-2.1+.c -> LGPL-2.1-or-later.c
LGPL-3.0+.c -> LGPL-3.0-or-later.c
LGPL-3.0.hxx -> LGPL-3.0-only.hxx
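The renaming pattern follows the SPDX convention for GNU licenses: a trailing `+` becomes `-or-later`, and a bare version becomes `-only`. The sketch below reproduces the renames listed above; it is not the license_merger implementation, the function names are hypothetical, and the rule is only meant for GNU family identifiers (GPL, LGPL, AGPL).

```python
import os


def modernize_spdx_id(old_id):
    """Map a deprecated GNU-style SPDX identifier to its current form.

    Assumption: old_id is a GNU family identifier such as GPL-2.0 or
    LGPL-2.1+. Other licenses do not follow this rule.
    """
    if old_id.endswith(("-only", "-or-later")):
        return old_id  # already in the current form
    if old_id.endswith("+"):
        return old_id[:-1] + "-or-later"
    return old_id + "-only"


def rename_test_file(filename):
    """Apply the identifier mapping to a file named '<SPDX id><extension>'."""
    stem, ext = os.path.splitext(filename)
    return modernize_spdx_id(stem) + ext


for name in ["GPL-2.0.c", "LGPL-2.1+.c", "LGPL-3.0.hxx"]:
    print(f"{name} -> {rename_test_file(name)}")
```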

Signed-off-by: Gaurav Mishra <mishra.gaurav@siemens.com>

GMishx commented 3 years ago

@ag4ums @amanjain97 @hastagAB, can we please review this PR before GSoC starts?

GMishx commented 3 years ago

Ping @ag4ums