fossology / atarashi

Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
http://fossology.github.io/atarashi
GNU General Public License v2.0

Feat(models): Implemented three models for license similarity #69

Open Kaushl2208 opened 4 years ago

Kaushl2208 commented 4 years ago

Description

This PR implements Logistic Regression, Multinomial Naive Bayes, and Linear SVC on the license dataset licenseList.csv. The goal is to find a model that can make atarashi faster and more accurate.
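To illustrate the kind of classifier involved, here is a minimal sketch of multinomial Naive Bayes over bag-of-words counts, written in plain Python. It is not the code from this PR. The class `TinyMultinomialNB`, the `tokenize` helper, and the three toy training texts are hypothetical, and the layout of licenseList.csv is not assumed anywhere.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; a real run would train on the license dataset instead.
TRAIN = [
    ("Permission is hereby granted, free of charge, to any person "
     "obtaining a copy of this software", "MIT"),
    ("This program is free software; you can redistribute it and/or modify "
     "it under the terms of the GNU General Public License", "GPL-2.0"),
    ("Licensed under the Apache License, Version 2.0; you may not use this "
     "file except in compliance with the License", "Apache-2.0"),
]

model = TinyMultinomialNB().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("redistribute it under the terms of the GNU General Public License"))
```

Logistic Regression and Linear SVC would work from the same kind of token features but learn per-token weights by optimization rather than by counting.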

Files

How to use?

ToDo

Accuracy Score

| Model name | Accuracy (%) | Time for 100 files (s) |
| --- | --- | --- |
| Logistic Regression | 31 | 88.6 |
| Linear SVC | 36 | 79.4 |
| Multinomial Naive Bayes | 30 | 83.72 |
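How the figures above were measured is not stated here. As a rough sketch, an accuracy percentage and a wall-clock time for a batch of files could be collected as follows. The `evaluate` helper, the `keyword_predict` stand-in model, and the sample texts are all hypothetical and only show the shape of such a measurement.

```python
import time


def evaluate(predict, samples):
    """Return (accuracy in percent, elapsed seconds) for a predict callable."""
    start = time.perf_counter()
    correct = sum(1 for text, expected in samples if predict(text) == expected)
    elapsed = time.perf_counter() - start
    return 100.0 * correct / len(samples), elapsed


def keyword_predict(text):
    """Hypothetical stand-in for a trained model: a naive keyword lookup."""
    lowered = text.lower()
    if "gnu general public license" in lowered:
        return "GPL-2.0"
    if "apache license" in lowered:
        return "Apache-2.0"
    return "MIT"


# Hypothetical labelled samples; the last one is deliberately misclassified.
SAMPLES = [
    ("under the terms of the GNU General Public License", "GPL-2.0"),
    ("Licensed under the Apache License, Version 2.0", "Apache-2.0"),
    ("Permission is hereby granted, free of charge", "MIT"),
    ("Redistribution and use in source and binary forms", "BSD-3-Clause"),
]

accuracy, seconds = evaluate(keyword_predict, SAMPLES)
print(f"accuracy: {accuracy:.0f} %, time: {seconds:.4f} s")
```

Timing only the prediction loop, as done here, excludes model training and file loading; whether the reported times include those steps is not known.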

Future Scope

CC: @hastagAB @GMishx @ag4ums

Signed-off-by: Kaushlendra Pratap Singh <kaushlendrapratap.9837@gmail.com>

Kaushl2208 commented 4 years ago

@hastagAB @GMishx, I added the models command to atarashii.py, but it seems I have missed an update somewhere else in the code.

Kaushl2208 commented 4 years ago

@GMishx @ag4ums I have run all three models on the test files and am attaching screenshots of the results.

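SVC

[Screenshot: Linear SVC results]

NB

[Screenshot: Multinomial Naive Bayes results]

Logistic Regression

[Screenshot: Logistic Regression results]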