fossology / atarashi

Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
http://fossology.github.io/atarashi
GNU General Public License v2.0

Removing third party module in dameruLevenDist agent #90

Closed: its-sushant closed 2 years ago

its-sushant commented 2 years ago

Right now, atarashi uses damerau_levenshtein_distance, imported from pyxdameraulevenshtein, in the dameruLevenDist agent. The function is not that long, and we do not know if it will get removed. So we can remove the import from atarashi and write our own damerau_levenshtein_distance function to increase the overall speed of the dameruLevenDist agent and make atarashi less dependent on other repositories. I have already started working on it. Can I proceed further?

Kaushl2208 commented 2 years ago

Interesting. Are you referring to replacing the library import with a native function? If so, I have a few questions:

  1. Why is it required to have our own function? Can you explain the broader picture?
  2. How much faster are we making DameruLevenDist? Do you have any benchmarks as of now?
  3. Are we also focusing on increasing the accuracy of the agent?

CC: @GMishx @hastagAB

its-sushant commented 2 years ago

@Kaushl2208 Currently Atarashi depends on the pyxDamerauLevenshtein library, which has no further development and lacks proper maintenance of its codebase. Implementing the same edit distance natively will give us more flexibility in terms of customisation and maintenance and will make Atarashi less dependent on external libraries. Also, I'll be applying a method similar to the currently implemented one, so the accuracy should remain more or less the same, but we can surely look for ways to improve it.
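
For illustration, a native replacement along these lines could look like the sketch below. It is not the code from this issue or from pyxDamerauLevenshtein. It assumes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance is what is wanted, and it reuses the name damerau_levenshtein_distance only so that it could stand in for the imported function. It is pure Python, so it should not be expected to be faster than a compiled extension without measuring.

```python
def damerau_levenshtein_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Works on any two indexable sequences (strings, lists of tokens).
    Assumption: the restricted variant, where a transposed pair is not
    edited again afterwards, is sufficient for the agent.
    """
    len_a, len_b = len(a), len(b)
    if len_a == 0:
        return len_b
    if len_b == 0:
        return len_a

    # Only three rows of the DP table are needed at any time.
    two_back = [0] * (len_b + 1)
    previous = list(range(len_b + 1))

    for i in range(1, len_a + 1):
        current = [i] + [0] * len_b
        for j in range(1, len_b + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent elements.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                current[j] = min(current[j], two_back[j - 2] + 1)
        two_back, previous = previous, current

    return previous[len_b]
```

Any benchmark or accuracy comparison against the existing agent would still have to be run on real license texts before drawing conclusions about speed.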