TechnionYP5777 / Bugquery

Bug query

Add distancer based on Levenshtein distance #100

Closed rodedzats closed 7 years ago

rodedzats commented 7 years ago

As we discussed in our group meeting, and based on the matching TODO, we should implement a new StackTraceDistance based on Levenshtein distance. After that, we should use #91 to check which distancer is better.

yonzarecki commented 7 years ago

You should take a look at #63; I remember finding some interesting stuff there. I also went through my browsing history from the previous semester and found some (hopefully) helpful links about Levenshtein distance specifically.

  1. A github repo with string similarity measures in Java - link
  2. An SO post about why Levenshtein isn't really good - link
  3. Reference implementation of the Levenshtein distance - link

Hope this helps :smile:

rodedzats commented 7 years ago

Finished the simple Levenshtein stack trace distancer. I checked it on a few simple stack traces and it works well (see the relevant tests). We should test it further after completing #91 and compare it to the other distancers. We may need to add another cost class in order to favour the beginning/end of the stack trace.
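
For readers following along, here is a minimal sketch of what a frame-level Levenshtein distancer with a pluggable cost could look like. It is not the code committed for this issue: the class `FrameLevenshtein`, the `Cost` interface, and the `FAVOUR_TOP` weighting are all hypothetical names and choices made for illustration, and it assumes each line of the stack trace is treated as one symbol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the project's actual StackTraceDistance implementation.
final class FrameLevenshtein {
    /** Hypothetical cost hook: cost of editing the frame at `index` in a trace of `length` frames. */
    interface Cost { double at(int index, int length); }

    /** Classic Levenshtein: every edit costs 1. */
    static final Cost UNIFORM = (index, length) -> 1.0;
    /** Assumed weighting: edits near the top of the trace (index 0) cost more. */
    static final Cost FAVOUR_TOP = (index, length) -> 1.0 + (double) (length - index) / length;

    /** Splits a raw trace into trimmed, non-empty lines (one symbol per frame). */
    static List<String> frames(String trace) {
        List<String> result = new ArrayList<>();
        for (String line : trace.split("\\R")) {
            if (!line.trim().isEmpty()) result.add(line.trim());
        }
        return result;
    }

    /** Weighted edit distance between two frame lists. */
    static double distance(List<String> a, List<String> b, Cost cost) {
        int n = a.size(), m = b.size();
        double[][] d = new double[n + 1][m + 1];
        // First column/row: delete all of a's prefix, or insert all of b's prefix.
        for (int i = 1; i <= n; i++) d[i][0] = d[i - 1][0] + cost.at(i - 1, n);
        for (int j = 1; j <= m; j++) d[0][j] = d[0][j - 1] + cost.at(j - 1, m);
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double del = d[i - 1][j] + cost.at(i - 1, n);
                double ins = d[i][j - 1] + cost.at(j - 1, m);
                double subCost = a.get(i - 1).equals(b.get(j - 1))
                        ? 0.0
                        : Math.max(cost.at(i - 1, n), cost.at(j - 1, m));
                d[i][j] = Math.min(d[i - 1][j - 1] + subCost, Math.min(del, ins));
            }
        }
        return d[n][m];
    }
}
```

With `UNIFORM` this reduces to plain Levenshtein over frames. With `FAVOUR_TOP`, changing the first frame of a trace moves two traces further apart than changing the last one; favouring the end instead would only need a different `Cost`.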

yonzarecki commented 7 years ago

As suggested in our meeting with Adi, we can add an option for bulk deletion of a block at a reduced effective cost, which will bring similar queries closer together under this distancer. We will need to discuss this after completing #91 and see whether it makes any difference in real use cases.
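
One possible reading of the bulk-deletion idea, sketched under assumptions: removing (or adding) a run of `k` consecutive frames costs `min(k, blockCost)` instead of `k`, so a long inserted block of wrapper frames counts as roughly one edit. The class `BlockLevenshtein`, the `blockCost` parameter, and this particular cost model are hypothetical; the thread does not say how the option would actually be defined.

```java
import java.util.List;

// Hypothetical sketch of Levenshtein with discounted block deletion/insertion.
final class BlockLevenshtein {
    /** Assumed cost model: a run of k consecutive frames costs at most blockCost. */
    static double gap(int k, double blockCost) {
        return Math.min(k, blockCost);
    }

    static double distance(List<String> a, List<String> b, double blockCost) {
        int n = a.size(), m = b.size();
        double[][] d = new double[n + 1][m + 1]; // d[0][0] stays 0
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i == 0 && j == 0) continue;
                double best = Double.POSITIVE_INFINITY;
                // Match or substitute the last frame of each prefix.
                if (i > 0 && j > 0) {
                    best = d[i - 1][j - 1] + (a.get(i - 1).equals(b.get(j - 1)) ? 0.0 : 1.0);
                }
                // Delete the last k frames of a as one block.
                for (int k = 1; k <= i; k++) {
                    best = Math.min(best, d[i - k][j] + gap(k, blockCost));
                }
                // Insert the last k frames of b as one block.
                for (int k = 1; k <= j; k++) {
                    best = Math.min(best, d[i][j - k] + gap(k, blockCost));
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

Passing `Double.POSITIVE_INFINITY` as `blockCost` gives plain Levenshtein over frames, which makes the two variants easy to compare once #91 is done. Note that a flat cap bounds every distance by `2 * blockCost` (delete everything, insert everything), so a real version would probably want a block cost that still grows slowly with block length. The inner loops also make this O(n·m·(n+m)) rather than O(n·m).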