Open parantak opened 4 years ago
It'd be great if we had a source for this. It doesn't make sense to put effort into something that won't end up being used.
@parantak As far as I remember, the Smith-Waterman algorithm in particular isn't relevant for strings. With DNA it matches a particular sequence, so if the shorter sentence were a disjoint union of substrings of the other string, it would give 100% similarity: "My name" and "My name is Somesh Singh" would be 100% the same (Smith-Waterman excludes the unmatched parts), while Needleman-Wunsch includes the full strings. So personally I don't think it would be that useful. Check what we used for BIOF110 here and you will notice the difference.
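To make the "My name" example concrete, here is a minimal sketch of Smith-Waterman scoring on plain strings. The function names, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`), and the normalisation by the shorter string's best possible score are all assumptions chosen for illustration, not something taken from the BIOF110 material or from this repository.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local:
            # a bad prefix is dropped instead of being penalised.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def local_similarity(a, b, match=2, mismatch=-1, gap=-1):
    """Hypothetical normalisation: score divided by the shorter string's maximum."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman_score(a, b, match, mismatch, gap) / (match * shorter)


print(local_similarity("My name", "My name is Somesh Singh"))  # 1.0
```

Under this normalisation any string that appears verbatim inside the other scores 1.0, no matter how much extra text surrounds it.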
@someshsingh22 Right, sorry. I only vaguely remembered them both. I searched a bit, and I believe Smith-Waterman is a local alignment algorithm whereas Needleman-Wunsch is a global alignment algorithm. So I guess Smith-Waterman might not be as relevant. However, I am sure Needleman-Wunsch should be a good metric because, unlike the fixed unit costs in Levenshtein distance, the algorithm allows different scores for matches, mismatches, and gaps. As for Smith-Waterman, I'll look into it soon to gain a better understanding and to make sure we aren't missing out on anything.
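For comparison, a minimal sketch of Needleman-Wunsch global scoring with separate match, mismatch, and gap parameters. The function name and default values are assumptions for illustration only. With `match=0, mismatch=-1, gap=-1` the score is the negated Levenshtein distance, which shows how the fixed-cost case falls out as one particular choice of parameters.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score between a and b (linear gap penalty)."""
    # First row: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: aligning a[:i] against an empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # No clamping at 0: every character of both strings must be accounted for.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


# The 16 unmatched characters are paid for as gaps: 7 matches - 16 gaps.
print(needleman_wunsch_score("My name", "My name is Somesh Singh"))  # -9

# Levenshtein-style costs: the result is minus the edit distance.
print(needleman_wunsch_score("kitten", "sitting", match=0, mismatch=-1, gap=-1))  # -3
```

Unlike the local case, the extra text in the longer string lowers the score here, so a short string contained in a long one is not reported as identical.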
Yes, that was my point; I don't remember Needleman-Wunsch well either. Do look for some supporting literature in similar NLP domains before you move on, though.
@someshsingh22 Yeah, of course.