TeamCohen / secondstring

A bunch of fancy soft string matching routines, with some accompanying datasets

StackOverflowError while using Levenstein score() #8

Open skuladeep24 opened 1 year ago

skuladeep24 commented 1 year ago

We are hitting an error with the class below in a Hadoop job. Are there any limitations, such as a maximum input string length, on using this class to calculate similarity? Please advise.

Class: com.wcohen.ss.Levenstein

Method invocation: new Levenstein().score(str1, str2)

Error:

    ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.StackOverflowError
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)
        at com.wcohen.ss.MemoMatrix.get(MemoMatrix.java:40)
        at com.wcohen.ss.NeedlemanWunsch$MyMatrix.compute(NeedlemanWunsch.java:41)

tfmorris commented 1 month ago

You might want to look at the Apache Commons Text implementation, since its documentation specifically mentions usage with long strings: https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/LevenshteinDistance.html
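
The alternating MemoMatrix.get / MyMatrix.compute frames in the trace suggest the score is computed by memoized recursion, so the call depth probably grows with the input lengths. For anyone who wants to avoid the recursion without adding a dependency, here is a minimal sketch of an iterative, two-row Levenshtein distance. The class and method names (`IterativeLevenshtein`, `distance`) are made up for illustration and are not part of secondstring or Commons Text. It returns a plain non-negative edit distance with unit costs, which may differ in sign or scale from what `Levenstein.score` returns.

```java
public class IterativeLevenshtein {

    /**
     * Unit-cost edit distance between a and b, computed iteratively.
     * Uses two int rows of length b.length() + 1, so there is no
     * recursion and the stack depth does not depend on the input size.
     * (Hypothetical helper, not an API of secondstring or Commons Text.)
     */
    public static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            // Reuse the two rows instead of allocating a new one per iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```

Running time is still proportional to the product of the two lengths, so very long strings will be slow even though they no longer overflow the stack. With Commons Text on the classpath, `LevenshteinDistance.getDefaultInstance().apply(str1, str2)` should give the same distance.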