Open ttruong-gilead opened 5 months ago
same here
document.search_words is actually broken. The issue comes from this line in _search_words_with_similarity in page.py:
    similarity = (similarity if similarity_metric == SimilarityMetric.COSINE else -(similarity))
The LEVENSHTEIN score is a value in [0, 1]. The COSINE score is a value in [-1, 1], but -1.0 represents an inverse correlation and is therefore not compatible with the expected result, so its lower bound must be set to 0.
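To illustrate the range point above, here is a minimal pure-Python sketch of cosine similarity. The helper name `cosine_similarity` and the toy vectors are assumptions made for illustration only; this is not how the library computes its scores.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical, standing in for word embeddings).
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

# Clamping the lower bound at 0 treats "inverse" matches as "no match".
clamped_opposite = max(0.0, opposite)
```

Vectors pointing in opposite directions score below zero, which is why a raw cosine score cannot be compared directly against a threshold that assumes a [0, 1] scale.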
Currently the code returns a list of -0.0 values for LEVENSHTEIN. To fix this bug, replace the line in _search_words_with_similarity with:
    if similarity_metric == SimilarityMetric.COSINE:
        similarity = 0.0 if similarity < 0.0 else similarity
    else:
        similarity = similarity
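For anyone who wants to try the difference outside the library, the sketch below compares the quoted line with the proposed replacement. The `SimilarityMetric` enum here is a local stand-in, and the function names `current_behaviour` and `proposed_fix` are hypothetical; nothing in it imports or patches amazon-textract-textractor.

```python
from enum import Enum


class SimilarityMetric(Enum):
    # Local stand-in for the library's enum; the member values are arbitrary.
    COSINE = 0
    LEVENSHTEIN = 1


def current_behaviour(similarity, similarity_metric):
    # Mirrors the line quoted from page.py: non-COSINE scores are negated.
    return similarity if similarity_metric == SimilarityMetric.COSINE else -(similarity)


def proposed_fix(similarity, similarity_metric):
    # Mirrors the suggested replacement: clamp COSINE at 0, leave others as-is.
    if similarity_metric == SimilarityMetric.COSINE:
        return 0.0 if similarity < 0.0 else similarity
    return similarity


# Negating a float zero produces a signed zero, hence the "-0.0" entries.
print(current_behaviour(0.0, SimilarityMetric.LEVENSHTEIN))  # prints -0.0
print(proposed_fix(-0.4, SimilarityMetric.COSINE))           # prints 0.0
print(proposed_fix(0.75, SimilarityMetric.LEVENSHTEIN))      # prints 0.75
```

Whether leaving the non-COSINE score unnegated is right depends on how the library defines that score and how it is later compared with similarity_threshold, so this is worth checking against the source before applying the change.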
amazon-textract-textractor==1.7.9
document.search_words(keyword="Tom Brady")
or page.search_words(keyword="Frank")
doesn't work as expected. It returns a list of seemingly random letters or words that are nowhere near the keyword. I tried adjusting similarity_threshold, to no avail.