MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License
725 stars 68 forks

Numeric-String Text Matching #62

Open ganesh-morsu opened 11 months ago

ganesh-morsu commented 11 months ago

Hi

I am using PolyFuzz for text matching:

import polyfuzz
model = polyfuzz.PolyFuzz()

model_fit = model.fit(["CIPLAR LA 40 TABLET", "CIPLAR LA 80 TABLET"])

model_fit.transform(['CIPLAR LA 40 TABLET'])

Output:

{'TF-IDF':                   From                   To  Similarity
 0  CIPLAR LA 40 TABLET  CIPLAR LA 80 TABLET         1.0}

The returned match is CIPLAR LA 80 TABLET, but it should be CIPLAR LA 40 TABLET.

It does not seem to take the numeric part into account. Is there an option to stop numeric values from being ignored?

MaartenGr commented 11 months ago

I would advise checking out the list of models you can choose from. More specifically, you can use TF-IDF and update its tokenization and preprocessing parameters so that numeric values are taken into account.
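
To make the tokenization point concrete, here is a minimal standalone sketch of character n-gram matching in which digits are kept as part of the tokens. It uses only the Python standard library and is not PolyFuzz's implementation. The helper names `char_ngrams`, `cosine_similarity`, and `best_match` are hypothetical, and plain n-gram counts stand in for TF-IDF weights.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams; digits are kept."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def cosine_similarity(a, b, n=3):
    """Cosine similarity between the n-gram count vectors of two strings."""
    va, vb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    # Counter returns 0 for n-grams missing from vb, so only shared n-grams contribute.
    dot = sum(va[g] * vb[g] for g in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_match(query, candidates, n=3):
    """Return the (candidate, score) pair with the highest similarity to query."""
    scored = ((c, cosine_similarity(query, c, n)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


candidates = ["CIPLAR LA 40 TABLET", "CIPLAR LA 80 TABLET"]
match, score = best_match("CIPLAR LA 40 TABLET", candidates)
print(match, round(score, 3))
```

Because n-grams such as " 40" and " 80" differ, the two product names no longer look identical to the matcher, and the query is paired with its exact counterpart. In this sketch the query is compared against an explicit candidate list that includes the string itself.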