KnowledgeCaptureAndDiscovery / somef

SOftware Metadata Extraction Framework: A tool for automatically extracting relevant software information from readme files
MIT License

[Header analysis] Fuzzy match #228

Open dgarijo opened 3 years ago

dgarijo commented 3 years ago

There may be typos in some headers. We should allow for some fuzziness when matching them to improve results.

dgarijo commented 3 years ago

Calculating the edit distance and accepting matches within a small threshold (e.g., 1 or 2) may be enough. Here is a reference implementation: https://stackoverflow.com/questions/2460177/edit-distance-in-python
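
A minimal sketch of the thresholded edit-distance idea, assuming a plain Levenshtein distance with unit costs. The `KNOWN_HEADERS` list and the `match_header` helper are hypothetical names used only for illustration; they are not taken from the somef code base.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    # prev[j] is the distance between the prefix of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Hypothetical header keywords, for illustration only.
KNOWN_HEADERS = ["installation", "usage", "citation", "requirements"]


def match_header(header: str, max_distance: int = 2):
    """Return the closest known header within max_distance edits, else None."""
    normalized = header.strip().lower()
    best = min(KNOWN_HEADERS, key=lambda k: edit_distance(normalized, k))
    return best if edit_distance(normalized, best) <= max_distance else None
```

With these assumptions, `match_header("Instalation")` resolves to `"installation"` (one missing letter), while an unrelated header such as `"Acknowledgements"` falls outside the threshold and yields `None`. A fixed threshold of 1 or 2 is probably too permissive for very short keywords, so scaling it with header length may be worth considering.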

dgarijo commented 3 years ago

Even better: https://github.com/seatgeek/fuzzywuzzy
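
To try the similarity-score approach without adding a dependency, here is a rough sketch that uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for fuzzywuzzy. It is not fuzzywuzzy's API: the `similarity` and `best_match` helpers, the candidate list, and the 0.8 cutoff are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0.0, 1.0]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(header: str, candidates, cutoff: float = 0.8):
    """Return the most similar candidate if it reaches the cutoff, else None."""
    scored = [(similarity(header.strip(), c), c) for c in candidates]
    score, candidate = max(scored)
    return candidate if score >= cutoff else None


# Hypothetical header keywords, for illustration only.
candidates = ["installation", "usage", "citation"]
print(best_match("Instalation", candidates))
print(best_match("Funding", candidates))
```

A ratio-based score adapts to header length, which a fixed edit-distance threshold does not, so it may behave better across both short and long header keywords.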