OpenSextant / Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.
Apache License 2.0

PhoneticFilter experimentation #27

Open mubaldino opened 5 years ago

mubaldino commented 5 years ago

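"Baz zAz" = two tokens, likely encoded as bz, zaz. But if we find the single token "Bazzaz" ==> bzaz, the two forms sound the same, yet the codes are difficult to match because one was computed per token and the other over the whole word.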

"Deir ezzor" vs. "Der ez Zor": again, the phonetics are similar when taken over the whole bigram or trigram, but hard to compare unless the phonetic codes are computed over the n-gram as a unit rather than token by token.
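A rough sketch of the mismatch, for illustration only. The encoder here is a made-up toy (lowercase, collapse doubled letters, drop vowels). It is not Soundex, Metaphone, or anything Solr's phonetic filters produce, so its codes differ from the guesses above. The names toy_phonetic_key, per_token_keys and phrase_key are hypothetical. The point is just that per-token codes disagree on token count and content, while a code computed over the joined n-gram lines up.

```python
import re

VOWELS = set("aeiou")


def toy_phonetic_key(text: str) -> str:
    """Toy encoder (an assumption, not a real algorithm):
    keep a-z only, collapse runs of the same letter, then drop vowels."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    collapsed = re.sub(r"(.)\1+", r"\1", letters)
    return "".join(c for c in collapsed if c not in VOWELS)


def per_token_keys(phrase: str) -> list[str]:
    """Encode each whitespace-separated token on its own."""
    return [toy_phonetic_key(tok) for tok in phrase.split()]


def phrase_key(phrase: str) -> str:
    """Join the tokens of the n-gram first, then encode once."""
    return toy_phonetic_key("".join(phrase.split()))


if __name__ == "__main__":
    for a, b in [("Baz zAz", "Bazzaz"), ("Deir ezzor", "Der ez Zor")]:
        print(a, "|", b)
        print("  per token:", per_token_keys(a), "vs", per_token_keys(b))
        print("  as phrase:", phrase_key(a), "vs", phrase_key(b))
```

With this toy encoder, "Baz zAz" gives per-token codes bz, zz while "Bazzaz" gives the single code bzz, yet both give bzz when encoded as one joined phrase. Likewise "Deir ezzor" and "Der ez Zor" produce two and three per-token codes respectively, but the same phrase-level code. A real encoder would give different codes, but the token-boundary problem is the same.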

mubaldino commented 5 years ago

https://lucene.apache.org/solr/guide/7_4/phonetic-matching.html