direct-phonology / dphon

uncover old chinese textual parallels based on sound
MIT License

replace indexing step with nearest-neighbor search #131

Open thatbudakguy opened 3 years ago

thatbudakguy commented 3 years ago

this is useful for two reasons:

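we could start by looking into libraries that do locality-sensitive hashing (LSH), like datasketch, or approximate nearest-neighbor search more generally, like the popular annoy (which uses random projection trees rather than LSH). there's a great explanation of LSH here and a detailed one related to document comparison in section 3.4.1 of Mining of Massive Datasets.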
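to make the idea concrete, here is a minimal sketch of MinHash plus LSH banding using only the standard library. it is not dphon's code and not the datasketch API: the function names (`shingles`, `minhash`, `candidate_pairs`), the bigram shingle size, and the 32-hash / 16-band settings are all assumptions chosen for illustration. it also hashes raw characters, whereas a real integration would presumably hash whatever sound-based representation dphon already compares.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=2):
    """Return the set of overlapping k-character shingles in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash(shingle_set, num_perm=32):
    """MinHash signature: for each seed, the smallest hash over all shingles."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")  # a different salt acts as a different hash function
        signature.append(min(
            (int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big")
             for s in shingle_set),
            default=0,  # text shorter than k has no shingles
        ))
    return signature


def candidate_pairs(docs, num_perm=32, bands=16):
    """Bucket signatures band by band; docs sharing any bucket become candidates."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash(shingles(text), num_perm)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# hypothetical toy corpus: "a" and "b" differ by one variant character
docs = {
    "a": "學而時習之不亦說乎",
    "b": "學而時習之不亦悅乎",
    "c": "天下皆知美之為美",
}
print(candidate_pairs(docs))
```

the point of the banding step is that only documents landing in the same bucket are ever compared, so the expensive alignment runs on a short candidate list instead of every pair. more bands with fewer rows each make the filter more permissive; fewer bands with more rows make it stricter.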

thatbudakguy commented 4 months ago

if we use a vector database as suggested by #152, we could get this type of search built-in.
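for reference, the query a vector database answers is roughly the following, shown here as an exact brute-force version. this is only a sketch: the `cosine` and `nearest` helpers and the toy two-dimensional vectors are hypothetical, and a real vector database would use an approximate index rather than scanning every stored vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(vectors.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# hypothetical embeddings keyed by passage id
vectors = {"p": [1.0, 0.0], "q": [0.9, 0.1], "r": [0.0, 1.0]}
print(nearest([1.0, 0.0], vectors, k=2))
```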