dijkstracula / pacldb

Pan-Dene Comparative Lexicon

normalized IPA representation for sorting #11

Open dijkstracula opened 5 years ago

dijkstracula commented 5 years ago

Currently, sorting the IPA column places special characters outside their expected positions. There doesn't appear to be an obvious way to get Postgres to sort by a custom order that isn't defined by a locale.

Given an ordering of IPA symbols, we could sort on the server side (with something like lambda a, b: ipa_inorder.index(a) < ipa_inorder.index(b)), but that will break pagination, as we'd have to read the entire result set in on every page render.
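To make the pagination concern concrete, here is a minimal sketch of application-side sorting. The `IPA_INORDER` list is a placeholder rather than the real ordering, and `fetch_page` is a hypothetical helper, not anything that exists in pacldb.

```python
# Placeholder ordering for illustration only; the real IPA ordering
# is not given in this thread.
IPA_INORDER = ["a", "æ", "e", "ə", "s", "ʃ"]
RANK = {symbol: i for i, symbol in enumerate(IPA_INORDER)}


def ipa_key(term):
    """Turn a term into a list of ranks so Python compares it symbol by symbol."""
    return [RANK[ch] for ch in term]


def fetch_page(all_terms, page, per_page):
    """Hypothetical helper: return one zero-indexed page of terms in IPA order.

    The whole result set has to be sorted before a page can be sliced out,
    which is the cost described above.
    """
    ordered = sorted(all_terms, key=ipa_key)
    start = page * per_page
    return ordered[start:start + per_page]
```

Every call to `fetch_page` re-sorts all rows, so the database's own `LIMIT`/`OFFSET` can no longer do the paging work.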

I think an easier way is to add a new field to Term holding a normalised representation that the database already knows how to sort: we'd replace each character with an ASCII stand-in that preserves the relative ordering, like:

a -> a
æ -> ae
e -> e
ə -> eh
...
s -> s
ʃ -> sh
...

In so doing, the normal lexicographic ordering will do the right thing (e.g. "æ" will be sorted after "a", "ʃ" after "s", and so on).
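A rough sketch of that normalisation, using only the six example pairs listed above (the full table is elided in this thread, so `IPA_SORT_KEY` is deliberately partial and `normalise` is a hypothetical name):

```python
# Partial mapping copied from the examples above; the real table would
# cover every symbol in the agreed IPA ordering.
IPA_SORT_KEY = {
    "a": "a",
    "æ": "ae",
    "e": "e",
    "ə": "eh",
    "s": "s",
    "ʃ": "sh",
}


def normalise(ipa):
    """Replace each IPA symbol with its ASCII stand-in.

    Symbols missing from the table pass through unchanged.
    """
    return "".join(IPA_SORT_KEY.get(ch, ch) for ch in ipa)


# Single symbols come out in the intended order.
symbols = sorted(["ʃ", "s", "æ", "a", "ə", "e"], key=normalise)

# Caveat: multi-character stand-ins can interleave inside longer words.
# "æs" normalises to "aes", which sorts before "as".
caveat = normalise("æs") < normalise("as")
```

One thing this sketch surfaces: with stand-ins of differing lengths, a word starting with "æ" can still sort ahead of one starting with "a", so the scheme is only reliable if every stand-in has the same length.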

dijkstracula commented 5 years ago

Actually, come to think of it, since there are surely fewer than 52 IPA characters, can we just map the ith entry in Sally's IPA ordering to the ith entry in [A-Za-z], stick that in a custom column that's only used for sorting, and call it a day?
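A minimal sketch of that idea, assuming a placeholder ordering in place of the real one (which isn't reproduced in this thread); `IPA_INORDER`, `IPA_TO_KEY` and `sort_key` are illustrative names only:

```python
import string

# Placeholder ordering for illustration; substitute the real IPA ordering.
IPA_INORDER = ["a", "æ", "e", "ə", "s", "ʃ"]

# A-Z followed by a-z: 52 characters in ascending code point order.
SORT_ALPHABET = string.ascii_uppercase + string.ascii_lowercase

# The scheme only works while the ordering fits in the 52-letter alphabet.
assert len(IPA_INORDER) <= len(SORT_ALPHABET)

# The ith IPA symbol maps to the ith letter of SORT_ALPHABET.
IPA_TO_KEY = {symbol: SORT_ALPHABET[i] for i, symbol in enumerate(IPA_INORDER)}


def sort_key(ipa):
    """Build the value for the sort-only column.

    Raises KeyError for any symbol missing from IPA_INORDER.
    """
    return "".join(IPA_TO_KEY[ch] for ch in ipa)


terms = ["ʃa", "sa", "æs", "as", "əs", "es"]
ordered = sorted(terms, key=sort_key)
```

Because every symbol maps to exactly one character, keys compare position by position and can't interleave. One thing to check on the database side: many non-C locales don't order all uppercase letters before all lowercase ones, so the sort column would likely need `COLLATE "C"` (byte-wise comparison) for the order to match what Python produces here.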