cldf-clts / clts-legacy

Cross-Linguistic Transcription Systems
Apache License 2.0

Make sounds hashable #107

Closed Anaphory closed 6 years ago

Anaphory commented 6 years ago

It would be very nice if Sound objects were hashable. I'm not entirely sure what should be hashed, but building sets of Sounds (e.g. for phoneme inventories) and dicts keyed by sound sequences (tuples of Sound) already looks useful to me, just from building some toy functionality on top of pyclts.

There is probably a better implementation than

    def __hash__(self):
        return hash(self.name)

but I haven't delved far enough into the intestines of the objects, or into what TranscriptionSystems and other classes might need here, to say what that might be.
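To make the request concrete, here is a minimal sketch of what name-based hashing would enable. The `Sound` class below is a hypothetical stand-in that models only a `name` attribute; it is not the pyclts class, and pairing `__hash__` with a matching `__eq__` is an assumption about how equality should behave.

```python
class Sound:
    """Hypothetical stand-in for a pyclts sound; only `name` is modelled."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Objects that compare equal must also hash equal, so both use `name`.
        if not isinstance(other, Sound):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return f"Sound({self.name!r})"


p = Sound("voiceless bilabial stop consonant")
a = Sound("unrounded open front vowel")

# A phoneme inventory as a set: the duplicate sound collapses into one entry.
inventory = {p, a, Sound("voiceless bilabial stop consonant")}

# A dict keyed by sound sequences (tuples of Sound).
counts = {(p, a): 1}
counts[(Sound("voiceless bilabial stop consonant"), a)] += 1
```

Defining `__eq__` without `__hash__` makes a Python class unhashable, so the two methods are best kept together.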

tresoldi commented 6 years ago

Some time ago I had the idea of building a bitmap of all features (one bit per possible bipa feature, set to true or false accordingly). It would be a kind of locality-sensitive hashing, allowing sounds to be compared up to a point.

Just an idea, but it wouldn't take long to implement, and you could guarantee perfect hashing (each hash mapping to only one sound).
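A rough sketch of the bitmap idea, under stated assumptions: the `FEATURES` tuple is a hypothetical, heavily truncated list rather than the real BIPA feature set, and `feature_bitmap` and `differing_features` are illustrative names. Distinct feature sets always produce distinct bitmaps, and the number of differing bits gives a crude similarity measure.

```python
# Hypothetical, truncated feature list; the real BIPA feature set is larger.
FEATURES = (
    "voiced", "voiceless", "bilabial", "alveolar",
    "stop", "fricative", "consonant",
)
BIT = {feature: 1 << position for position, feature in enumerate(FEATURES)}


def feature_bitmap(features):
    """Return an int with one bit set for each feature that is present."""
    bitmap = 0
    for feature in features:
        bitmap |= BIT[feature]
    return bitmap


def differing_features(a, b):
    """Count the bits in which two bitmaps differ (Hamming distance)."""
    return bin(a ^ b).count("1")


p = feature_bitmap({"voiceless", "bilabial", "stop", "consonant"})
b = feature_bitmap({"voiced", "bilabial", "stop", "consonant"})
s = feature_bitmap({"voiceless", "alveolar", "fricative", "consonant"})

# [p] and [b] differ only in voicing (two bits); [p] and [s] differ in more.
closer = differing_features(p, b) < differing_features(p, s)
```

One caveat if such a bitmap were returned from `__hash__`: Python truncates that return value to a machine-sized integer, so with more features than fit in one word the value used by sets and dicts is no longer guaranteed unique, even though the bitmap itself still is.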

(In reply to Gereon Kaiping's message of 21 Feb 2018, 12:34 PM.)

xrotwang commented 6 years ago

Sounds somewhat like using the feature vector as hash.

(In reply to Tiago Tresoldi's message of Wed, 21 Feb 2018, 16:40.)

LinguList commented 6 years ago

The feature vector is a frozenset, so it is hashable already, isn't it?

LinguList commented 6 years ago

If you frozenset your features, you can access them via bipa.features. Or do you mean something different?
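The point about frozensets can be checked with plain Python, independent of pyclts. In this sketch the feature names are illustrative only; it shows that a frozenset of features hashes the same regardless of the order in which the features were given, and that it works as a set member or dict key where a plain set would not.

```python
voiceless_bilabial_stop = frozenset({"voiceless", "bilabial", "stop", "consonant"})
same_features = frozenset(["consonant", "stop", "bilabial", "voiceless"])

# Equal frozensets hash equal, so either one finds the same dict entry.
symbols = {voiceless_bilabial_stop: "p"}
inventory = {voiceless_bilabial_stop, same_features}

# A mutable set, by contrast, cannot be hashed at all.
try:
    hash({"voiceless", "bilabial"})
    plain_set_hashable = True
except TypeError:
    plain_set_hashable = False
```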

tresoldi commented 6 years ago

Yes, but I was thinking of representing it as a plain array of bytes, like a "normal" hash (i.e., a hexadecimal representation and so on). Of course, deep down it is just a number. But again, just an idea I had some time ago.
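As a sketch of that byte-array representation: the feature bits can be packed into a fixed-width byte string and printed as hexadecimal. The `FEATURES` tuple and the `feature_digest` helper are hypothetical names used for illustration, and the feature list is truncated.

```python
# Hypothetical, truncated feature list; bit positions follow this order.
FEATURES = (
    "voiced", "voiceless", "bilabial", "alveolar",
    "stop", "fricative", "consonant",
)


def feature_digest(features):
    """Pack the features into a fixed-width byte string, returned as hex."""
    bitmap = sum(1 << i for i, name in enumerate(FEATURES) if name in features)
    width = (len(FEATURES) + 7) // 8  # bytes needed to hold one bit per feature
    return bitmap.to_bytes(width, "big").hex()


digest_p = feature_digest({"voiceless", "bilabial", "stop", "consonant"})
digest_b = feature_digest({"voiced", "bilabial", "stop", "consonant"})

# The digest is reversible: it decodes back to the underlying number.
number_p = int.from_bytes(bytes.fromhex(digest_p), "big")
```

Unlike a cryptographic digest, this encoding keeps similar sounds close together, since each hex digit corresponds directly to four feature bits.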

(In reply to Johann-Mattis List's message of 21 Feb 2018, 12:52 PM.)

LinguList commented 6 years ago

Is this still considered important by anybody? If not, I'll just close it for the time being...