cldf-clts / soundvectors

MIT License

Handling of invalid sounds #18

Closed arubehn closed 4 months ago

arubehn commented 4 months ago

@LinguList I was wondering what the best way would be to handle sounds that we cannot parse. Currently, we raise a ValueError, which aborts the program as soon as a single invalid sound is encountered (including symbols like + or - that frequently occur in cross-linguistic data).

Would it maybe be better to just issue a warning in such cases and return a vector of zeros? What do you think?
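
For illustration, the warn-instead-of-raise idea could look roughly like the sketch below. The lookup table `TOY_VECTORS`, the vector length, and the function name `get_vec_lenient` are made up for this example and are not part of soundvectors; the real library derives its vectors from sound features.

```python
import warnings

# Hypothetical toy lookup table, standing in for the real feature parsing.
TOY_VECTORS = {
    "p": (1, -1, 1),
    "b": (1, 1, 1),
    "a": (-1, 1, -1),
}
VECTOR_SIZE = 3


def get_vec_lenient(sound):
    """Return the vector for `sound`, or warn and return an all-zero vector."""
    try:
        return TOY_VECTORS[sound]
    except KeyError:
        warnings.warn(f"Could not parse sound {sound!r}; returning zeros.")
        return (0,) * VECTOR_SIZE


print(get_vec_lenient("b"))  # (1, 1, 1)
print(get_vec_lenient("+"))  # (0, 0, 0), plus a UserWarning on stderr
```

The trade-off is that a zero vector looks like valid output, so invalid data is easy to miss unless warnings are actually read.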

LinguList commented 4 months ago

I would prefer to break, to make sure people know their data does not pass. Our current behaviour is already quite permissive, as you can see when you write:

from soundvectors import SoundVectors
c2v = SoundVectors()
c2v.get_vec("my favorite cluster is this one bilabial voiced stop consonant")

So extending this even further seems difficult. We can discuss returning a single 0 or an all-zero vector when CALLING the object (the __call__ method):

>>> c2v(["a", "b", "c"])
[None, None, None]

This means get_vec still throws an error, but the call method catches it.
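
A minimal sketch of that split between a strict get_vec and a lenient call, assuming a hypothetical stand-in class `ToySoundVectors` with a made-up lookup table (this is not the actual soundvectors implementation):

```python
class ToySoundVectors:
    """Hypothetical stand-in, not the real soundvectors.SoundVectors class."""

    # Made-up vectors for illustration only.
    _table = {"p": (1, -1, 1), "b": (1, 1, 1)}

    def get_vec(self, sound):
        # Strict: a single unparsable sound raises a ValueError.
        try:
            return self._table[sound]
        except KeyError:
            raise ValueError(f"Invalid sound encountered: {sound!r}") from None

    def __call__(self, sounds):
        # Lenient: catch the error per sound and substitute None.
        vectors = []
        for sound in sounds:
            try:
                vectors.append(self.get_vec(sound))
            except ValueError:
                vectors.append(None)
        return vectors


c2v = ToySoundVectors()
print(c2v(["p", "+", "b"]))  # [(1, -1, 1), None, (1, 1, 1)]
```

With this layout, single lookups still fail loudly, while batch calls mark invalid entries with None and keep the output aligned with the input list.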

LinguList commented 4 months ago

Did you implement this, @arubehn?

arubehn commented 4 months ago

Not yet, that's why this issue is still open :) I will take care of the remaining pieces on Monday.