cldf-clts / clts-legacy

Cross-Linguistic Transcription Systems
Apache License 2.0
4 stars 3 forks source link

Unknown sound handling in __call__ defaults to `0`, not to `UnknownSound` #129

Closed Anaphory closed 4 years ago

Anaphory commented 5 years ago

Why is the default value in https://github.com/cldf/clts/blob/a1343dafd086296fe8b3c999e828062727fcf19e/src/pyclts/util.py#L53 set to '0' instead of any false value, which in https://github.com/cldf/clts/blob/a1343dafd086296fe8b3c999e828062727fcf19e/src/pyclts/util.py#L47 would lead to a generation of UnknownSound, such that the return value of __call__ would be a list of Symbols, instead of a mixed list of Symbols and strs? Could this be changed, or is there too much code depending on this default behaviour now?

Should all false values behave the same in https://github.com/cldf/clts/blob/a1343dafd086296fe8b3c999e828062727fcf19e/src/pyclts/util.py#L47 or should we pick one of them – or even a new singleton value – so that it becomes possible to get None as the default value? I don't have a use case for that alternative, but it would feel cleaner to me.

LinguList commented 5 years ago

This is to be conform with lingpy, where an unknown sound class is rendered as "0" (and we thought of replacing lingpy's internals at some point in the future). But yes, maybe one could ignore it, delete the line, or replace it by a function, in which case one could retain the original value, or I could trigger lingpy behavior with:

lambda x: "0"

I guess, this would be the best solution: replace the default by a function.

Note, that the output of a str object here is also due to compability or behavior expected e.g., in lingpy, but also segments, where strings are the default output. I think this makes sense, as you have a segmented string, and the call is a convenience function we could as well discard completely, and for a list as output, you don't have to write much more code, right?

xrotwang commented 5 years ago

Yes, we should refactor the default handling to be more transparent and consistent. The fact alone that get and __call__ have different defaults for the default keyword is enough to prompt refactoring - even if this breaks backwards compatibility.

Anaphory commented 5 years ago

Without have thought enough about the implications, I like @LinguList's suggestion of taking a function argument, if we are not concerned about BC here.

LinguList commented 5 years ago

As far as I see: refactoring by using a function would be easy, and also in line with what we do in segments. But we may also want to review what we want from a call of a function. The get returns an internal object, I would like to have a function that segments along the lines of "segment" package, that is, convert a segmented string to plain BIPA, but show where it fails. We could have that in another function, and say that call returns a list of Sound/Symbol objects?

xrotwang commented 4 years ago

moved to https://github.com/cldf-clts/pyclts/issues/2