cldf-clts / pyclts

Apache License 2.0

Improvements to `TranscriptionData` #34

tresoldi closed 4 years ago

tresoldi commented 4 years ago

The use of TranscriptionData to map a grapheme in a dataset to a CLTS sound (i.e., .resolve_grapheme()) does not seem to be documented, and it raises an exception (ValueError) if we pass a grapheme that is not in the list.

As it is arguably one of the main intended usages of the class, I would suggest we:

I can prepare a PR for that, if accepted.

tresoldi commented 4 years ago

Pinging @LinguList and @xrotwang

LinguList commented 4 years ago

So to clarify, you have a sound in Phoible's original grapheme list, and want the corresponding BIPA sound, right?

tresoldi commented 4 years ago

Yes, it is for the inventory study. I can get the CLTS sound with phoible.resolve_grapheme(grapheme) (with phoible being clts.transcriptiondata("phoible")), but I did not find documentation for it.

Using phoible[grapheme] would make it more similar to transcription systems, and there is also the issue that it currently raises an exception if the grapheme is not listed (in this case, due to using the development version of Phoible, while the mapping in CLTS data follows the last released version).
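To make the proposal concrete, here is a minimal sketch of subscript-style access that does not raise on unlisted graphemes. GraphemeLookup is a hypothetical wrapper, not part of pyclts, and the mapping passed to it is a toy stand-in with placeholder entries rather than real Phoible data.

```python
class GraphemeLookup:
    """Hypothetical wrapper (not pyclts API): dict-style lookup with a fallback."""

    def __init__(self, mapping, missing=None):
        # Copy the mapping so later changes to the caller's dict do not leak in.
        self._mapping = dict(mapping)
        self._missing = missing

    def __getitem__(self, grapheme):
        # Return the fallback instead of raising when the grapheme is unlisted.
        return self._mapping.get(grapheme, self._missing)

    def __contains__(self, grapheme):
        return grapheme in self._mapping


# Toy entries for illustration only; these are not taken from CLTS or Phoible.
phoible = GraphemeLookup({"a": "a", "kh": "kʰ"})
listed = phoible["kh"]
unlisted = phoible["zz"]  # falls back to None rather than raising
```

Returning a fallback keeps a loop over a whole inventory running when the dataset is newer than the mapping; whether silent fallback or an explicit exception is preferable is a design choice this sketch does not settle.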

LinguList commented 4 years ago

If you check lexibank/allenbai, as well as the code that I wrote for the inventory comparison in lexicore (it will be moved to lexibank later; for now it is in lexibank/pylexicore), you will see that we have another way to proceed here:

https://github.com/lexibank/allenbai/blob/c2f52114da97334dc63f8d2fafe5b308f2031924/lexibank_allenbai.py#L116-L118

Every TranscriptionData object (td) has a grapheme_map attribute that lets you access the BIPA sound for a corresponding original grapheme.

The essential code is here:

https://github.com/cldf-clts/pyclts/blob/956b07297b1402c39403b7c40e8f32d4eb24bae9/src/pyclts/util.py#L72-L85
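The approach described above can be sketched as follows, assuming grapheme_map behaves like a plain dict from original grapheme to BIPA grapheme, as the comment suggests. The mapping here is a toy stand-in so the snippet runs without the CLTS data, and map_inventory is a hypothetical helper, not code from pyclts or lexibank.

```python
def map_inventory(graphemes, grapheme_map):
    """Split an inventory into mapped sounds and graphemes with no mapping."""
    mapped, unmapped = {}, []
    for grapheme in graphemes:
        if grapheme in grapheme_map:
            mapped[grapheme] = grapheme_map[grapheme]
        else:
            # Collect unlisted graphemes instead of raising, so they can be
            # inspected afterwards (e.g. entries newer than the mapping).
            unmapped.append(grapheme)
    return mapped, unmapped


# Toy entries for illustration only; these are not taken from CLTS or Phoible.
toy_grapheme_map = {"a": "a", "kh": "kʰ"}
mapped, unmapped = map_inventory(["a", "kh", "xx"], toy_grapheme_map)
```

With real data, the second argument would presumably be clts.transcriptiondata("phoible").grapheme_map, going by the names used in this thread.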

tresoldi commented 4 years ago

Ok, I will do it this way.