cldf / cltoolkit

Toolkit for Processing Cross-Linguistic Data in CLDF
MIT License

Grapheme_inventory vs. Sound_inventory #8

Open LinguList opened 3 years ago

LinguList commented 3 years ago

Graphemes require valid segments, but segments are not always valid BIPA. We should therefore distinguish a grapheme_inventory (which shows only occurrences) from a sound_inventory (which shows the occurrences for BIPA-normalized segments).

The methods differ, since grapheme inventories can only be compared by Jaccard similarity.
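To make the Jaccard comparison concrete, here is a minimal sketch in plain Python. It does not use the cltoolkit API; the function name `jaccard` and the example inventories are made up for illustration, and treating two empty inventories as identical is an assumption of this sketch.

```python
def jaccard(inventory_a, inventory_b):
    """Jaccard similarity of two inventories, treated as sets of graphemes."""
    a, b = set(inventory_a), set(inventory_b)
    if not a and not b:
        # Assumption of this sketch: two empty inventories count as identical.
        return 1.0
    # Size of the intersection divided by the size of the union.
    return len(a & b) / len(a | b)


# Hypothetical grapheme inventories of two languages.
lang_a = ["p", "t", "k", "a"]
lang_b = ["p", "t", "a", "i"]
similarity = jaccard(lang_a, lang_b)  # 3 shared graphemes out of 5 in total
```

Only set membership is needed here, which is why this comparison works for graphemes that have no feature data.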

LinguList commented 3 years ago

The sound inventory is now named Language.sound_inventory, so adding a Language.grapheme_inventory is trivial; it would take the segments, not the tokens, of a language.

LinguList commented 3 years ago

The wordlist class computes occurrences of sounds for both segmented BIPA sounds and segmented non-BIPA sounds, so the two are really parallel. The only difference is that a list of segmented graphemes does not allow access to the feature data in CLTS.
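The parallel between the two counts could be sketched roughly as follows. This is not the cltoolkit implementation: `count_occurrences` and `TOY_BIPA` are hypothetical names, the mapping is a toy stand-in for BIPA normalization rather than CLTS data, and dropping segments that cannot be normalized is an assumption of this sketch.

```python
from collections import Counter

# Toy stand-in for BIPA normalization; NOT the CLTS data or API.
TOY_BIPA = {"th": "tʰ", "a": "a", "k": "k"}


def count_occurrences(forms, normalize=None):
    """Count segment occurrences across segmented forms.

    Without `normalize`, the raw segments are counted (grapheme inventory).
    With `normalize`, the normalized segments are counted (sound inventory);
    segments for which `normalize` returns None are skipped (an assumption).
    """
    counts = Counter()
    for segments in forms:
        for segment in segments:
            key = segment if normalize is None else normalize(segment)
            if key is not None:
                counts[key] += 1
    return counts


# Hypothetical segmented forms; "?" is a segment the toy mapping cannot handle.
forms = [["th", "a", "k"], ["k", "a", "?"]]
grapheme_counts = count_occurrences(forms)
sound_counts = count_occurrences(forms, normalize=TOY_BIPA.get)
```

Both inventories come out of the same counting loop; only the sound inventory has keys that could be looked up for features.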