So far, the segmentation distinguishes morphemes and phonemes. However, the data occasionally also contains word boundaries, as in this entry:
693 HPUN "six": kɔ́ŋyɔ́ŋ màŋtàŋ | 431-1
For these cases, I suggest using the underscore, which is already recognized as a separation marker for word boundaries, and writing the entry internally as:
k ɔ́ ◦ ŋ y ɔ́ ŋ _ m à ŋ ◦ t à ŋ
instead of
k ɔ́ ◦ ŋ y ɔ́ ŋ ◦ m à ŋ ◦ t à ŋ
Since this will only be needed in sporadic cases, writing an algorithm for this level of segmentation is not worthwhile. However, we should keep it in mind when dealing with the data later on.
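Although detecting word boundaries automatically is out of scope, reading the proposed notation back is simple. The following is only a minimal sketch, assuming that segments are space-separated, that `◦` marks a morpheme boundary, and that `_` marks a word boundary. The function name `split_segments` and the constants are hypothetical and not part of any existing code.

```python
# Hypothetical marker constants, taken from the notation proposed above.
WORD_SEP = "_"
MORPHEME_SEP = "◦"


def split_segments(segmented):
    """Split a space-separated segment string into words > morphemes > phonemes."""
    words = [[[]]]  # start with one word containing one empty morpheme
    for token in segmented.split():
        if token == WORD_SEP:
            words.append([[]])      # open a new word
        elif token == MORPHEME_SEP:
            words[-1].append([])    # open a new morpheme in the current word
        else:
            words[-1][-1].append(token)  # add a phoneme to the current morpheme
    return words


entry = "k ɔ́ ◦ ŋ y ɔ́ ŋ _ m à ŋ ◦ t à ŋ"
words = split_segments(entry)
print(len(words))               # 2 words
print([len(w) for w in words])  # [2, 2] morphemes per word
```

With the alternative notation that uses `◦` throughout, the same call would return a single word with four morphemes, which is exactly the information the underscore is meant to preserve.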