PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

[BUG] Default transcription variants and interlinear gloss #787

Closed: stannam closed this issue 2 years ago

stannam commented 2 years ago

Typically, a corpus with variant forms involves three attributes: orthography, default transcription, and alternative transcription. Only the alternative transcription varies within a lexical item.

We also allow variant forms without an alternative transcription attribute; i.e., the user can choose to let the canonical (default) transcription vary within lexical items (see #703). The problem in this case is that the analysis functions cannot recognize the transcription.

Two options:

1. Have no canonical transcription, only variants.

2. Do not allow the default transcription to vary.
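Option 2 fits the three-attribute model described above: the default transcription is fixed per entry and only the alternative transcription varies. Below is a minimal Python sketch of that grouping, assuming each word token arrives as a hypothetical `(orthography, default, alternative)` tuple. `build_lexicon` and the tuple layout are illustrative only, not CorpusTools' actual API; the data mirror the 'tomato' mini examples in the update below.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical token layout (not CorpusTools' internal format):
# (orthography, default transcription, alternative transcription or None)
Token = Tuple[str, str, Optional[str]]


def build_lexicon(tokens: Iterable[Token]) -> Dict[Tuple[str, str], Counter]:
    """Group word tokens into lexical entries under option 2.

    Entries are keyed on (orthography, default transcription), so the default
    transcription never varies inside an entry: tokens whose default
    transcriptions differ end up as separate entries. Only the alternative
    transcription is tallied as a pronunciation variant.
    """
    lexicon: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for spelling, default, alternative in tokens:
        variants = lexicon[(spelling, default)]  # creates the entry if new
        if alternative is not None:
            variants[alternative] += 1
    return dict(lexicon)


# Default transcription only: the differing forms split into two entries.
no_alt = build_lexicon([
    ("tomato", "təmeɪɾoʊ", None),
    ("tomato", "təmeɪɾoʊ", None),
    ("tomato", "təmatoʊ", None),
])

# Fixed default plus alternative transcription: one entry with variants.
with_alt = build_lexicon([
    ("tomato", "təmeɪɾoʊ", "təmeɪɾoʊ"),
    ("tomato", "təmeɪɾoʊ", "təmeɪɾoʊ"),
    ("tomato", "təmeɪɾoʊ", "təmatoʊ"),
])
```

Here `no_alt` holds two entries, while `with_alt` holds a single entry keyed on the canonical [təmeɪɾoʊ], with variants təmeɪɾoʊ (2 tokens) and təmatoʊ (1 token). Keying on the default transcription is what keeps it from varying within an entry.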

stannam commented 2 years ago

update:

1. Pronunciation variants are allowed only for 'transcription (alternative)' (solved)

E.g., in the following mini example, the 'tomato' tokens should form two different entries:

tomato | tomato | tomato
-- | -- | --
təmeɪɾoʊ | təmeɪɾoʊ | təmatoʊ

In the example below, by contrast, a single 'tomato' entry has pronunciation variants (its canonical form being [təmeɪɾoʊ]):

tomato | tomato | tomato
-- | -- | --
təmeɪɾoʊ | təmeɪɾoʊ | təmeɪɾoʊ
təmeɪɾoʊ | təmeɪɾoʊ | təmatoʊ

Also, see 'example_files/variants/interlinear_variants' (in the shared Dropbox storage).

2. ILG (interlinear gloss) corpora had no inventory categorization (solved)

|  | all recognized | partially recognized | none recognized |
| -- | -- | -- | -- |
| ilg |  |  |  |
| csv |  |  |  |
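The table crosses the two file types (ILG and CSV) with three cases. Below is a minimal sketch of such a three-way check. It assumes that "recognized" means a segment symbol has an entry in the active feature system, and that the feature system can be treated as a plain dict from symbols to feature values. `recognition_status` and the toy features are hypothetical, not part of CorpusTools.

```python
from typing import Iterable, List, Mapping, Tuple


def recognition_status(
    inventory: Iterable[str], feature_system: Mapping[str, Mapping[str, str]]
) -> Tuple[str, List[str]]:
    """Classify a segment inventory by how much of it a feature system covers.

    Returns 'all recognized', 'partially recognized' or 'none recognized',
    together with the sorted list of unrecognized segments.
    """
    segments = set(inventory)  # duplicates in the inventory are ignored
    unrecognized = sorted(s for s in segments if s not in feature_system)
    if segments and not unrecognized:
        return "all recognized", unrecognized
    if len(unrecognized) < len(segments):
        return "partially recognized", unrecognized
    # Every segment is unknown (or the inventory is empty).
    return "none recognized", unrecognized


# Hypothetical toy feature system covering only three segments.
toy_features = {"t": {"voice": "-"}, "m": {"nasal": "+"}, "ə": {"syllabic": "+"}}
status, missing = recognition_status(["t", "ɾ", "t"], toy_features)
```

With the toy features, `["t", "ɾ", "t"]` comes out as partially recognized with `["ɾ"]` missing. Returning the unrecognized segments alongside the label lets the caller report which symbols need to be added to the feature system.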

kchall commented 2 years ago

Can confirm that all three cases in the Excel file are working for me.

kchall commented 2 years ago

I think this is all working, and the documentation is up to date.