digling / burmish

LingPy plugin for handling a specific dataset
GNU General Public License v2.0

Phoneme inventories need external checking #72

Closed: LinguList closed 7 years ago

LinguList commented 7 years ago

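I just realized that the phoneme inventories are strange in some parts: Lashi apparently has both "w" and "v", as you can see here (http://tsv.lingpy.org/plugouts/ipa_chart.html?doculect=Lashi&sound_list=%2B%2C21%2C31%2C33%2C35%2C53%2C55%2C%CA%94%2Cj%2C%C9%A3%2Ck%2Ck%CA%B0%2Cx%2Cm%2Cn%2C%C5%8B%2Cf%2Cp%2Cp%CA%B0%2Cl%2Cs%2C%CA%83%2Ct%2Ct%CA%B0%2Cts%2Cts%CA%B0%2Ct%CA%83%2Ct%CA%83%CA%B0%2Ca%2C%C4%83%2Ca%CC%B0%2Ca%CB%90%2Ca%CC%B0%CB%90%2C%C9%91%2C%C9%91%CC%B0%2Ce%2Ce%CC%B0%2Ce%CB%90%2Ce%CC%B0%CB%90%2C%C9%99%2C%C9%99%CC%86%2C%C9%99%CC%B0%2C%C9%99%CB%90%2C%C9%99%CC%B0%CB%90%2C%C9%9B%2C%C9%9B%CC%B0%2C%C9%9B%CB%90%2C%C9%9B%CC%B0%CB%90%2Ci%2C%C4%AD%2Ci%CC%B0%2Ci%CB%90%2Ci%CC%B0%CB%90%2Co%2C%C5%8F%2Co%CC%B0%2C%C9%94%2C%C9%94%CC%B0%2C%C9%94%CC%B1%CC%86%2C%C9%94%CB%90%2C%C9%94%CC%B0%CB%90%2C%C9%9431%2C%C9%BF%2Cu%2Cu%CC%B0%2Cu%CB%90%2Cu%CC%B0%CB%90%2Cy%2Cy%CB%90%2Cv%2Cw%2C%E2%80%A0_%2C%E2%80%A0%C3%BF). But the two seem to be in complementary distribution; at least I don't see a single pair where they would be distinguished. I also find the three-way length distinction for "ə" (ultrashort vs. normal vs. long) strange.

These things mislead the algorithms and cause problems. So I suggest that we carry out a thorough analysis of all phoneme inventories as they are currently reflected in the data. As you, @nh36, have read external sources about core languages like Lashi, I'd suggest that you independently record the sound system in a spreadsheet for those languages where you have access to a description (say, using Nishi), and we then compare the things which don't make sense. In the other cases, like Xiandao, where, as I understand, research is not that extensive, we can think about automatic methods to handle this.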
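The complementary-distribution check described above could be sketched roughly as follows. This is only an illustration, not the code used in the project: the function names and the toy forms are invented, and "context" is simplified to the segment that immediately follows the target sound.

```python
from collections import defaultdict


def following_contexts(words, targets):
    """Collect, for each target segment, the set of segments that follow it."""
    contexts = defaultdict(set)
    for segments in words:
        for i, seg in enumerate(segments):
            if seg in targets:
                # "#" marks the end of the word
                nxt = segments[i + 1] if i + 1 < len(segments) else "#"
                contexts[seg].add(nxt)
    return contexts


def shared_contexts(words, a, b):
    """Return the following contexts in which both a and b are attested."""
    ctx = following_contexts(words, {a, b})
    return ctx[a] & ctx[b]


# Invented toy forms (NOT real Lashi data), already segmented.
toy_words = [
    ["v", "a", "55"],
    ["v", "e", "31"],
    ["w", "o", "33"],
    ["w", "u", "55"],
]

overlap = shared_contexts(toy_words, "w", "v")
print(sorted(overlap))  # an empty list means no shared context in this sample
```

An empty result only shows that no contrast is attested in the sample; with sparse data it does not prove that the two sounds are allophones.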

nh36 commented 7 years ago

I have checked Nishi and he also has v- and w- for Lashi, presumably because he is relying on similar Chinese data (although, oddly, he does not list w- in the phoneme inventory for Lashi). I do not see a conditioning environment. Note that there are other problems in Lashi, like long versus short vowels; this fieldworker may have just had an overly sharp ear. I am afraid we have to leave it for the time being. According to Wannemacher 2011, Lashi has only w-; according to Yabu 1988, there is only /v/, but it can be realized as both [w] and [β]. So, it does seem like we can ignore this issue.

nh36 commented 7 years ago

In terms of methodology, my strong preference is that we take the data that we are given as it is. We do something mechanical, like reconstructing w and v as separate segments in the Ursprache; we then note that they are in complementary distribution there, and then we claim that they are in complementary distribution in Xiandao and say that we disagree with the original authors.

We have already determined that the data is too sparse for a phonemic analysis, so we must leave the original data as it is.

My approach to something like the ultrashort vowels would be similar. For languages for which we have multiple sources, we should add further doculects and then work out such problems at the doculect reconciliation stage.

Dr Nathan W. Hill Reader in Tibetan and Historical Linguistics Department of China & Inner Asia and Department of Linguistics SOAS, University of London Thornhaugh Street, Russell Square, London WC1H 0XG, UK Tel: +44 (0)20 7898 4512

Profile -- http://www.soas.ac.uk/staff/staff46254.php

Tibetan Studies at SOAS -- http://www.soas.ac.uk/cia/tibetanstudies/


LinguList commented 7 years ago

Well, I think we can handle that, but in the end these things just mean more work, as they disturb the patterns in the data. On the other hand, this is just how it is, and explanations for irregularities will necessarily include things like 1) errors, 2) inexplicable things, 3) borrowing, 4) alien influence. So as long as this is all properly mentioned, it is okay, but one could keep it in mind for the phonetic description, where it could be mentioned that it seems a bit weird...

LinguList commented 7 years ago

I guess it's fine to proceed like this; however, obvious errors in OCR need to be checked, and cases like v/w, or the strong palatal nasals in Xiandao, need to be noted. They confuse the QPA, and while we can handle this to some degree, we need to keep in mind that this is one major source of errors in the analysis.

I'm thinking a bit of the following alternative:

If we find a more or less authoritative phonological sketch of a given language that includes inventory tables, and we have the language in our sample, let's store those tables separately in a spreadsheet. Burmese, for example, should be easy. Once this is done, we can compare them with what we find in the data.

I'm currently also testing this on Chinese dialects, so there are some synergies here, and I'll hopefully have a more concrete plan of action for this soon.
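The comparison between a reference inventory from a spreadsheet and the sounds found in the data could look roughly like this. It is a minimal sketch: `compare_inventories` is a hypothetical helper, and the two inventories are placeholder values, not taken from any published description.

```python
def compare_inventories(reference, observed):
    """Compare a reference phoneme inventory with the sounds found in the data."""
    reference, observed = set(reference), set(observed)
    return {
        "only_in_data": sorted(observed - reference),
        "only_in_reference": sorted(reference - observed),
        "shared": sorted(reference & observed),
    }


# Placeholder inventories for illustration only.
reference_inventory = ["p", "t", "k", "v"]
observed_inventory = ["p", "t", "k", "v", "w"]

report = compare_inventories(reference_inventory, observed_inventory)
print(report["only_in_data"])  # sounds that need a closer look
```

Sounds that appear only in the data are the candidates for OCR errors or over-differentiated transcriptions; sounds that appear only in the reference may simply be missing from a small word list.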

nh36 commented 7 years ago

I have already added a table to the document discussing this. I am prepared to go through and change the 'w' to 'v', but hesitate, because it means that you will have to redo the pattern analysis and then the pattern numbers (which I already refer to in the draft) will no longer be accurate.


LinguList commented 7 years ago

Oh, here, I'm sorry, but I'll have to disappoint you anyway, as I'll need to rerun the library testing for the next step, and when I make progress with the code, the patterns will change again.

So it is probably better to store these cases of problems or curiosity by referring to cognate sets with the concepts, as they are less likely to be changed completely.

I'll try to add an update to the method this week, where I'll see whether I can explicitly sub-cluster groups like "w/v" in which only one language is weird. This, which I'd call the second phase of the pattern analysis, will hopefully give us more material to explore, including the Xiandao problems.

No worries regarding the draft: the patterns will probably remain stable unless other things are changed, and if the secondary comparison turns out to be fruitful, we can take over from there, e.g., by just reconstructing the same sound and adding an annotation «w is irregular».
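One way to picture the sub-clustering idea for groups like "w/v" is to look for pairs of correspondence patterns that agree in every language but one. The sketch below is not the method implemented in the project; the pattern IDs, the function name, and the language labels LangA and LangB are invented for illustration.

```python
from itertools import combinations


def single_language_mismatches(patterns):
    """Find pairs of patterns that differ in exactly one language.

    `patterns` maps a pattern ID to a dict of {language: reflex}.
    Returns a list of (id_a, id_b, deviating_language) tuples.
    """
    hits = []
    for id_a, id_b in combinations(sorted(patterns), 2):
        pat_a, pat_b = patterns[id_a], patterns[id_b]
        if pat_a.keys() != pat_b.keys():
            continue  # only compare patterns attested in the same languages
        differing = [lang for lang in pat_a if pat_a[lang] != pat_b[lang]]
        if len(differing) == 1:
            hits.append((id_a, id_b, differing[0]))
    return hits


# Invented toy patterns; LangA and LangB are placeholder language names.
toy_patterns = {
    "P1": {"Lashi": "v", "LangA": "v", "LangB": "v"},
    "P2": {"Lashi": "w", "LangA": "v", "LangB": "v"},
    "P3": {"Lashi": "m", "LangA": "m", "LangB": "m"},
}

print(single_language_mismatches(toy_patterns))
```

Pairs flagged this way could then be reconstructed with the same sound and annotated as irregular in the one deviating language.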

LinguList commented 7 years ago

We'll close this, but it survives in other threads, as we'll need the inventories to create the orthography profiles (for double-checking, and also to check the sources). We could convert this issue into a milestone, but we can also just leave it, as phoneme inventories will now be part of our workflows.
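For readers unfamiliar with orthography profiles: the basic idea is a lookup table from source graphemes to target segments, applied by greedy longest-match segmentation, so that anything not in the table is flagged for checking. The following is a simplified sketch of that idea in plain Python; the profile entries, the function name, and the `<?>` marker are invented and do not reflect the project's actual profiles.

```python
def segment(word, profile, unknown="<?>"):
    """Segment a word by greedy longest match against an orthography profile.

    `profile` maps source graphemes to target segments; characters that
    match no grapheme are replaced by the `unknown` marker.
    """
    graphemes = sorted(profile, key=len, reverse=True)  # longest first
    out = []
    i = 0
    while i < len(word):
        for g in graphemes:
            if word.startswith(g, i):
                out.append(profile[g])
                i += len(g)
                break
        else:
            # no grapheme matched at position i
            out.append(unknown)
            i += 1
    return out


# Invented toy profile for illustration only.
toy_profile = {"tsh": "tsʰ", "ts": "ts", "a": "a", "v": "v"}

print(segment("tshava", toy_profile))
print(segment("tsaw", toy_profile))
```

In the second call the "w" is not covered by the toy profile and comes out as `<?>`, which is how unexpected sounds in a source would surface during double-checking.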