Closed tresoldi closed 6 years ago
This looks promising already. Very nice and many thanks!
What I'd do, however, is use the data that you have there to create the `consonants.tsv` and the `vowels.tsv` for the transcription systems folder. Diacritics can be empty; `normalize.tsv` can be slightly modified or just taken from bipa.
With the current clts, you can actually print the symbols in the TSV form needed by simply invoking `bipa[sound].table`. So assuming your file is called "cons.tsv", I just modified it as follows:
In [1]: from lingpy import csv2list
In [2]: from pyclts import *
In [3]: bipa = TranscriptionSystem('bipa')
In [8]: out = []
In [9]: for line in csv2list('cons.tsv'):
   ...:     try:
   ...:         tbl = bipa[line[1]].table
   ...:         tbl[0] = line[2]
   ...:         out += [tbl]
   ...:     except:
   ...:         out += [['!']+line]
In [12]: with open('consonants.tsv.txt', 'w') as f:
    ...:     f.write('GRAPHEME\tPHONATION\tPLACE\tMANNER\tALIAS\tEXTRA\tNOTE\n')
    ...:     for line in out[1:]:
    ...:         f.write('\t'.join(line)+'\n')
The resulting file is here: consonants.tsv.txt
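For anyone who wants to rerun this outside an interactive session, the loop above can be sketched as a small standalone script. This is only an illustration under stated assumptions: `lookup` is a hypothetical stand-in for `bipa[sound].table`, the column positions (BIPA sound in column 1, UPA grapheme in column 2) are taken from the session above, and the demo rows and feature values are made up for the example.

```python
import csv
import io

HEADER = ["GRAPHEME", "PHONATION", "PLACE", "MANNER", "ALIAS", "EXTRA", "NOTE"]


def build_rows(rows, lookup):
    """Map input rows to feature rows; flag unresolved rows with '!'.

    `lookup` is a hypothetical callable standing in for `bipa[sound].table`.
    It is assumed to raise KeyError for sounds it does not know.
    Unlike the session above, `rows` is expected without a header row.
    """
    out = []
    for row in rows:
        try:
            table = list(lookup(row[1]))  # feature row for the BIPA sound
            table[0] = row[2]             # swap in the UPA grapheme
            out.append(table)
        except (KeyError, IndexError):
            out.append(["!"] + list(row))  # keep the row, marked for review
    return out


def write_tsv(rows, handle):
    """Write the header and all rows as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)


# Made-up demo data: one sound the lookup knows, one it does not.
DEMO_FEATURES = {"p": ["p", "voiceless", "bilabial", "stop", "", "", ""]}
demo_input = [
    ["voiceless bilabial stop consonant", "p", "p"],
    ["voiceless labio-dental stop consonant", "?", "p\u0354"],
]
demo_rows = build_rows(demo_input, DEMO_FEATURES.__getitem__)
buffer = io.StringIO()
write_tsv(demo_rows, buffer)
```

Catching only `KeyError` and `IndexError` instead of a bare `except` keeps genuine bugs visible while still flagging unrecognized sounds with `!`.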
In fact, I just saw we should separate consonants and vowels ;)
Length markers would also be candidates for the diacritics, similar to the palatalization marker.
Furthermore, some sounds are not recognized; here are the proposed solutions:
name | upa | note |
---|---|---|
devoiced voiced labio-dental stop consonant | ʙ͔ | here, we lack the labio-dental stop in our data, we should consider adding it |
palatalized voiced alveolar consonant | ď | rewriting as "palatalized voiced alveolar stop consonant" will do |
palatalized voiceless alveolar lateral approximant consonant | ʟ́ | we seem to lack the voiceless lateral approximant, or, if the accent denotes devoicing, we should rather call it "palatalized devoiced voiceless alveolar lateral approximant consonant" |
palatalized voiceless alveolar nasal consonant | ɴ́ | same as one up |
palatalized voiceless alveolar trill consonant | ʀ́ | ditto |
voiced labio-dental stop consonant | b͔ | again our problem with the labio-dental, which is missing in bipa |
voiceless labio-dental stop consonant | p͔ | ditto |
voiceless uvular nasal consonant | <?><!> | ᴎ͔ |
Sorry, not sure if I understood:
transcriptionsystem/upa resource?
Yes, a transcriptionsystem/upa would be excellent, and yes, please add them if they are missing ;)
Excellent work! May I ask you in addition to add the following information:
Furthermore, you may want to consider adding some diacritics (for devoicing and revoicing, for example), so that the system becomes more productive. Let me know if the format is not clear.
Finally able to get back to this. Diacritics, markers, normalization and tones are missing, as they are either non-existent or I'm not sure about them. The only really important ones are the diacritics; I'll work on them later.
In [1]: from pyclts import TranscriptionSystem
In [2]: upa = TranscriptionSystem('upa')
In [3]: upa['ʙ']
Out[3]: <pyclts.models.Consonant: devoiced voiced bilabial stop consonant>
Done with the references, will work on the diacritics using the ISO standard.
I've just committed a few diacritics, but I am in doubt about the others.
Super. Feel free to merge and many thanks!
Thank you. I'll work on the turned and sideways vowels when I get back home, then merge; it shouldn't take much.
I'll move to X-SAMPA later. Your deadline for the article was next week, right?
Yes, we'll submit on Wednesday, but we can also do with UPA without X-SAMPA for the time being, I'd say. We'll submit the code anonymously via osf-framework, and show a few screenshots of the CLLD app. We have 12 transcription datasets right now, several sound classes, and 5 transcription systems with UPA. I think this is impressive enough, even if there are still a few bugs to be resolved.
Great. I'm merging UPA, then, after adding near-close near-front vowels. Two main issues:
Some clean-up is probably due in the vowel listing, now that diacritics have been implemented. Of course, one needs to take care with pre-composed glyphs.
If vowels with diacritics are listed redundantly, this is even better. The diacritics are only a shortcut guaranteeing a better "generation".
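To make the point about redundant listings and pre-composed glyphs concrete, here is a minimal sketch that does not use pyclts at all. The tables `EXPLICIT` and `DIACRITICS` and the function `describe` are hypothetical, and treating the combining caron as a palatalization marker is an assumption based on the "ď" row discussed earlier in this thread. Explicitly listed graphemes are tried first; only if that fails is the grapheme decomposed (Unicode NFD) and its combining marks used to generate a description.

```python
import unicodedata

# Hypothetical tables, not the actual pyclts data.
EXPLICIT = {
    "d": "voiced alveolar stop consonant",
    "t": "voiceless alveolar stop consonant",
}
DIACRITICS = {
    "\u030c": "palatalized",  # combining caron (assumed palatalization marker)
}


def describe(grapheme):
    """Return a feature description, or None if the grapheme is unknown."""
    # NFD splits pre-composed glyphs into base letter + combining marks,
    # e.g. U+010F (d with caron) becomes "d" followed by U+030C.
    nfd = unicodedata.normalize("NFD", grapheme)
    if not nfd:
        return None
    # 1. Explicit listings take precedence (redundant entries are harmless).
    for form in (grapheme, nfd):
        if form in EXPLICIT:
            return EXPLICIT[form]
    # 2. Otherwise generate the description from base + diacritics.
    base, marks = nfd[0], nfd[1:]
    if base not in EXPLICIT or any(mark not in DIACRITICS for mark in marks):
        return None
    features = [DIACRITICS[mark] for mark in marks]
    return " ".join(features + [EXPLICIT[base]])
```

With this ordering, the pre-composed "ď" and the sequence "d" plus combining caron resolve to the same description, and adding an explicit entry for either form later would simply override the generated one.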
A preliminary and rudimentary version, as proposed at https://github.com/cldf/clts/issues/79
I'm only adding a TSV file, without the necessary changes to dump.py, statistics.py, pyclts/transcriptiondata.py etc. These can be added later, if this mapping is accepted.