cldf-clts / clts

Cross-Linguistic Transcription Systems
https://clts.clld.org

alphabets to be added #7

Open LinguList opened 7 years ago

LinguList commented 7 years ago

lxkain commented 6 years ago

I'm grateful for this effort!

ARPABET plus the add-ons for TIMIT would be awesome. I have been using this table:

arpabet,ipa,example
AA,ɑ,"balm, bot"
AA1,ɑ,"balm, bot"
AA2,ɑ,"balm, bot"
AE,æ,bat
AE1,æ,bat
AE2,æ,bat
AH,ʌ,butt
AH1,ʌ,butt
AH2,ʌ,butt
AO,ɔ,bought
AO1,ɔ,bought
AO2,ɔ,bought
AW,aʊ,bout
AW1,aʊ,bout
AW2,aʊ,bout
AX,ə,about
AXR,ə˞,letter
AY,aɪ,bite
AY1,aɪ,bite
AY2,aɪ,bite
EH,ɛ,bet
EH1,ɛ,bet
EH2,ɛ,bet
ER,ɜ˞,bird
ER1,ɜ˞,bird
ER2,ɜ˞,bird
EY,eɪ,bait
EY1,eɪ,bait
EY2,eɪ,bait
IH,ɪ,bit
IH1,ɪ,bit
IH2,ɪ,bit
IX,ɨ,"roses, rabbit"
IY,i,beat
IY1,i,beat
IY2,i,beat
OW,oʊ,boat
OW1,oʊ,boat
OW2,oʊ,boat
OY,ɔɪ,boy
OY1,ɔɪ,boy
OY2,ɔɪ,boy
UH,ʊ,book
UH1,ʊ,book
UH2,ʊ,book
UW,u,boot
UW1,u,boot
UW2,u,boot
UX,ʉ,dude
B,b,buy
CH,tʃ,China
D,d,die
DH,ð,thy
DX,ɾ,butter
EL,l̩,bottle
EM,m̩,rhythm
EN,n̩,button
F,f,fight
G,ɡ,guy
H,h,high
HH,h,high
JH,dʒ,jive
K,k,kite
L,l,lie
M,m,my
N,n,nigh
NG,ŋ,sing
NX,ɾ̃,winner
P,p,pie
Q,ʔ,uh-oh
R,ɹ,rye
S,s,sigh
SH,ʃ,shy
T,t,tie
TH,θ,thigh
V,v,vie
W,w,wise
WH,ʍ,why
Y,j,yacht
Z,z,zoo
ZH,ʒ,pleasure
AX-H,ə̥,suspect
BCL,b̚,obtain
DCL,d̚,width
ENG,ŋ̩,Washington
GCL,ɡ̚,dogtooth
HV,ɦ,ahead
KCL,k̚,doctor
PCL,p̚,accept
TCL,t̚,catnip
PAU,.,pause
EPI,.,epenthetic silence
H#,.,begin/end marker
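
A minimal sketch of how a table like the one above could be used to convert a space-separated ARPABET transcription to IPA. The excerpt, the function names `load_mapping` and `to_ipa`, and the fallback for stress digits missing from the table (such as `0`) are assumptions made for illustration; they are not part of CLTS.

```python
import csv
import io

# A few rows copied from the table above; a real run would read the full CSV.
TABLE_EXCERPT = '''arpabet,ipa,example
AA,ɑ,"balm, bot"
AA1,ɑ,"balm, bot"
AE,æ,bat
AE1,æ,bat
B,b,buy
K,k,kite
T,t,tie
'''


def load_mapping(text):
    """Read the CSV text and return a dict from ARPABET symbol to IPA."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["arpabet"]: row["ipa"] for row in reader}


def to_ipa(transcription, mapping):
    """Convert a space-separated ARPABET string to a list of IPA segments."""
    segments = []
    for symbol in transcription.split():
        key = symbol.upper()
        if key not in mapping:
            # Assumption: a stress digit not listed in the table
            # falls back to the bare symbol.
            key = key.rstrip("012")
        if key not in mapping:
            raise ValueError("unknown ARPABET symbol: " + symbol)
        segments.append(mapping[key])
    return segments


mapping = load_mapping(TABLE_EXCERPT)
print(to_ipa("B AA1 T", mapping))
print(to_ipa("K AE0 T", mapping))
```

Unknown symbols raise an error rather than passing through silently, so gaps in the table show up early.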

LinguList commented 6 years ago

Thanks for sharing. Could you provide a link or a reference, so we know how to describe it properly? We try to reference all alphabets with publications.

tresoldi commented 6 years ago

Wikipedia has some references: https://en.wikipedia.org/wiki/ARPABET

Given that it is used by the CMU pronunciation dictionary and by the TIMIT corpus, it looks like a good candidate for inclusion.

LinguList commented 6 years ago

These are good examples of challenging alphabets, where generation won't work along the lines of the bipa and gld alphabets and our main code. They are rather long lists of potential symbol combinations, since diacritics are not distinguished from base characters. So we might think of making this another class of alphabet, similar to X-SAMPA, where we have an orthography profile file with the mappings, and that would be all, linking directly to our main identifiers?
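
A rough sketch of what such a profile-based alphabet might look like: a flat table of symbol combinations mapped to target segments, applied by greedy longest-match lookup. The column names `Grapheme` and `BIPA`, the sample rows, and the function names are hypothetical; this does not reflect the actual CLTS code or file layout.

```python
import csv
import io

# Hypothetical profile rows (tab-separated): every symbol combination is
# listed as a whole, so "AXR" and "AX-H" are entries of their own rather
# than "AX" plus a diacritic.
PROFILE_ROWS = [
    ("Grapheme", "BIPA"),
    ("AX", "ə"),
    ("AXR", "ə˞"),
    ("AX-H", "ə̥"),
    ("T", "t"),
    ("TH", "θ"),
    ("H", "h"),
]
PROFILE_TEXT = "\n".join("\t".join(row) for row in PROFILE_ROWS)


def load_profile(text):
    """Read a tab-separated profile into a dict from grapheme to target."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Grapheme"]: row["BIPA"] for row in reader}


def segment(word, profile):
    """Map an unsegmented string to targets by greedy longest match."""
    longest = max(len(grapheme) for grapheme in profile)
    result = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in profile:
                result.append(profile[chunk])
                i += size
                break
        else:
            raise ValueError("no profile entry matches at position %d" % i)
    return result


profile = load_profile(PROFILE_TEXT)
print(segment("AXRTH", profile))
```

Because the longest entry wins, `AXR` is read as one unit rather than as `AX` followed by an unmatched `R`, which is the behaviour a flat list of symbol combinations needs.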