XPF

Check out our interactive website: The XPF Corpus

The preliminary manual of the corpus can be found here.

Repository

Available Languages

Language Code Language (click for info) Comments
aak Ankave
aau Abau
ab Abkhaz
acf Saint Lucian Creole French lacks lenition
aey Amele
agg Angor
aia Arosi lacks lenition
amn Amanab
an Aragonese
aom Aomie lacks lenition
apu Apurinã
apy Apalaí lacks lenition
arl Arabela
ast Asturian
ata Pele-Ata
avt Au
ay Aymara
az Azerbaijani
ba Bashkir
bdd Bunama
be Belarusan lacks lenition
bef Benabena
bg Bulgarian
bi Bislama
boa Bora
boj Anjam lacks lenition
bug Bugis
bvr Burarra
bxr Russia Buryat
caa Ch'orti'
car Carib
cbi Cha'palaa
cbk Chavacano
cbt Chayahuita
cbu Candoshi Shapra lacks lenition
cnm Ixtatán Chuj
crh Crimean Tatar
cs Czech
ctu Chol
cv Chuvash
ded Dedua
dgz Daga
djr Djambarrpuyngu
dv Maldivian
el Greek
emi Mussau-Emira lacks lenition
eu Basque
gaw Nobonob
ghs Guhu-Samane
gil Kiribati lacks lenition
gn Guarani
guc Wayuu
guo Guayabero
gvn Kuku-Yalanji
haw Hawaiian
hil Hiligaynon
hmn Hmong lacks lenition
hsb Upper Sorbian
ht Haitian Creole
hu Hungarian
hy Armenian
ign Ignaciano
ilo Ilocano
inb Inga
iu Inuktitut
iws Sepik Iwam
jam Jamaican Creole
jv Javanese
ka Georgian
kbd Kabardian
kjb Q'anjob'al
kki Kagulu lacks lenition
kl Kalaallisut
kn Kannada
ko Korean
kpf Komba
kpx Mountain Koiali
krc Karachay-Balkar
ksr Borong lacks lenition
kup Kunimaipa
kv Komi lacks lenition
ky Kirghiz
lem Nomaande
mam Mam
mcq Ese
mg Malagasy
mhl Mauwake
mk Macedonian
mqj Mamasa
mto Totontepec Mixe
mva Manam
naf Nabak
nan Min Nan Chinese lacks lenition
nas Naasioi
nhe Nahuatl
nhr Naro
nsn Nehan lacks lenition
nuy Nunggubuyu
omw South Tairora
pad Paumarí lacks lenition
pau Palauan
pio Piapoco
pwg Gapapaiwa
quz Cusco Quechua
rkb Rikbaktsa
ro Romanian
roo Rotokas
shi Shilha
shp Shipibo Konibo
si Sinhala
snc Sinaugoro lacks lenition
sq Albanian
ta Tamil
tac Western Tarahumara
te Telugu
tee Huehuetla Tepehua
tg Tajik
to Tongan lacks lenition
tpi Tok Pisin
tr Turkish
tt Tatar
tyv Tuvan
tzo Tzotzil
ug Uyghur
uk Ukrainian
usa Usarufa
uz Uzbek
var Huarijío
vi Vietnamese
viv Iduna
way Wayana
wbp Warlpiri
wo Wolof lacks lenition
ycn Yucuna
yi Yiddish
yua Yucatec Maya
yuz Yuracare
yva Yawa lacks lenition
zos Francisco León Zoque
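
The rows above can be split back into their three columns with a small amount of code. The following is a minimal sketch, not part of the XPF repository: it assumes each row is a single line of plain text, that the language code is the first whitespace-delimited token, and that the only value in the Comments column is the literal string "lacks lenition" (the only comment that appears in this table). The names `LanguageRow` and `parse_row` are hypothetical and chosen for illustration.

```python
from typing import NamedTuple

# Assumption: "lacks lenition" is the only comment used in the table above.
COMMENT = "lacks lenition"


class LanguageRow(NamedTuple):
    code: str      # language code, e.g. "acf"
    name: str      # language name, e.g. "Saint Lucian Creole French"
    comment: str   # "lacks lenition" or an empty string


def parse_row(line: str) -> LanguageRow:
    """Split one flattened table row into code, name, and comment."""
    # The code is everything before the first space.
    code, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    comment = ""
    # Peel the trailing comment off the language name when it is present.
    if rest.endswith(COMMENT):
        comment = COMMENT
        rest = rest[: -len(COMMENT)].strip()
    return LanguageRow(code, rest, comment)


# Sample rows copied from the table above.
rows = [
    parse_row("acf Saint Lucian Creole French lacks lenition"),
    parse_row("aey Amele"),
    parse_row("bxr Russia Buryat"),
]

for row in rows:
    print(row.code, "|", row.name, "|", row.comment or "-")
```

The same approach does not carry over directly to the Compromised and Abandoned tables below, because their Reason column is free text with no fixed delimiter separating it from the language name.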

Compromised Languages

Language Code Language (click for info) Reason (more thorough explanation in Rmd files) Comments
acr Rabinal Achi' suspect marking of vowel length lacks lenition
ake Akawaio conflation between voiceless and voiced consonants
amp Alamblak conflation between /ɘ/ and /o/
aoj Mufian conflation among vowels; ambiguity regarding vowel length and labialized consonant clusters lacks lenition
ar Arabic ambiguous transcription of alif; conflation between vowels and glides
arn Mapudungun ambiguous orthography; conflation between dental and alveolar consonants
awx Awara conflation between /nd/, /mb/, /nɡ/ and /d/, /b/, /ɡ/, respectively
bcl Central Bikol inconsistent marking of glottal stops lacks lenition
bmu Somba Siawari phonetic alphabet
btx Batak Karo conflation among /e/, /ɘ/, and /ɯ/
bzd Bribri phonetic alphabet; contradicting documentation
bzh Mapos Buang conflation between /ɛ/ and other vowels
ca Catalan conflation among vowels and glides; ambiguous phonological interpretations
cav Cavineña ambiguity whether a digraph represents one phoneme or two, depending on syllable structure lacks lenition
chf Tabasco Chontal conflation between ejectives and stop-glottal stop sequences
chm Mari conflation with some palatalized and non-palatalized consonants; some vowels not always represented orthographically lacks lenition
cho Choctaw phonetic alphabet
cni Asháninka conflation among nasals
cof Colorado orthographic ambiguity with glottal stops
con Cofan conflation between consonants
crm Moose Cree /h/ represented only when contrast is required lacks lenition
dyo Jola-Fogny uncertainty around the marking of +ATR vowels lacks lenition
es Spanish non-transparent transcription of diphthongs
fuv Nigerian Fulfulde inconsistent marking of glottal stops; unclear transcription of palatalized glottal stop
hi Hindi conflation between /æ/ and /ɛ/; vowel nasalization ambiguity; unreliable marking of some consonants
id Indonesian conflation between /e/ and /ə/
ixl Ixil word-initial glottal stop not always marked; somewhat ambiguous orthography
kea Cape Verdean Creole possible conflation between /a/ and /ɐ/, /e/ and /ɛ/, and /ɾ/ and /ʀ/ lacks lenition
kek Qeqchi ambiguity between ejective stops and stop-glottal stop sequences
kk Kazakh conflation between vowels and glides; widely contradicting phonological accounts of the language
kmo Kwoma non-transparent transcription of glottal stops
kyz Kayabí conflation between /i/ and /j/ lacks lenition
mcf Matsés conflation between alveolar and retroflex consonants; conflation between vowels
mek Mekeo non-transparent transcription of glottal stops
mfe Morisyen highly suspect orthography; conflation among consonants
ml Malayalam conflation between dental and alveolar /n/
mlp Bargam conflation between /n/ and /ŋ/ lacks lenition
mnb Muna suspect orthography
mpx Misima-Panaeati conflation between /e/ and /ɛ/ and between /o/ and /ɔ/ lacks lenition
mt Maltese conflation between /ts/ and /dz/ and between /ʃ/ and /ʒ/
myv Erzya conflation between /n/ and /ŋ/ lacks lenition
ne Nepali certain diacritics used interchangeably and inconsistently marked
not Nomatsiguenga conflation among nasals
or Oriya certain diacritics used interchangeably and inconsistently marked
os Ossetic conflation among /u/, /w/, and /ʷ/; inconsistent marking of consonant gemination
pag Pangasinan possible conflation between /ŋ/ and /nɡ/
pib Yine conflation between /n/ and /h̃/ lacks lenition
plu Palikúr conflation between /ɡ/ and /ɣ/
qub Huallaga Huanuco Quechua suspect orthography; conflation between vowels and glides
rwo Rawa conflation between /l/ and /r/
sah Yakut conflation between /j/ and /j̃/
sk Slovak non-transparent transcription of palatal consonants; ambiguity whether digraphs represent one phoneme or two
sm Samoan marking of long vowels and glottal stops is suspect
suz Sunwar conflation between /ɾ/, /ɭ/, and possibly /l̪/; inconsistent marking of glottal stops
sw Swahili conflation between syllabic nasals and non-syllabic counterparts
too Xicotepec de Juárez Totonac suspect transcription due to unclear documentation
tpp Pisaflores Tepehua suspect marking of vowel length
tzj Tz'utujil uncertainty around the marking of the glottal stop and the orthography
tzm Central Atlas Tamazight conflation between /l̪/ and /l̪ˤ/, and between /ʒ/ and /ʒˀ/
wmw Mwani conflation between syllabic nasals and prenasalized stops lacks lenition
zsm Standard Malay conflation between /e/ and /ə/; conflicting orthographies
zza Zaza conflicting orthographies; conflation among vowels

Abandoned Languages

Language Code Language Reason
ace Acehnese non-transparent transcription of vowel nasalization
ach Acholi non-transparent transcription of tones
acu Achuar-Shiwiar non-transparent transcription of vowel nasalization
adh Adhola non-transparent transcription of tones
af Afrikaans non-transparent transcription of vowels, vowel length, and diphthongs
agd Agarabi non-transparent transcription of tones
agm Angaataha non-transparent transcription of tones
agr Aguaruna non-transparent transcription of vowel nasalization
ak Akan non-transparent transcription of tones
alq Algonquin non-transparent transcription of vowel length
am Amharic non-transparent transcription of consonant gemination
anv Denya non-transparent transcription of tones
as Assamese non-transparent transcription of vowels
aso Dano non-transparent transcription of tones
av Avar non-transparent transcription of consonant gemination
ban Bali non-standardized orthography
bem Bemba non-transparent transcription of tones
bba Bariba non-transparent transcription of tones
bcw Bana non-transparent transcription of tones
bhl Bimin non-transparent transcription of tones
bm Bambara non-transparent transcription of tones
bmr Muinane non-transparent transcription of tones
bs Bosnian non-transparent transcription of vowel length and tones
bsn Barasana-Eduria non-transparent transcription of tones
bua Buryat non-transparent transcription of palatalization
byr Baruya non-transparent transcription of tones
cao Chácobo non-transparent transcription of tones
cax Chiquitano non-transparent transcription of vowel nasalization
cbc Carapana non-transparent transcription of tones
ce Chechen non-transparent transcription of vowel length
ceb Cebuano non-transparent transcription of vowel length
chr Cherokee non-transparent transcription of vowel length
cwk Western Kaqchikel non-transparent transcription of vowels
cnh Haka Chin non-transparent transcription of tones
coe Koreguaje non-transparent transcription of tones
ctd Tedim Chin non-transparent transcription of tones
cub Cubeo non-transparent transcription of tones
cuk San Blas Kuna non-transparent transcription
cy Welsh non-transparent transcription of vowel length
da Danish non-transparent transcription of vowels
daa Dangaléat non-transparent transcription of tones
des Desano non-transparent transcription of tones
dgo Dogri non-transparent transcription of tones
din Dinka non-transparent transcription of tones
dts Toro So Dogon non-transparent transcription of tones
dz Dzongkha non-transparent transcription
ee Ewe non-transparent transcription of tones
efi Efik non-transparent transcription of tones
emp Northern Emberá non-transparent transcription
enb Markweeta non-transparent transcription of tones
enq Enga non-transparent transcription of tones
et Estonian non-transparent transcription of contrastive syllable length
faa Fasu non-transparent transcription of tones
fi Finnish non-transparent transcription
fj Fijian non-transparent transcription of vowel length
fo Faroese non-transparent transcription of vowels
for Fore non-transparent transcription of tones
fur Friulian non-transparent transcription of vowels
fy Frisian non-transparent transcription of vowels
ga Irish non-transparent transcription
gah Alekano non-transparent transcription of tones
gd Scottish Gaelic non-transparent transcription of consonants and vowels
gl Galician non-transparent transcription
gmo Gamo-Gofa-Dawro three languages understood to be linguistically separate
grb Grebo non-transparent transcription of tones
grt Garo non-transparent transcription of vowels
gub Guajajara non-transparent transcription of vowel nasalization
gum Guambiano non-standardized orthography
gur Farefare non-transparent transcription of tones
gv Manx Gaelic non-transparent transcription of consonants and vowels
ha Hausa non-transparent transcription of vowel length
hbs Serbo-Croatian non-transparent transcription of tones
hch Huichol non-transparent transcription of tones
heh Hehe non-transparent transcription of tones
hr Croatian non-transparent transcription of vowel length
hub Huambisa non-transparent transcription of vowel nasalization
hui Huli non-transparent transcription of tones
huv Huave inconsistent phonological documentation
hz Herero non-transparent transcription of tones
ig Igbo non-transparent transcription of tones
ik Inupiaq insufficient tokens
is Icelandic non-transparent transcription of vowel length
jiv Shuar non-transparent transcription of vowel nasalization
kab Kabyle non-transparent transcription of consonants
kac Jingpho non-transparent transcription of tones
kaq Capanahua non-transparent transcription of tones
kbc Kadiweu non-transparent transcription of consonant gemination
kbr Kafa non-transparent transcription of tones
kha Khasi non-transparent transcription of vowel length
khk Khalkha Mongolian non-transparent transcription of vowels
ki Gikuyu non-transparent transcription of tones
kj Kwanyama non-transparent transcription of tones
kjs East Kewa non-transparent transcription of tones
kew West Kewa non-transparent transcription of tones
kmr Northern Kurdish non-transparent transcription of consonants
kmu Kanite non-transparent transcription of tones
ksd Kuanua non-transparent transcription of vowel length
kus Kusaal non-transparent transcription of tones and vowel length
kw Cornish non-transparent transcription of vowel length
lac Lacandon non-transparent transcription of vowel length
lb Luxembourgish non-transparent transcription of vowels
lef Lelemi non-transparent transcription of tones
lg Luganda non-transparent transcription of tones
ln Lingala non-transparent transcription of tones
loz Lozi non-transparent transcription of tones
lt Lithuanian non-transparent transcription of tones
luo Dholuo non-transparent transcription of tones
lus Mizo non-transparent transcription of tones
lv Latvian non-transparent transcription of tones
lvs Standard Latvian non-transparent transcription of tones
lwo Luwo non-transparent transcription of tones and breathy vowels
man Mandingo non-transparent transcription of tones
mas Maasai insufficient tokens
mcb Machiguenga non-transparent transcription of tones
mcd Sharanahua non-transparent transcription of tones
meu Motu non-transparent transcription of vowel length
mfi Wandala non-transparent transcription of tones
mfz Mabaan non-transparent transcription of tones
mhr Eastern Mari non-transparent transcription of palatalization
mi Maori non-transparent transcription of vowel length
miq Miskito non-transparent transcription of vowel nasalization and length
mni Meitei non-transparent transcription of tones
mos Mossi non-transparent transcription of tones
mps Dadibi non-transparent transcription of tones and vowel nasalization
mpt Mian non-transparent transcription of tones
ms Malay non-transparent transcription of vowels
my Burmese non-transparent transcription of tones
myu Mundurukú non-transparent transcription of tones and creaky vowels
myy Macuna non-transparent transcription of tones
nd Northern Ndebele insufficient tokens
nds Low Saxon non-transparent transcription
nfr Nafaanra non-transparent transcription of tones
nhg Tetelcingo Nahuatl non-transparent transcription of vowel length
no Norwegian non-transparent transcription of tones and vowel length
ntp Northern Tepehuan non-transparent transcription of tones
nv Navajo non-transparent transcription of vowel nasalization
ny Chichewa non-transparent transcription of tones
nyn Nyankore non-transparent transcription of tones
om Oromo non-transparent transcription of tones
opm Oksapmin non-transparent transcription of vowels
ood Tohono O'odham non-transparent transcription
ots Estado de México Otomi non-transparent transcription of tones
pab Parecís non-transparent transcription of vowel length and nasalization
pao Northern Paiute non-transparent transcription of vowel length
pap Papiamentu non-transparent transcription of vowels
pir Wanano non-transparent transcription of tones
pl Polish non-transparent transcription
pms Piedmontese non-transparent transcription
poh Poqomchi' insufficient documentation
rw Kinyarwanda non-transparent transcription of tones and vowel length
sd Sindhi non-transparent transcription of vowels
se Northern Sami non-transparent transcription
sg Sango non-transparent transcription of tones
sim Mende non-transparent transcription of tones
sll Salt-Yui non-transparent transcription of tones
sn Shona non-transparent transcription of tones
so Somali non-transparent transcription of tones
soq Kanasi non-transparent transcription of glottal stops
spp Supyire Senoufo non-transparent transcription of tones
ss Swati non-transparent transcription of tones
st Sesotho non-transparent transcription of tones
sv Swedish non-transparent transcription
swp Suau non-transparent transcription
sxb Suba non-transparent transcription of tones
tav Tatuyo non-transparent transcription of tones
tcc Datooga non-transparent transcription of tones
tcy Tulu non-transparent transcription of vowels
tcz Thadou Chin non-transparent transcription of tones
ti Tigrinya non-transparent transcription of gemination
tk Turkmen non-transparent transcription of vowel length
tl Tagalog non-transparent spelling of vowel length
tn Tswana non-transparent transcription of tones
toi Tonga non-transparent transcription of tones
trp Kok Borok non-transparent transcription of tones
ts Tsonga non-transparent transcription of tones
ttc Tektiteko non-transparent transcription of vowel length
tuf Central Tunebo non-transparent transcription of contrastive features (first syllable)
tw Twi non-transparent transcription of tones
ubu Umbu-Ungu non-transparent transcription of tones
udu Uduk non-transparent transcription of tones
ur Urdu non-transparent transcription of vowels
ura Urarina non-transparent transcription of tones
usp Uspanteko non-transparent transcription of tones
ve Venda non-transparent transcription of tones
vro Võro non-transparent transcription of vowels and palatalization
wa Walloon non-transparent transcription
wal Wolaytta non-transparent transcription of tones
war Waray-Waray insufficient documentation
wiu Wiru non-transparent transcription of tones
xal Kalmyk-Oirat non-transparent transcription of vowels
xav Xavánte non-transparent transcription of vowel length
xbi Kombio non-transparent transcription of vowels
xh Xhosa non-transparent transcription of tones
xla Kamula non-transparent transcription of vowels and tones
xsr Sherpa insufficient documentation
yaa Yaminahua non-transparent transcription of tones
yad Yagua non-transparent transcription of tones
yby Yaweyuha non-transparent transcription of tones
yo Yoruba non-transparent transcription of tones
zai Zapotec non-transparent transcription of tones
zca Coatecas Altas Zapotec non-transparent transcription of tones
zpi Santa María Quiegolani Zapotec non-transparent transcription of tones
zpq Zoogocho Zapotec non-transparent transcription of tones
zu Zulu non-transparent transcription of tones