X-Sampa to CLTS - Githubissues

cldf-datasets / doreco

CLDF dataset derived from DoReCo's core corpus

https://doreco.info/


X-Sampa to CLTS #3

Closed FredericBlum closed 1 year ago

FredericBlum commented 2 years ago

In the transcription, all phones are currently in X-Sampa and need to be converted to CLTS.

FredericBlum commented 2 years ago

@LinguList You mentioned something about extracting concordances for this conversion, but I am not sure what you are referring to. Could you elaborate briefly?

LinguList commented 2 years ago

I recommend reading our paper on IGT (List, Sims, Forkel 2020) for this purpose, where we mention this (Robert has developed the package further by now).

xrotwang commented 2 years ago

pyigt can be used to extract word/morpheme concordances, not phoneme concordances. So I don't think it's relevant for X-Sampa to CLTS conversion.

LinguList commented 2 years ago

It depends on the corpus structure. I thought we would first get a concordance of words and then convert those to CLTS/BIPA with the typical orthography profile procedure from pylexibank. Here, you would use a concordance to get those lexemes, right?
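To make the word-concordance route concrete, here is a minimal sketch in plain Python. It does not use pyigt or pylexibank; the toy profile, the example utterances, and the helper names `concordance` and `segment` are all hypothetical, and the greedy longest-match strategy is only an assumption about how a profile would be applied.

```python
from collections import Counter

# Hypothetical toy profile: X-Sampa grapheme -> IPA (not the project's real profile).
PROFILE = {
    "tS": "tʃ", "S": "ʃ", "N": "ŋ", "@": "ə",
    "a": "a", "i": "i", "k": "k", "t": "t",
}


def concordance(utterances):
    """Count how often each word form occurs across all utterances."""
    return Counter(word for utt in utterances for word in utt.split())


def segment(form, profile):
    """Greedy longest-match segmentation of one word form.

    Characters not covered by the profile are wrapped in «» so they can be
    spotted and added to the profile later.
    """
    max_len = max(len(grapheme) for grapheme in profile)
    out, i = [], 0
    while i < len(form):
        for size in range(min(max_len, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in profile:
                out.append(profile[chunk])
                i += size
                break
        else:
            # No grapheme of any length matched at position i.
            out.append("«" + form[i] + "»")
            i += 1
    return out


# Invented example utterances, whitespace-separated word forms in X-Sampa.
utterances = ["tSa kiN", "kiN Sat@"]
forms = concordance(utterances)
converted = {form: segment(form, PROFILE) for form in forms}
```

Each unique word form is converted only once, so the profile can be checked against the list of forms rather than against every token in the corpus.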

LinguList commented 2 years ago

But if that is not the case, one needs to use segments directly, which changes the procedure of applying the profile.
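If the corpus already provides individual phones, the conversion reduces to a lookup per segment. A rough sketch, assuming a hypothetical mapping table and a hypothetical helper `convert_phones` (neither is taken from the dataset):

```python
# Hypothetical mapping: X-Sampa phone -> IPA (illustrative subset only).
XSAMPA_TO_IPA = {"t": "t", "S": "ʃ", "N": "ŋ", "@": "ə", "E": "ɛ", "?": "ʔ"}


def convert_phones(phones, mapping):
    """Convert already-segmented phones one by one.

    Returns the converted list (None where no mapping exists) and the
    sorted set of phones that still need a profile entry.
    """
    converted = [mapping.get(phone) for phone in phones]
    missing = sorted({phone for phone in phones if phone not in mapping})
    return converted, missing


converted, missing = convert_phones(["t", "E", "S", "x", "@"], XSAMPA_TO_IPA)
```

No segmentation step is needed here, so multi-character symbols such as affricates must already be single entries in the phone list.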

xrotwang commented 2 years ago

Ah, ok. Yes, one could do that, although I wouldn't want to bring all the pylexibank machinery into this repo. So maybe we should

1. create these concordances
2. put them in a doreco-lexibank repo
3. work on orthography profiles there

Then, copy the profiles back here and add the CLTS conversion to the makecldf command.
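As a sketch of the last step, a copied profile could be read into a plain lookup table before being used in the conversion. The column names `Grapheme` and `IPA`, the inline sample data, and the helper `load_profile` are assumptions for illustration; the actual file layout and the way the makecldf command would call this are not specified here.

```python
import csv
import io

# Hypothetical profile content; a real profile would be read from a TSV file.
PROFILE_TSV = "Grapheme\tIPA\ntS\ttʃ\nS\tʃ\nN\tŋ\n"


def load_profile(text):
    """Parse tab-separated profile text into a grapheme -> IPA dictionary."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Grapheme"]: row["IPA"] for row in reader}


profile = load_profile(PROFILE_TSV)
```

Keeping the profile as a standalone TSV means it can be edited in the separate repo and copied over without touching the conversion code.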

LinguList commented 2 years ago

Yes, sounds like a plan. We have a rather complete sampa profile. Need to look that up when I find time. It may be in the orthograpy repo...

LinguList commented 2 years ago

It is https://github.com/orthograpy/orthograpy, if I am not mistaken.

xrotwang commented 1 year ago

See https://github.com/cldf-datasets/doreco/blob/main/etc/orthography.tsv