Open LinguList opened 3 years ago
@LinguList I just stumbled over this by chance... Can I help with anything?
Ah, it's the author, very nice that you are on GitHub!
So I checked your Kalamang data on IDS and added an orthography profile, which basically converts your orthography to normalized IPA, which is useful for standardization. I found three cases (probably loans?) where the data has a sequence of more than two vowels (which your grammar says would not occur, if I read it right?).
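To illustrate what such a profile does, here is a minimal sketch in plain Python. It is not the actual profile added for Kalamang: the grapheme-to-IPA mappings in `PROFILE`, the vowel set, and the function names are all assumptions made up for this example. It segments a form by greedy longest match and reports the longest vowel run, which is one way the vowel-sequence cases mentioned above could be flagged.

```python
# Hypothetical mini-profile: grapheme -> IPA. NOT the real Kalamang profile.
PROFILE = {
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "k": "k", "l": "l", "m": "m", "n": "n", "g": "g",
    "ng": "ŋ",  # digraph, must win over "n" + "g"
}

# Assumed vowel inventory, used only for the run-length check below.
VOWELS = set("aeiou")


def segment(form, profile):
    """Split `form` into IPA segments by greedy longest match."""
    max_len = max(len(grapheme) for grapheme in profile)
    segments = []
    i = 0
    while i < len(form):
        # Try the longest possible grapheme first, then shorter ones.
        for size in range(min(max_len, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in profile:
                segments.append(profile[chunk])
                i += size
                break
        else:
            # Nothing in the profile matched: mark the character as unknown.
            segments.append("\ufffd")
            i += 1
    return segments


def longest_vowel_run(segments):
    """Return the length of the longest run of consecutive vowels."""
    best = run = 0
    for seg in segments:
        run = run + 1 if seg in VOWELS else 0
        best = max(best, run)
    return best


print(" ".join(segment("kalamang", PROFILE)))
```

Forms where `longest_vowel_run` returns a value above the expected maximum, or where the replacement character shows up, are the ones worth checking by hand.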
In any case, since you have a Dictionaria dataset and an IDS dataset, it would be easiest to combine both datasets, so one has a link from one to the other.
But our experiments so far left us a bit helpless: we could not identify a direct link between the IDS dataset and the Dictionaria dataset (by comparing word forms) in all cases. After inspecting the IDS translations for individual concepts, I'd also assume that the IDS meaning descriptions are so loose that they unfortunately force authors to provide a lot of synonyms for concepts that are much more clearly specified in the Concepticon project.
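For reference, the kind of form-based comparison meant here can be sketched roughly as follows. This is an assumption-laden illustration, not the script we actually ran: the row layout (`id`, `form`), the normalization steps, and the toy data are all hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(form):
    """Normalize a word form for comparison (NFC, trimmed, lowercased)."""
    return unicodedata.normalize("NFC", form).strip().lower()


def link_by_form(ids_rows, dict_rows):
    """Link IDS rows to dictionary entries that share the same form.

    Returns (links, unmatched): `links` maps an IDS row id to a list of
    dictionary entry ids, `unmatched` lists IDS row ids without any hit.
    """
    index = defaultdict(list)
    for row in dict_rows:
        index[normalize(row["form"])].append(row["id"])

    links, unmatched = {}, []
    for row in ids_rows:
        hits = index.get(normalize(row["form"]), [])
        if hits:
            links[row["id"]] = hits
        else:
            unmatched.append(row["id"])
    return links, unmatched


# Made-up placeholder rows, only to show the shape of the result.
ids_rows = [
    {"id": "ids-1", "form": "Foo"},
    {"id": "ids-2", "form": "baz"},
]
dict_rows = [
    {"id": "dict-1", "form": "foo"},
    {"id": "dict-2", "form": "foo "},
    {"id": "dict-3", "form": "bar"},
]

links, unmatched = link_by_form(ids_rows, dict_rows)
print(links)
print(unmatched)
```

An IDS row that links to several dictionary entries is ambiguous, and one that ends up in `unmatched` has no link at all. Those two groups are exactly where form comparison alone is not enough.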
One direct question here would be: did you have links from your dictionary to the IDS dataset when you prepared the latter?
You can btw also just write by email, and we can include the dictionaria editors.
I am not sure who the Dictionaria editors are (my contact was Iren Hartmann, but she quit a while ago), so I continue here.
No, I did not have links from the dictionary to the IDS dataset. I asked the editors about that, but unfortunately it wasn't possible. I remember I got an Excel file with some data pulled from LexiRumah, though.
The 2+ vowel sequences are either sloppy orthography on my side, or multimorphemic words.
It was Robert Forkel who made that Excel file.
I checked the IDS data, and the orthography profile is easy to create from the phonetics description in the grammar. I will propose a PR to IDS for this resource, so one could probably also add the profile here. Furthermore, as IDS is Concepticon-mapped, the Concepticon links should maybe be fed from that IDS dataset? Or is this already done?