concepticon / concepticon-data

The curation repository for the data behind Concepticon.
https://concepticon.clld.org

Match Concepticon in general with the STEDT taxonomy? #36

Closed: LinguList closed this issue 4 years ago

LinguList commented 8 years ago

This will be tedious, but they have this nice, historically informed taxonomy in STEDT:

Maybe it would be nice to have the Concepticon mapped to it.

The problem is the size, and again the question of what to do if something is just too big, so that we can only take a small part of it into the Concepticon. Is that still the same kind of "concept list", or is it something else? I am thinking of major links to

But I have the feeling that we should not call these things "concept lists", and we won't be able to get full coverage of the roughly 2500 concept sets we currently have.

Generally, there are two possibilities:

I think that (b) would be better, also because it would make it easier for users to find what they are looking for...

xrotwang commented 8 years ago

It seems simpler to scrape and map than semdom.org.

LinguList commented 8 years ago

Yep, but there's still a lot of work to do to add the meta-data sufficiently. One could think of adding things automatically, as I did for some psychological norm data that I added to the meta-data. If we flag such mappings as automatic, they are not exact, but they can still be useful for specific studies. Having a good strategy for meta-data mapping is still a problem...

xrotwang commented 7 years ago

This could be a starting point: https://github.com/stedt-project/sss/tree/master/semcats/revision1

LinguList commented 7 years ago

Good to have them in raw format!

Yet I did not find any examples there. Ideally, we'd have example words to guide us in the mapping (instead of tagging 2500 concepts out of the blue...).

LinguList commented 7 years ago

A first check based on the downloaded taxonomy shows that automatic coverage (breaking down meta-glosses into single glosses) is about 60% for the roughly 700 glosses in the taxonomy (there are no more, but there may be more if we go down from the proto-gloss level to the language level). In my experience, manual refinement may add about another 10%. So we might have 70% covered, which would translate into about 500 concepts tagged for the STEDT taxonomy. The advantage of this mapping would be that the concepts are rather "practical", occurring in many concept lists, or at least I expect so. But comparing coverage with the whole Concepticon shows, of course, that we will reach at most some 25%. Yet apparently this is all that STEDT has to offer as far as proto-glosses are concerned.
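For illustration, the automatic step described above (splitting meta-glosses into single glosses and checking them against Concepticon glosses) might look roughly like the sketch below. It assumes the raw taxonomy has already been reduced to a plain list of meta-gloss strings and that Concepticon glosses are available as a list of upper-case strings. The function names, the separators `/`, `;` and `,`, and the sample data are all hypothetical; this is not STEDT's or Concepticon's actual format or API. With the numbers from the comment, 70% of about 700 glosses is roughly 490, i.e. the "about 500" tagged concepts.

```python
import re


def split_meta_gloss(meta_gloss: str) -> list[str]:
    """Break a meta-gloss such as 'hand / arm' into single, lower-cased glosses."""
    # Drop parenthetical comments first, so commas inside them do not cause splits.
    without_comments = re.sub(r"\([^)]*\)", "", meta_gloss)
    # Assumption (hypothetical): '/', ';' and ',' separate alternative glosses.
    parts = re.split(r"[/;,]", without_comments)
    return [p.strip().lower() for p in parts if p.strip()]


def automatic_coverage(
    meta_glosses: list[str], concept_glosses: list[str]
) -> tuple[dict[str, list[str]], float]:
    """Map each meta-gloss to Concepticon glosses by exact match on its single glosses."""
    # Concepticon glosses are upper-case (e.g. 'HAND'), so compare case-insensitively.
    lookup = {g.lower(): g for g in concept_glosses}
    mapping = {}
    for meta in meta_glosses:
        mapping[meta] = [lookup[g] for g in split_meta_gloss(meta) if g in lookup]
    # A meta-gloss counts as covered if at least one of its single glosses matched.
    covered = sum(1 for matches in mapping.values() if matches)
    share = covered / len(meta_glosses) if meta_glosses else 0.0
    return mapping, share


# Hypothetical sample data, not taken from STEDT.
stedt_glosses = ["hand / arm", "bark; skin (of tree)", "to shine (sun)", "elbow"]
concepticon = ["HAND", "ARM", "BARK", "SKIN", "ELBOW"]
mapping, share = automatic_coverage(stedt_glosses, concepticon)
print(share)  # 0.75
print(mapping["to shine (sun)"])  # [] -> left for manual refinement
```

Exact string matching is deliberately naive: verbal glosses like "to shine" fall through, which is where the manual refinement mentioned above would come in.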