concepticon / concepticon-data

The curation repository for the data behind Concepticon.
https://concepticon.clld.org

Urum basic lexicon #62

Open LinguList opened 8 years ago

LinguList commented 8 years ago

http://urum.lili.uni-bielefeld.de/download/docs/uum-lexicon.pdf

This list draws on WOLD, adds 90 more concepts, and provides alternative categories. It is long, and it is a PDF, so there is no way to quickly extract the concepts and link them to Concepticon. The semantic categories would be interesting, though, but this is probably a long-term rather than a short-term list-to-map.

xrotwang commented 8 years ago

Maybe this is of interest for the Dictionaria project.

AnnikaTjuka commented 4 years ago

@LinguList and @xrotwang Should we keep the basic lexicon as an issue in Concepticon or should I open a new issue on the Dictionaria GitHub page?

By the way, the link no longer works, but I found this one: http://projects.turkmas.uoa.gr/urum/download/docs/uum-lexicon.pdf

LinguList commented 4 years ago

This is a dataset for one language in Lexibank, or even more than one, given the glossing languages. We would need to see to what degree the concept list can be extracted from the data (using a tool like Adobe Pro). We could also contact the authors and ask whether they would be willing to share the concept list as an Excel sheet.

LinguList commented 4 years ago

I think it may also be interesting for @ilchec; maybe he even knows the authors. And yes, @xrotwang, when we ask them, they might be interested in publishing through Dictionaria.

LinguList commented 4 years ago

But we should then ask them directly, maybe now?

ilchec commented 4 years ago

It looks like the PDF you've linked can be parsed relatively easily: the entries are all organized the same way (there are no optional notes), and each piece of information is preceded by a keyword, so it shouldn't be too hard with PDFMiner. Unfortunately, I don't know the authors =(
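Because every field is introduced by a keyword, a simple line-by-line parser may be enough once the PDF is converted to text. This is a minimal sketch that assumes the text has already been extracted (for instance with pdfminer.six's `pdfminer.high_level.extract_text`) and that each field sits on its own line as `Keyword: value`. The keyword names (`Meaning`, `Category`, `Form`), the function name `parse_entries`, and the sample text are hypothetical placeholders, not the labels actually used in the Urum lexicon.

```python
import re
from typing import Dict, List

# Hypothetical field keywords: the real labels in the Urum PDF must be
# checked against the document itself before using this.
KEYWORDS = ("Meaning", "Category", "Form")
# Hypothetical: the keyword assumed to open every new entry.
ENTRY_START = "Meaning"

# Matches lines such as "Meaning: water" and captures (keyword, value).
FIELD_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(KEYWORDS))


def parse_entries(text: str) -> List[Dict[str, str]]:
    """Split extracted plain text into entries, one dict per entry.

    A new entry starts whenever ENTRY_START is seen; every other
    keyword-prefixed line is added to the current entry. Lines that do
    not start with a known keyword are ignored.
    """
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if not match:
            continue
        key, value = match.groups()
        if key == ENTRY_START and current:
            entries.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        entries.append(current)
    return entries


if __name__ == "__main__":
    # Hypothetical sample; real input would come from the PDF text.
    sample = "Meaning: water\nCategory: nature\nForm: FORM1\nMeaning: fire\nForm: FORM2"
    for entry in parse_entries(sample):
        print(entry)
```

If the real entries turn out to span several lines per field, the regular expression and the "ignore unmatched lines" rule would need adjusting.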

LinguList commented 4 years ago

So we can already prepare the data with Adobe Pro (this works even better). I think @MacyL has it; otherwise I'll ask Nathan. Then we have the concept list, which is nice anyway. In the meantime, should we ask the authors whether they are interested in submitting their data to Dictionaria?
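Once the glosses are extracted, they could be written out as a tab-separated concept list with the Concepticon columns left empty for the later mapping step. This is a minimal sketch assuming a column layout of `ID`, `NUMBER`, `ENGLISH`, `CONCEPTICON_ID` and `CONCEPTICON_GLOSS`; the function name `to_conceptlist_tsv` and the `UrumList` ID prefix are hypothetical and should be checked against the concepticon-data conventions before submitting.

```python
import csv
import io
from typing import List

# Assumed column layout for a Concepticon concept list.
COLUMNS = ["ID", "NUMBER", "ENGLISH", "CONCEPTICON_ID", "CONCEPTICON_GLOSS"]


def to_conceptlist_tsv(glosses: List[str], prefix: str) -> str:
    """Render English glosses as a tab-separated concept list.

    IDs are built as "<prefix>-<number>" (a hypothetical scheme).
    CONCEPTICON_ID and CONCEPTICON_GLOSS are left empty so the
    mapping to Concepticon can be done afterwards.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for number, gloss in enumerate(glosses, start=1):
        writer.writerow({
            "ID": f"{prefix}-{number}",
            "NUMBER": number,
            "ENGLISH": gloss,
            "CONCEPTICON_ID": "",
            "CONCEPTICON_GLOSS": "",
        })
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical glosses; real ones would come from the extracted list.
    print(to_conceptlist_tsv(["water", "fire"], "UrumList"), end="")
```

Leaving the mapping columns empty keeps extraction and linking as separate steps, so the list can be reviewed before any Concepticon IDs are assigned.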