Open LinguList opened 8 years ago
Maybe this is of interest for the Dictionaria project.
@LinguList and @xrotwang Should we keep the basic lexicon as an issue in Concepticon or should I open a new issue on the Dictionaria GitHub page?
Btw. the link didn't work anymore, but I found this one: http://projects.turkmas.uoa.gr/urum/download/docs/uum-lexicon.pdf
This could be a dataset for one language in Lexibank, or even for more than one, given the glossing languages. We would need to check to what degree the concept list can be extracted from the PDF (using a tool like Adobe Acrobat Pro). We could also contact the authors and ask whether they would be willing to share the concept list as an Excel sheet.
I think it may also be interesting for @ilchec. Maybe he even knows the authors. And yes, @xrotwang, if we ask them whether they want to publish through Dictionaria, they might be interested.
But we should then ask them directly, maybe now?
It looks like the PDF that you've linked can be parsed relatively easily: the entries are all organized the same way (there are no optional notes), and each piece of information is preceded by a keyword, so it won't be that hard with PDFMiner. And unfortunately I don't know the authors =(
So we can already prepare the data with Adobe Acrobat Pro (this works even better). I think @MacyL has it; otherwise I'll ask Nathan. Then we have the concept list, which is nice to have anyway. In the meantime, shall we ask the authors whether they are interested in submitting their data to Dictionaria?
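To make the keyword-based parsing idea concrete, here is a minimal sketch of how it might look. It is only an illustration under assumptions: the field labels in `KEYWORDS`, the `label: value` layout with a colon, and the helper names `parse_entries` and `pdf_to_text` are all hypothetical and would need to be adapted to what the PDF actually contains. Only `pdfminer.high_level.extract_text` comes from pdfminer.six itself.

```python
import re

# Hypothetical field labels: the real keywords in the PDF may differ.
KEYWORDS = ("Urum", "English", "Russian")


def parse_entries(text, keywords=KEYWORDS):
    """Group keyword-prefixed lines into one dict per entry.

    Assumes each field looks like "Keyword: value" and that every
    entry starts with the first keyword in `keywords`.
    """
    field = re.compile(
        r"^(%s)\s*:\s*(.*)$" % "|".join(re.escape(k) for k in keywords)
    )
    entries, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = field.match(line)
        if match:
            key, value = match.groups()
            # The first keyword (or the very first field seen) opens a new entry.
            if key == keywords[0] or current is None:
                current = {}
                entries.append(current)
            current[key] = value
            last_key = key
        elif current is not None:
            # Unlabelled line: treat it as a wrapped continuation of the last field.
            current[last_key] += " " + line
    return entries


def pdf_to_text(path):
    """Extract raw text with pdfminer.six (imported lazily; must be installed)."""
    from pdfminer.high_level import extract_text
    return extract_text(path)
```

Something like `parse_entries(pdf_to_text("uum-lexicon.pdf"))` would then give one dict per entry, which could be written to a spreadsheet for checking. How well this works depends on how cleanly the text layer comes out of the PDF, so the result would still need manual review.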
http://urum.lili.uni-bielefeld.de/download/docs/uum-lexicon.pdf
This list draws from WOLD, adds 90 more concepts, and provides alternative categories. It is long, and it is a PDF, so there is no way to quickly extract a mapping to Concepticon. The semantic categories would be interesting, though, so this is probably a long-term rather than a short-term list to map.