cdli-gh / mtaac_work

MTAAC work packages
https://cdli-gh.github.io/mtaac/

Glossary and sign lists #7

Open epageperron opened 7 years ago

epageperron commented 7 years ago

As part of the lemmatization process and for other tasks, we need to prepare fully consolidated and curated lists. Data must be extracted from different sources, verified, consolidated, and put into a format that is convenient to work with. The focus should be on the Ur III data, but once we have more data and an automated extraction system in place, all data should be gathered to facilitate further research.

All relevant files should be in the "pre-processing" folder:

epageperron commented 7 years ago

Updated to add the list of alternating verbal bases. The file I have is based on the OCR of Thomsen and has been cleaned down to verb - definition pairs, so one would have to go over the original file and extract the verbs not marked as "regular". Also, the ePSD has alternative spellings for words, so that might be another source.
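
A rough sketch of what that extraction pass could look like, in Python. Everything about the file layout here is an assumption: the real OCR-derived file may not be tab-separated or carry an explicit class column, and the `SAMPLE` rows and the `non_regular_verbs` helper are placeholders for illustration, not actual entries or existing project code.

```python
import csv
import io

# Hypothetical layout: one entry per line as "verb<TAB>class<TAB>definition".
# The rows below are placeholders, not real entries from Thomsen.
SAMPLE = (
    "verb_a\talternating\tdefinition a\n"
    "verb_b\tregular\tdefinition b\n"
    "verb_c\treduplicating\tdefinition c\n"
)


def non_regular_verbs(handle):
    """Return (verb, class, definition) tuples whose class is not 'regular'."""
    rows = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            # Skip lines that do not match the assumed three-column layout.
            continue
        verb, verb_class, definition = (field.strip() for field in row)
        if verb_class.lower() != "regular":
            rows.append((verb, verb_class, definition))
    return rows


result = non_regular_verbs(io.StringIO(SAMPLE))
for verb, verb_class, definition in result:
    print(f"{verb}\t{verb_class}\t{definition}")
```

To run it on a real file, the `io.StringIO(SAMPLE)` handle would be swapped for `open(path, encoding="utf-8", newline="")`, with the column handling adjusted to whatever the original file actually contains.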

epageperron commented 5 years ago

@khoit do you need this for your work? If not, maybe we can postpone it until at least after Christmas, as a lot of work is left on the framework? I'd love to work on this.