cdli-gh / Sumerian-Translation-Pipeline

Ur III Period (Sumerian Language) information extraction pipeline, including Named Entity Recognition, Part-of-Speech Tagging, and Machine Translation
MIT License

POS and NER #2

Open chiarcos opened 4 years ago

chiarcos commented 4 years ago

Hi, as the repo contains the newest MTAAC/CDLI POS and NER modules, it would be good to provide them as standalone tools, too. Also, has there been any direct evaluation of NER+POS tagging or just of the MT pipeline as a whole? If so, it would be good to share that, too. Finally, the repo contains lists that are instrumental to NER and POS tasks, like Sumerian versions of month names (the royal and place names are English-side only as far as I can tell), and (I guess, but I didn't find these) year names. It would be good to document the respective locations along with the distribution of the NER/POS tagger.

I mark this as a documentation issue because the code is already there ...

Thanks a lot, Christian

himanshudce commented 4 years ago

Hi Christian, thanks for the comments. There are direct as well as human evaluations for both POS and NER. We are currently working on a paper, so I will add those results along with citations once the paper is public. This repo mainly contains the info/tools for POS and NER, and we have a different repo for Machine Translation. Also, I guess the month names, year names, etc. have separate documentation in CDLI/MTAAC. I am not a domain expert, so I can't comment much on that, but I will add it after discussing it with the team. Thank you