Closed wannaphong closed 1 year ago
@wannaphong are you currently working on this? If not, I can help train the POS tagging model using transformers with the components listed here, and then ask for your review once it is finished to see what we can improve further. What do you think?
@MpolaarbearM is training a model on Blackboard Treebank. You can take Orchid Corpus or UD Thai PUD.
Blackboard Treebank model (BERT): https://huggingface.co/lunarlist/pos_thai
Thanks! I will go for the UD Thai PUD corpus and let you know when the model is finished.
@wannaphong I have finished training transformers for Thai part-of-speech tagging. As discussed, the models were trained on the UD Thai PUD corpus with Universal POS (UPOS) tags. All models are already on the Hugging Face Hub. The models I trained are:
WangchanBERTa: an existing Thai language model, and one of the models in this issue's model list that you wanted trained on the corpus. Training results are reported at https://huggingface.co/Pavarissy/wangchanberta-ud-thai-pud-upos
DeBERTaV3: as of March 2023, DeBERTaV3 shows impressive state-of-the-art performance on NLU benchmarks compared to other models. I am putting it forward for your consideration because its multilingual version (mDeBERTaV3), which supports Thai, also achieved a better score on the UD Thai PUD corpus. You can check its training results at https://huggingface.co/Pavarissy/mdeberta-v3-ud-thai-pud-upos
PS: both models are trained on the specified corpus, and any improvements can be discussed from now on. Since they are public models on the Hugging Face Hub, I can help integrate them into PyThaiNLP if you want.
What do you think?
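For anyone who wants to try the models before they are integrated, here is a minimal sketch, assuming the checkpoints load through the standard `transformers` token-classification pipeline (not verified against the model cards). The helper names `load_tagger` and `to_token_tag_pairs` are hypothetical and are not part of PyThaiNLP. The output is per subword token, so mapping tags back to whole words may need extra work.

```python
from typing import Dict, List, Tuple

# Model ID taken from the comment above; swap in the mDeBERTaV3 one to compare.
MODEL_ID = "Pavarissy/wangchanberta-ud-thai-pud-upos"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the pure helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("token-classification", model=model_id)


def to_token_tag_pairs(predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Turn pipeline output dicts into (token, tag) pairs.

    The pipeline stores the label under "entity" by default and under
    "entity_group" when an aggregation strategy is set, so accept both.
    """
    pairs = []
    for pred in predictions:
        tag = pred.get("entity_group") or pred.get("entity")
        pairs.append((pred["word"], tag))
    return pairs


# Example usage (needs network access and the transformers package):
# tagger = load_tagger()
# print(to_token_tag_pairs(tagger("ฉันรักภาษาไทย")))
```

The default (no aggregation) is probably the safer choice here, since aggregation strategies are designed for B-/I- style entity labels and may merge neighbouring tokens that share the same POS tag.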
@pavaris-pm Hi! I can integrate your model into the new POS tagging function, since I am currently working on mine and have nearly finished it.
@MpolaarbearM Great to hear that! However, I have trained two POS tagging models. Which one will be integrated? Do we need @wannaphong to weigh in?
We can wait for approval. The integration method is the same for both, so I'll do both of them for now.
Thanks for your help. After approval, please let me know when it is integrated 👍🏻
Today, PyThaiNLP uses a perceptron tagger. It still gives the best score on the Blackboard Treebank test set (https://pythainlp.github.io/Model-Cards/Part%20of%20speech/#blackboard-perceptron), but most people want to use transformers.
I think it would be good for part-of-speech tagging to use a transformer model.
Model list:
Docs: https://huggingface.co/learn/nlp-course/chapter7/2?fw=tf
Corpus list: