vyaskaustubh closed this issue 2 years ago
UD_Cantonese-HK has only 13918 tokens, so according to the UD data-split guidelines all the data are released as yue_hk-ud-test.conllu and 10-fold cross-validation is recommended for any experiments.
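As a rough illustration of the 10-fold setup, here is a minimal sketch that splits the sentences of a CoNLL-U file into ten train/test partitions. The helper names `read_sentences` and `k_fold` are hypothetical, and shuffling at the sentence level with a fixed seed is an assumption rather than something the UD guidelines prescribe.

```python
import random
import re
from typing import Iterator, List, Tuple


def read_sentences(text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks (blocks are separated by blank lines)."""
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    return [block.strip() for block in blocks if block.strip()]


def k_fold(
    sentences: List[str], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield k (train, test) pairs; every sentence is held out exactly once."""
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)  # fixed seed keeps the folds reproducible
    folds = [order[i::k] for i in range(k)]  # round-robin gives near-equal fold sizes
    for fold in folds:
        held_out = set(fold)
        train = [sentences[j] for j in order if j not in held_out]
        test = [sentences[j] for j in fold]
        yield train, test


# Example usage (assumes the treebank file is in the working directory):
# with open("yue_hk-ud-test.conllu", encoding="utf-8") as f:
#     sentences = read_sentences(f.read())
# for train, test in k_fold(sentences, k=10):
#     ...  # train the tagger on `train`, evaluate on `test`, then average the scores
```

Splitting by sentence rather than by token keeps each dependency tree intact, and reporting the mean over the ten folds gives a more stable estimate than a single split of such a small treebank.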
Oh, thanks a lot. I have never worked on a dataset with fewer than 20k words, so I was unaware of this.
I was trying to compare some POS tagging models across different languages, and since Cantonese has a very small class of pronouns, I thought it might give some interesting results. However, I am unable to find the training and test conllu files. It would be really helpful if someone could point me to them.