Unable to find train and dev sets

UniversalDependencies / UD_Cantonese-HK

Spoken Cantonese from Hong Kong.

Other


Unable to find train and dev sets #2

Closed: vyaskaustubh closed this issue 2 years ago

vyaskaustubh commented 2 years ago

I was trying to compare some POS tagging models across different languages, and since Cantonese has a very small class of pronouns, I thought it might give some interesting results. However, I am unable to find the train and dev .conllu files. It would be really helpful if someone could point me to them.

martinpopel commented 2 years ago

UD_Cantonese-HK has only 13,918 tokens, so, according to the UD data-split guidelines, all the data are released as yue_hk-ud-test.conllu, and 10-fold cross-validation is recommended for any experiments.
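A minimal sketch of how the recommended 10-fold cross-validation could be set up. It assumes the whole treebank sits in the single file yue_hk-ud-test.conllu and that folds are cut at the sentence level. The helper names (read_conllu_sentences, count_tokens, k_fold_splits) are hypothetical and are not part of any official UD tool. Using contiguous, unshuffled folds is an arbitrary choice made here.

```python
from typing import Iterator, List, Tuple


def read_conllu_sentences(text: str) -> List[List[str]]:
    """Split CoNLL-U text into sentences (lists of lines, comment lines included)."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:  # a blank line ends the current sentence
            sentences.append(current)
            current = []
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def count_tokens(sentence: List[str]) -> int:
    """Count rows whose ID is a plain integer.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "1.1") are skipped, so this counts syntactic words.
    """
    return sum(
        1
        for line in sentence
        if not line.startswith("#") and line.split("\t", 1)[0].isdigit()
    )


def k_fold_splits(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds over n sentences."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    # For the real data, something like:
    #   with open("yue_hk-ud-test.conllu", encoding="utf-8") as f:
    #       text = f.read()
    # A tiny inline sample is used here so the script runs on its own.
    text = (
        "# sent_id = 1\n"
        "1\t你\t你\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
        "2\t好\t好\tADJ\t_\t_\t0\troot\t_\t_\n"
        "\n"
        "# sent_id = 2\n"
        "1\t係\t係\tVERB\t_\t_\t0\troot\t_\t_\n"
    )
    sents = read_conllu_sentences(text)
    print("sentences:", len(sents))
    print("tokens:", sum(count_tokens(s) for s in sents))
    for fold, (train, test) in enumerate(k_fold_splits(len(sents), k=2)):
        print(f"fold {fold}: train={train} test={test}")
```

Each fold's train indices would be used to train the tagger and its test indices to evaluate it, with scores averaged over the folds. Shuffling the sentence order once (with a fixed seed) before splitting is a common alternative if adjacent sentences are very similar.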

vyaskaustubh commented 2 years ago

Oh, thanks a lot. I have never worked with a dataset of fewer than 20K words, so I was unaware of this.