UniversalDependencies / UD_Chinese-GSD


Where can I get a Simplified Chinese CoNLL-U file? #3

Closed yutaolife closed 7 years ago

yutaolife commented 7 years ago

Dear UD, could you please tell me where I can get a Simplified Chinese CoNLL-U file? After searching all of the UD zip files, I found that every Chinese one is in Traditional Chinese. Can the Traditional Chinese files be used for Simplified Chinese?

ermanh commented 7 years ago

You can use a traditional-to-simplified converter (e.g. mafan module in Python).

In general, Chinese data should be kept in traditional characters when the source is already in traditional characters. Automatic conversion is reliable from traditional to simplified, but not from simplified to traditional, because simplified has collapsed a number of characters that traditional keeps distinct (for example, 發 and 髮 both become 发). The mafan module mentioned above also has a simplified-to-traditional converter function, but it is not reliable for this very reason.
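
For reference, here is a rough sketch of how the conversion could be applied to a CoNLL-U file. It is not an official UD tool. It assumes the standard 10-column, tab-separated CoNLL-U layout and converts only the `# text = ` comment and the FORM and LEMMA columns. The names `convert_conllu` and `toy_simplify` are made up for this example, and `toy_simplify` is a three-character stand-in so the snippet runs without any extra packages; in practice you would pass in a real traditional-to-simplified function.

```python
from typing import Callable

# Toy stand-in for a real traditional-to-simplified converter (assumption:
# it covers only the three characters used in the sample below).
_TOY_TABLE = str.maketrans({"們": "们", "學": "学", "語": "语"})


def toy_simplify(text: str) -> str:
    """Replace the few traditional characters known to the toy table."""
    return text.translate(_TOY_TABLE)


def convert_conllu(text: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to the sentence text, FORM and LEMMA of CoNLL-U data."""
    out = []
    for line in text.split("\n"):
        if line.startswith("# text = "):
            # Raw sentence text stored in the comment line.
            out.append("# text = " + convert(line[len("# text = "):]))
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 10:
                cols[1] = convert(cols[1])  # FORM
                cols[2] = convert(cols[2])  # LEMMA
            out.append("\t".join(cols))
        else:
            # Other comments and blank sentence separators pass through.
            out.append(line)
    return "\n".join(out)


sample = "# text = 我們學語言\n1\t我們\t我們\tPRON\t_\t_\t0\troot\t_\t_\n"
converted = convert_conllu(sample, toy_simplify)
print(converted)
```

If mafan is installed, its `simplify` function (as I recall it from the mafan README; please check against the version you have) could be passed in place of `toy_simplify`. Other columns such as MISC are left untouched here, so any Chinese text stored there would need separate handling.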

yutaolife commented 7 years ago

@ermanh Many thanks. I know how to do this.