Kyubyong / g2pC

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Apache License 2.0

What datasets did you use to train the CRF model? #5

Open XqFeng-Josie opened 5 years ago

XqFeng-Josie commented 5 years ago

Which datasets did you use to train the model? Did you use only data containing polyphonic words?

Kyubyong commented 5 years ago

I used example sentences of polyphonic entries from some dictionaries.

maozhiqiang commented 5 years ago

@Kyubyong Can you share the format of the training corpus used to train the CRF model? Thank you!

XqFeng-Josie commented 5 years ago

@maozhiqiang You can refer to the sklearn-crfsuite tutorial.
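The sklearn-crfsuite tutorial represents training data as one list of per-token feature dicts per sentence (X) and one list of string labels per sentence (y). The following is a minimal sketch of how polyphone example sentences could be put into that shape. It assumes character-level tokens labeled with tone-numbered pinyin. The sentences, feature names, and helper functions are hypothetical and are not g2pC's actual training format.

```python
# Hypothetical example data: each sentence is paired with one pinyin label per character.
# These sentences and labels are illustrative, not taken from the g2pC training corpus.
TRAIN = [
    ("我去银行", ["wo3", "qu4", "yin2", "hang2"]),  # 行 read as hang2
    ("他在行走", ["ta1", "zai4", "xing2", "zou3"]),  # 行 read as xing2
]


def char2features(sent, i):
    """Build a feature dict for the i-th character (hypothetical feature names)."""
    ch = sent[i]
    feats = {"bias": 1.0, "char": ch}
    if i > 0:
        feats["-1:char"] = sent[i - 1]  # left context
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["+1:char"] = sent[i + 1]  # right context
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sent2features(sent):
    """Turn one sentence into a list of per-character feature dicts."""
    return [char2features(sent, i) for i in range(len(sent))]


X_train = [sent2features(s) for s, _ in TRAIN]
y_train = [labels for _, labels in TRAIN]

# Every sentence must have exactly one label per token.
assert all(len(x) == len(y) for x, y in zip(X_train, y_train))

# With sklearn-crfsuite installed, these lists could then be passed to the model:
# import sklearn_crfsuite
# crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
# crf.fit(X_train, y_train)
```

The context features (neighboring characters) are what let a CRF pick different readings for the same polyphonic character, such as 行 in 银行 versus 行走.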

maozhiqiang commented 5 years ago

@fengxqinx Thanks!

TomClarkson commented 5 years ago

> I used example sentences of polyphonic entries from some dictionaries.

Would it be possible to share the polyphonic entries? Thanks :)

980202006 commented 2 years ago

@Kyubyong How many polyphonic words do these dictionaries cover, and could you share some of the training data?