Train sense2vec in Chinese

explosion / sense2vec

🦆 Contextually-keyed word vectors

https://explosion.ai/blog/sense2vec-reloaded

MIT License


Train sense2vec in Chinese #148

Open JingxinLee opened 2 years ago

JingxinLee commented 2 years ago

I am trying to train sense2vec on a Chinese Wikipedia corpus, but I ran into this error: The 'noun_chunks' syntax iterator is not implemented for language 'zh'. Does anyone know how to deal with this? How should I write the labels in the noun_chunks function, and how can I find the labels I need?

JingxinLee commented 2 years ago

The problem starts at doc = merge_phrases(doc) and ends at https://github.com/explosion/sense2vec/blob/d689bb65ce0f6c597c891cea3ba279ad1f92916f/sense2vec/util.py#L117

I manually created a syntax_iterators.py within zh, but it doesn't work.
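For anyone hitting the same error, here is a minimal sketch of what a noun_chunks iterator could look like. It is not the sense2vec or spaCy implementation. It yields (start, end, label) tuples and reads only `i`, `pos_`, `dep_`, and `left_edge` from each token, so it can be tried on the hand-built stand-in tokens below without installing a model. The label set `NP_DEPS`, the `FakeToken` class, and the hand-written parse in `demo_tokens` are all assumptions made for illustration. The labels a real Chinese parser emits should be checked before relying on any of them.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, Iterator, List, Optional, Tuple

# ASSUMPTION: dependency labels that mark the head of a noun phrase.
# With a real pipeline, list the parser's labels first, e.g.
#   print(nlp.get_pipe("parser").labels)
# and use spacy.explain(label) to decide which ones belong here.
NP_DEPS: FrozenSet[str] = frozenset({"nsubj", "dobj"})
NOUN_POS: FrozenSet[str] = frozenset({"NOUN", "PROPN", "PRON"})


def noun_chunks(
    doclike: Iterable, np_deps: FrozenSet[str] = NP_DEPS
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, "NP") for each nominal head and its left-side subtree."""
    prev_end = -1
    for word in doclike:
        if word.pos_ not in NOUN_POS:
            continue
        # Skip candidates that would overlap a chunk already emitted.
        if word.left_edge.i <= prev_end:
            continue
        if word.dep_ in np_deps:
            prev_end = word.i
            yield word.left_edge.i, word.i + 1, "NP"


@dataclass
class FakeToken:
    """Hypothetical stand-in for a parsed token, used only for the demo."""
    i: int
    text: str
    pos_: str
    dep_: str
    left_edge: Optional["FakeToken"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        # A token with no left dependents is its own left edge.
        if self.left_edge is None:
            self.left_edge = self


def demo_tokens() -> List[FakeToken]:
    """Hand-annotated parse of "我 喜欢 自然 语言 处理" (labels are illustrative)."""
    wo = FakeToken(0, "我", "PRON", "nsubj")
    xihuan = FakeToken(1, "喜欢", "VERB", "ROOT", left_edge=wo)
    ziran = FakeToken(2, "自然", "NOUN", "compound:nn")
    yuyan = FakeToken(3, "语言", "NOUN", "compound:nn", left_edge=ziran)
    chuli = FakeToken(4, "处理", "NOUN", "dobj", left_edge=ziran)
    return [wo, xihuan, ziran, yuyan, chuli]


if __name__ == "__main__":
    tokens = demo_tokens()
    for start, end, label in noun_chunks(tokens):
        print(label, "".join(t.text for t in tokens[start:end]))

# Untested assumption for spaCy v3: assigning the function on the vocab, e.g.
#   nlp.vocab.get_noun_chunks = noun_chunks
# before calling merge_phrases may be enough for doc.noun_chunks to use it.
```

The `left_edge` trick relies on modifiers preceding the head noun, which is usually the case in Chinese. Conjoined nouns and other constructions are not handled in this sketch.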