Can you provide the Japanese training example?

Hironsan / anago

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

https://anago.herokuapp.com/

MIT License

1.48k stars, 368 forks

Can you provide the Japanese training example? #21

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

Hi, I'm wondering what the Japanese training data looks like. Is it segmented by word or by character? Also, is the data for training word2vec segmented in the same way?

Hironsan commented 6 years ago

As with the English dataset, the Japanese training data is segmented by word.

https://github.com/Hironsan/anago/tree/master/data/conll2003/en/ner
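For illustration only, here is a rough sketch of what word-segmented Japanese data could look like in a CoNLL-style layout: one token and its tag per line, with a blank line between sentences. The sample sentences, the tag set, and the `parse_conll` helper are all hypothetical. They are not taken from the repository, and the whitespace separator is an assumption, so check the linked files for the exact format.

```python
from typing import List, Tuple

# Hypothetical sample, NOT taken from the anago repository.
# Each line holds one word-level token and its IOB2 tag;
# a blank line separates sentences.
SAMPLE = """\
太郎 B-PER
は O
東京 B-LOC
に O
行っ O
た O
。 O

花子 B-PER
は O
京都 B-ORG
大学 I-ORG
の O
学生 O
です O
。 O
"""


def parse_conll(text: str) -> Tuple[List[List[str]], List[List[str]]]:
    """Split CoNLL-style text into parallel lists of tokens and tags."""
    sentences: List[List[str]] = []
    labels: List[List[str]] = []
    words: List[str] = []
    tags: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if words:
                sentences.append(words)
                labels.append(tags)
                words, tags = [], []
            continue
        # Assumes exactly two whitespace-separated columns: token, tag.
        word, tag = line.split()
        words.append(word)
        tags.append(tag)
    # Flush the last sentence when the text has no trailing blank line.
    if words:
        sentences.append(words)
        labels.append(tags)
    return sentences, labels


sentences, labels = parse_conll(SAMPLE)
```

Raw Japanese text has no spaces between words, so the tokens would have to come from a morphological analyzer such as MeCab before the data could be written in this layout. Presumably the same tokenization should be used for the text the word embeddings are trained on, so that the vocabulary lines up.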

I used pre-trained word embeddings.

https://qiita.com/Hironsan/items/513b9f93752ecee9e670