dsindex / syntaxnet

reference code for syntaxnet

How to train on the Chinese corpus after downloading universal-dependencies-2.0? #24

Closed 12343954 closed 6 years ago

12343954 commented 7 years ago

How do I train on the Chinese corpus after downloading universal-dependencies-2.0?

I couldn't find the method.

Thank you so much!

dsindex commented 7 years ago

Hello,

Since SyntaxNet recommends using DRAGNN, I also recommend following this tutorial.

After downloading the UD Chinese corpus, you can use train_dragnn.sh to train a model (please edit train_dragnn.sh to set the correct corpus path).

You can use test_dragnn.sh for tagging and parsing.

These models expect a pre-segmented sentence as input, but Chinese input sentences contain no spaces.

Therefore, you should segment the raw (unsegmented) sentence before feeding it to the model.

If you want to train a segmentation model, you can refer to this code:

1. segmenter_trainer.py
2. segmenter-evaluator.py
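
To make the "segment first, then feed the model" step concrete, here is a minimal Python 3 sketch. It does not use the SyntaxNet segmenter; it swaps in plain greedy forward maximum matching against a toy dictionary, and `segment`, `TOY_VOCAB`, and `max_len` are hypothetical names made up for illustration. The only point is that the tagger/parser should receive space-separated tokens instead of the raw string.

```python
# Illustration only: greedy forward maximum matching with a toy dictionary.
# This is NOT the SyntaxNet/DRAGNN segmenter; all names here are hypothetical.

def segment(sentence, vocab, max_len=4):
    """Split an unsegmented sentence into tokens.

    At each position, take the longest substring (up to max_len characters)
    found in vocab; if nothing matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy dictionary, assumed for this example only.
TOY_VOCAB = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}

raw = "我喜欢自然语言处理"
# Join tokens with spaces so the sentence is pre-segmented.
print(" ".join(segment(raw, TOY_VOCAB)))
```

A trained segmentation model should do much better than dictionary matching on real text; the sketch only shows the shape of the input the downstream model expects.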

12343954 commented 7 years ago

Hi, thanks for your reply! I followed the steps in https://github.com/dsindex/syntaxnet/issues/21 and ran the first line, but got this error:

ERROR: /Users/mac/NLP/models/syntaxnet/work/dragnn_examples/BUILD:62:1: no such package 'dragnn/protos': BUILD file not found on package path and referenced by '//work/dragnn_examples:dragnn-deps'.
ERROR: Analysis of target '//work/dragnn_examples:write_master_spec' failed; build aborted.


dsindex commented 7 years ago

@12343954 Did you install the most recent version of SyntaxNet? Your directory should look like this:

$ cd /Users/mac/NLP/models/syntaxnet
$ ls
... g3doc/   third_party/  util/ README.md   dragnn/  syntaxnet/  tools/  work/
WORKSPACE  examples/   tensorflow/   ....
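
If it helps, the layout can also be checked with a small Python script. This is only a sketch: the `EXPECTED` list is copied from the listing above plus the `dragnn/protos/BUILD` file that the error message complains about, and `missing_entries` is a hypothetical helper name, not part of SyntaxNet.

```python
import os

# Entries taken from the directory listing above, plus the BUILD file
# named in the error message. Adjust the list if your checkout differs.
EXPECTED = [
    "WORKSPACE",
    "README.md",
    "dragnn",
    "dragnn/protos/BUILD",
    "examples",
    "g3doc",
    "syntaxnet",
    "tensorflow",
    "third_party",
    "tools",
    "util",
    "work",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    missing = []
    for rel in expected:
        path = os.path.join(root, *rel.split("/"))
        if not os.path.exists(path):
            missing.append(rel)
    return missing


# Example: check the current directory (run it from the syntaxnet root).
for rel in missing_entries("."):
    print("missing:", rel)
```

If `dragnn` or `dragnn/protos/BUILD` shows up as missing, the checkout most likely predates DRAGNN.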

12343954 commented 7 years ago

print(tf.__version__) gives 1.0.1

12343954 commented 7 years ago

It doesn't seem to be the latest version.

12343954 commented 7 years ago

@dsindex How do I upgrade TensorFlow and SyntaxNet to the latest version? Should I uninstall and reinstall them?

dsindex commented 7 years ago

You need to install SyntaxNet and DRAGNN from source by following this guide: https://github.com/tensorflow/models/blob/master/syntaxnet/README.md Once it is installed, all SyntaxNet and DRAGNN code will use the bundled TensorFlow submodule, regardless of the TensorFlow module installed system-wide.

12343954 commented 7 years ago

From source? Oh, that's a little difficult for me! Thank you very much, I'll try.