How to use the CoNLL 2017 baseline?

dsindex / syntaxnet

reference code for syntaxnet


How to use the CoNLL 2017 baseline? #22

Closed zhou-zh closed 6 years ago

zhou-zh commented 7 years ago

Thanks for your great work! I saw your reply on Stack Overflow, and I know you have built your own system. I have two questions about it:

1. You trained your model on English, and I have also trained one once. The official CoNLL 2017 baselines provide models for different languages, but I don't know which entries in the script to modify in order to train models for other languages.
2. Your eval script works well, but their README mentions a baseline_eval.py that I can't find. Do you know where it is?

I'm sorry that my questions may not be directly related to your models, but they are really important to me. If you know, please tell me. Thanks very much.

dsindex commented 7 years ago

hello~

I understand that you want to train a model for another language. If so, you can check this issue:
- https://github.com/dsindex/syntaxnet/issues/21#issuecomment-290894631
- To train, you basically need to download the UD corpora.
- After downloading, modify train_dragnn.sh for your language and run the script.
```
SRC_CORPUS_DIR=${CDIR}/UD_English
TRAIN_FILE=${DATA_DIR}/en-ud-train.conllu.conv 
DEV_FILE=${DATA_DIR}/en-ud-dev.conllu.conv
```
You can also check https://github.com/tensorflow/models/issues/1211#issuecomment-287744105
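
As a rough illustration of the edit described above, the three variables could be derived from a treebank name and a file-name prefix. This is only a sketch: the function `corpus_vars`, the fallback values for `CDIR` and `DATA_DIR`, and the `Chinese` / `zh` pair are assumptions for illustration, not part of train_dragnn.sh. Check the actual file names inside the UD corpus you downloaded before relying on the prefix.

```shell
# Sketch: print the train_dragnn.sh corpus variables for a given language.
# corpus_vars is a hypothetical helper; it is not part of the repository.
# $1 = UD treebank name (e.g. English), $2 = file-name prefix (e.g. en).
corpus_vars() (
  ud_lang="$1"
  lang_code="$2"
  # Assumed fallbacks: the real script defines CDIR and DATA_DIR itself.
  cdir="${CDIR:-$PWD}"
  data_dir="${DATA_DIR:-${cdir}/data}"
  echo "SRC_CORPUS_DIR=${cdir}/UD_${ud_lang}"
  echo "TRAIN_FILE=${data_dir}/${lang_code}-ud-train.conllu.conv"
  echo "DEV_FILE=${data_dir}/${lang_code}-ud-dev.conllu.conv"
)
```

Calling `corpus_vars Chinese zh` prints the three assignments, which can then be pasted into the script in place of the English ones.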

zhou-zh commented 7 years ago

@dsindex, thanks for your reply! If I want to train on a different language, should I just modify the path to the dataset for that language? Do we not need to use the different models provided by the CoNLL 2017 baselines guide? I thought that models for different languages have different word maps.

dsindex commented 7 years ago

@continuesmile yes~ Place the corpus at that path and modify the script to train your own model.

The models provided by the CoNLL 2017 baselines guide were trained with the scripts at https://github.com/tensorflow/models/tree/master/syntaxnet/dragnn/tools

Those scripts are the originals; mine is a modified version for convenience.

jhowliu commented 7 years ago

Hi @dsindex,

Should I train the segmentation model myself? I trained the model on the UD Chinese corpus, but the UAS and LAS are only 68.36% and 58.96%, much worse than the baseline. Do you have any hints?

Thanks again
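
For reference, the UAS and LAS figures quoted above are the percentages of tokens with the correct head, and with the correct head and dependency label, respectively. A minimal sketch of that arithmetic for two CoNLL-U files is shown below. It assumes both files contain exactly the same tokens in the same order, which does not hold when the parser runs on its own segmentation; in that case the system and gold tokens have to be aligned first, and this sketch does not do that. The function name `uas_las` is hypothetical and not part of any script mentioned in this thread.

```shell
# Sketch: compute UAS and LAS from a gold and a system CoNLL-U file.
# Assumes identical tokenization in both files (token i in gold is token i in system).
# $1 = gold file, $2 = system file. Columns used: 1 = ID, 7 = HEAD, 8 = DEPREL.
uas_las() {
  awk -F'\t' '
    FNR == 1 { n = 0; file++ }          # new input file: reset the token counter
    $1 !~ /^[0-9]+$/ { next }           # skip comments, blank lines, ranges (1-2), empty nodes (1.1)
    file == 1 { n++; head[n] = $7; rel[n] = $8; total = n; next }   # remember gold head and label
    {
      n++
      if ($7 == head[n]) {              # head matches: counts toward UAS
        uas++
        if ($8 == rel[n]) las++         # label also matches: counts toward LAS
      }
    }
    END {
      if (total == 0) { print "no tokens found"; exit 1 }
      printf "UAS %.2f LAS %.2f\n", 100 * uas / total, 100 * las / total
    }
  ' "$1" "$2"
}
```

Usage would be `uas_las gold.conllu system.conllu`, where both file names are placeholders.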