Closed · prakhar21 closed this issue 8 years ago
What about trying:
cd models/syntaxnet
./tagger.sh
Got that working. But now it seems it is not able to find the directory or file:
./tagger.sh: line 6: bazel-bin/syntaxnet/parser_trainer: No such file or directory
./tagger.sh: line 8: --compute_lexicon: command not found
./tagger.sh: line 9: --graph_builder=greedy: command not found
./tagger.sh: line 10: --training_corpus=training-corpus: command not found
./tagger.sh: line 12: --tuning_corpus=tuning-corpus: command not found
./tagger.sh: line 13: --batch_size=32: command not found
./tagger.sh: line 15: --decay_steps=3600: command not found
But I have checked, and bazel-bin/syntaxnet/parser_trainer exists.
Where might the problem be?
cd models/syntaxnet
bazel-bin/syntaxnet/parser_trainer \
--task_context=syntaxnet/context.pbtxt \
--arg_prefix=brain_pos \
--compute_lexicon \
--graph_builder=greedy \
--training_corpus=training-corpus \
--tuning_corpus=tuning-corpus \
--output_path=models \
--batch_size=32 \
--decay_steps=3600 \
--hidden_layer_sizes=128 \
--learning_rate=0.08 \
--momentum=0.9 \
--seed=0 \
--params=128-0.08-3600-0.9-0
There should be no space or any other character after '\'.
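The advice above is the whole story behind the "command not found" errors: a comment after the trailing '\' makes the backslash escape the following space instead of the newline, so the continuation breaks and every subsequent flag line is executed as its own command. A stand-in reproduction (hypothetical file names, not the real tagger.sh):

```shell
# Write a two-line script where a comment follows the trailing backslash.
# The '\' escapes the space, not the newline, so '--batch_size=32' on the
# next line is run as a separate command and the shell reports it as
# "not found".
cat > /tmp/broken_demo.sh <<'EOF'
echo training \ # this comment breaks the line continuation
--batch_size=32
EOF
sh /tmp/broken_demo.sh 2>&1 | grep 'not found'
# prints something like: /tmp/broken_demo.sh: line 2: --batch_size=32: command not found
```

Deleting everything after each '\' (or moving the comments onto their own lines) restores the continuation.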
Resolved some of the errors by removing the comments after '\', but still:
./MyPosTagger.sh: line 8: bazel-bin/syntaxnet/parser_trainer: No such file or directory
I have checked, and parser_trainer does exist at exactly this path.
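One thing worth checking here: bazel-bin/syntaxnet/parser_trainer is a relative path, so it resolves against the current working directory, and the script fails unless it is launched from the workspace root (models/syntaxnet in this setup). A stand-in demonstration with hypothetical /tmp paths:

```shell
# Build a fake workspace with an executable at the bazel-bin relative path.
mkdir -p /tmp/demo_ws/bazel-bin/syntaxnet
printf '#!/bin/sh\necho trainer\n' > /tmp/demo_ws/bazel-bin/syntaxnet/parser_trainer
chmod +x /tmp/demo_ws/bazel-bin/syntaxnet/parser_trainer
# The relative path only resolves from inside the workspace root:
cd /tmp/demo_ws && bazel-bin/syntaxnet/parser_trainer   # prints: trainer
```

Running the same command from any other directory produces exactly the "No such file or directory" error seen above, even though the binary exists.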
Got that resolved; the directory structure was somehow at fault. I have started the training, but the problem is that after evaluation the eval metric is 0.02%.
INFO:tensorflow:Seconds elapsed in evaluation: 1.04, eval metric: 0.02%
Did you use https://github.com/dsindex/syntaxnet/blob/master/convert.py for replacing UPOS with XPOS?
Yes, I saw that script, but I modified it at this step to tokens[4] = tokens[3] instead of tokens[3] = tokens[4]. If we refer to http://universaldependencies.org/format.html, we can see that index 3 is UPOS and index 4 is XPOS. I also referred to https://github.com/UniversalDependencies/docs/issues/297.
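The column fix being described can be sketched as follows (a minimal illustration with a hypothetical helper name, not dsindex's actual convert.py):

```python
# In the CoNLL-U format, column index 3 holds UPOS and column index 4
# holds XPOS. SyntaxNet's tagger reads the XPOS column, so copying
# tokens[3] into tokens[4] makes it train on the UD UPOS tags.
def fill_xpos_with_upos(conllu_text):
    fixed = []
    for line in conllu_text.splitlines():
        if line and not line.startswith('#'):
            tokens = line.split('\t')
            if len(tokens) == 10:       # a regular 10-column token line
                tokens[4] = tokens[3]   # XPOS := UPOS
                line = '\t'.join(tokens)
        fixed.append(line)              # comments/blank lines pass through
    return '\n'.join(fixed)
```

For a token line like `1	hello	hello	INTJ	UH	_	0	root	_	_`, this replaces the XPOS field `UH` with the UPOS value `INTJ` while leaving all other columns untouched.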
It seems I have successfully trained the POS model by using the first two functions from your train script. I am confused about how to test my POS tagger on a terminal-based example, like we can with the existing demo.sh.
you can use test.sh for that purpose~ :)
# train.sh
convert
train_pos_tagger
copy_model
(skip cp -rf ${TMP_DIR}/brain_parser/structured/${GP_PARAMS}/model ${MODEL_DIR}/parser-params )
# test.sh
PARSER_EVAL=${BINDIR}/parser_eval
CONLL2TREE=${BINDIR}/conll2tree
MODEL_DIR=${CDIR}/models
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
${PARSER_EVAL} \
--input=${INPUT_FORMAT} \
--output=stdout-conll \
--hidden_layer_sizes=128 \
--arg_prefix=brain_pos \
--graph_builder=greedy \
--task_context=${MODEL_DIR}/context.pbtxt \
--model_path=${MODEL_DIR}/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
$ echo "hello syntaxnet" | ./test.sh
...
I syntaxnet/term_frequency_map.cc:101] Loaded 47 terms from work/models/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: stack(3).word stack(2).word stack(1).word stack.word input.word input(1).word input(2).word input(3).word;input.digit input.hyphen;stack.suffix(length=2) input.suffix(length=2) input(1).suffix(length=2);stack.prefix(length=2) input.prefix(length=2) input(1).prefix(length=2)
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: words;other;suffix;prefix
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 64;4;8;8
I syntaxnet/term_frequency_map.cc:101] Loaded 18731 terms from work/models/word-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 50 terms from work/models/tag-map.
INFO:tensorflow:Building training network with parameters: feature_sizes: [8 2 3 3] domain_sizes: [18734 5 4195 5342]
I syntaxnet/embedding_feature_extractor.cc:35] Features: stack(3).word stack(2).word stack(1).word stack.word input.word input(1).word input(2).word input(3).word;input.digit input.hyphen;stack.suffix(length=2) input.suffix(length=2) input(1).suffix(length=2);stack.prefix(length=2) input.prefix(length=2) input(1).prefix(length=2)
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: words;other;suffix;prefix
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 64;4;8;8
I syntaxnet/term_frequency_map.cc:101] Loaded 18731 terms from work/models/word-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 50 terms from work/models/tag-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 47 terms from work/models/label-map.
I syntaxnet/reader_ops.cc:141] Starting epoch 1
I syntaxnet/reader_ops.cc:141] Starting epoch 2
INFO:tensorflow:Processed 1 documents
1 hello _ DT DT _ 0 _ _
2 syntaxnet _ ADD ADD _ 0 _ _
INFO:tensorflow:Total processed documents: 1
INFO:tensorflow:num correct tokens: 0
INFO:tensorflow:total tokens: 2
INFO:tensorflow:Seconds elapsed in evaluation: 0.03, eval metric: 0.00%
$ echo "i want to study" | ./test.sh
...
1 i _ PRP PRP _ 0 _ _
2 want _ VBP VBP _ 0 _ _
3 to _ TO TO _ 0 _ _
4 study _ VB VB _ 0 _ _
...
@dsindex Thanks! Got that working.
I want to train a SyntaxNet model with a custom corpus (let's say EN from Universal Dependencies).
I have done following changes:
input {
  name: 'training-corpus'
  record_format: 'conll-sentence'
  Part {
    file_pattern: '/home/prakhar/Downloads/ud-treebanks-v1.3/UD_English/en-ud-train.conllu'
  }
}
input {
  name: 'tuning-corpus'
  record_format: 'conll-sentence'
  Part {
    file_pattern: '/home/prakhar/Downloads/ud-treebanks-v1.3/UD_English/en-ud-dev.conllu'
  }
}
input {
  name: 'dev-corpus'
  record_format: 'conll-sentence'
  Part {
    file_pattern: '/home/prakhar/Downloads/ud-treebanks-v1.3/UD_English/en-ud-test.conllu'
  }
}
Also, I have made a file named tagger.sh and copied the contents below into it:
bazel-bin/syntaxnet/parser_trainer \
  --task_context=syntaxnet/context.pbtxt \
  --arg_prefix=brain_pos \ # read from POS configuration
  --compute_lexicon \ # required for first stage of pipeline
  --graph_builder=greedy \ # no beam search
  --training_corpus=training-corpus \ # names of training/tuning set
  --tuning_corpus=tuning-corpus \
  --output_path=models \ # where to save new resources
  --batch_size=32 \ # Hyper-parameters
  --decay_steps=3600 \
  --hidden_layer_sizes=128 \
  --learning_rate=0.08 \
  --momentum=0.9 \
  --seed=0 \
  --params=128-0.08-3600-0.9-0 # name for these parameters
How do I train it now? I have tried
./tagger.sh
but it says Permission denied. Do I need to change anything else anywhere?
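A "Permission denied" from ./tagger.sh usually just means the newly created file lacks the execute bit; either add it with chmod or invoke the script through the shell. Demonstrated here on a stand-in script (hypothetical /tmp path, not the real tagger.sh):

```shell
# Create a script without relying on any execute permission being set.
printf '#!/bin/sh\necho ok\n' > /tmp/tagger_demo.sh
chmod +x /tmp/tagger_demo.sh   # grant the execute bit
/tmp/tagger_demo.sh            # now runs directly; prints: ok
sh /tmp/tagger_demo.sh         # also works without the execute bit
```

So either `chmod +x tagger.sh` once and run `./tagger.sh`, or run `sh tagger.sh` without changing permissions.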