dsindex / syntaxnet

reference code for syntaxnet

How to retrain existing Syntaxnet model? #37

Open apurvnagvenkar opened 6 years ago

apurvnagvenkar commented 6 years ago

Is there a way to retrain the SyntaxNet POS tagger model with a new dataset?

dsindex commented 6 years ago

'parser_trainer.py' has the '--pretrained_params' and '--pretrained_params_names' parameters. in the documentation, they are used for global training.

https://github.com/tensorflow/models/blob/master/research/syntaxnet/g3doc/syntaxnet-tutorial.md

bazel-bin/syntaxnet/parser_trainer \
  --arg_prefix=brain_parser \
  --batch_size=8 \
  --decay_steps=100 \
  --graph_builder=structured \
  --hidden_layer_sizes=200,200 \
  --learning_rate=0.02 \
  --momentum=0.9 \
  --output_path=models \
  --task_context=models/brain_parser/greedy/$PARAMS/context \
  --seed=0 \
  --training_corpus=projectivized-training-corpus \
  --tuning_corpus=tagged-tuning-corpus \
  --params=200x200-0.02-100-0.9-0 \
  --pretrained_params=models/brain_parser/greedy/$PARAMS/model \
  --pretrained_params_names=\
embedding_matrix_0,embedding_matrix_1,embedding_matrix_2,\
bias_0,weights_0,bias_1,weights_1

but i guess they could also be used for retraining 'brain_tagger', so i modified 'train.sh' to retrain 'brain_tagger' like below:

TAGGER_PARAMS=${TAGGER_HIDDEN_LAYER_PARAMS}-0.08-3600-0.9-0
function train_tagger {
    ${BINDIR}/parser_trainer \
      --task_context=${CONTEXT} \
      --arg_prefix=brain_tagger \
      --compute_lexicon \
      --graph_builder=greedy \
      --training_corpus=training-corpus \
      --tuning_corpus=tuning-corpus \
      --output_path=${TMP_DIR} \
      --batch_size=${BATCH_SIZE} \
      --decay_steps=3600 \
      --hidden_layer_sizes=${TAGGER_HIDDEN_LAYER_SIZES} \
      --learning_rate=0.08 \
      --momentum=0.9 \
      --beam_size=1 \
      --seed=0 \
      --params=${TAGGER_PARAMS} \
      --num_epochs=12 \
      --report_every=100 \
      --checkpoint_every=1000 \
      --pretrained_params=${TMP_DIR}/brain_tagger/greedy/${TAGGER_PARAMS}/model \
      --pretrained_params_names=embedding_matrix_0,embedding_matrix_1,embedding_matrix_2,bias_0,weights_0,bias_1,weights_1 \
      --logtostderr
}

and ran it again. as you can see, 'eval metric' is already 91.04% after epoch 1.

$ ./train.sh -v -v
2018-07-30 21:53:52.409090: I syntaxnet/reader_ops.cc:140] Starting epoch 1
2018-07-30 21:53:53.405823: I syntaxnet/reader_ops.cc:140] Starting epoch 2
INFO:tensorflow:Seconds elapsed in evaluation: 1.12, eval metric: 91.04%
INFO:tensorflow:Writing out trained parameters.
....

apurvnagvenkar commented 6 years ago

Hi, it doesn't work when I change my dataset.

  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 303, in <module>
    app.run(main)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 274, in run
    _run_main(main, argv)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 238, in _run_main
    sys.exit(main(argv))
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 299, in main
    Train(sess, num_actions, feature_sizes, domain_sizes, embedding_dims)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 239, in Train
    sess.run(targets, feed_dict=feed_dict)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1436,8] rhs shape= [1297,8]
  [[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix_2"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_matrix_2, save/RestoreV2_9)]]

Caused by op u'save/Assign_9', defined at:
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 303, in <module>
    app.run(main)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 274, in run
    _run_main(main, argv)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/absl_py/absl/app.py", line 238, in _run_main
    sys.exit(main(argv))
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 299, in main
    Train(sess, num_actions, feature_sizes, domain_sizes, embedding_dims)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/parser_trainer.py", line 216, in Train
    parser.AddSaver(FLAGS.slim_model)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/main/syntaxnet/graph_builder.py", line 577, in AddSaver
    variables_to_save, builder=tf_saver.BaseSaverBuilder())
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 494, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/training/saver.py", line 185, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/ops/state_ops.py", line 283, in assign
    validate_shape=validate_shape)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
    use_locking=use_locking, name=name)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/versionx/models/research/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/org_tensorflow/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1436,8] rhs shape= [1297,8]
  [[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix_2"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_matrix_2, save/RestoreV2_9)]]

dsindex commented 6 years ago

i guess there is a dimension mismatch.

 lhs shape= [1436,8] rhs shape= [1297,8]

what is the hidden layer size of the model you have? in 'train.sh', '64' is used.

apurvnagvenkar commented 6 years ago

TAGGER_HIDDEN_LAYER_SIZES=64
TAGGER_HIDDEN_LAYER_PARAMS=64

Also, I am only training the POS model. I commented out the remaining steps for both training and retraining. Could that be an issue?

convert_corpus ${CORPUS_DIR}
train_tagger
preprocess_with_tagger
# pretrain_parser
# evaluate_pretrained_parser
# train_parser
# evaluate_parser

dsindex commented 6 years ago

i just tested it and got the same error.

new
Building training network with parameters: feature_sizes: [8 2 3 3] domain_sizes: [5380    5 2087 2813]

original
Building training network with parameters: feature_sizes: [8 2 3 3] domain_sizes: [18755     5  4214  5365]
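
A quick way to check this before retraining is to compare the 'domain_sizes' reported by the two runs. The snippet below is only a sketch: `domain_sizes` is a hypothetical helper (not part of SyntaxNet) that pulls the list out of a "Building training network" log line, and the two log lines are the ones quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SyntaxNet): read log lines on stdin and
# print the domain_sizes list with whitespace normalised to single spaces.
domain_sizes() {
  sed -n 's/.*domain_sizes: \[\([^]]*\)\].*/\1/p' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Log lines copied from the two runs in this thread.
new_log='Building training network with parameters: feature_sizes: [8 2 3 3] domain_sizes: [5380    5 2087 2813]'
orig_log='Building training network with parameters: feature_sizes: [8 2 3 3] domain_sizes: [18755     5  4214  5365]'

new_sizes=$(printf '%s\n' "$new_log" | domain_sizes)
orig_sizes=$(printf '%s\n' "$orig_log" | domain_sizes)

if [ "$new_sizes" = "$orig_sizes" ]; then
  echo "domain sizes match"
else
  echo "domain sizes differ: ${orig_sizes} vs ${new_sizes}"
fi
```

If the sizes differ, restoring the saved embedding matrices is likely to fail with the same "Assign requires shapes of both tensors to match" error.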

it seems that the 'embedding_matrix_0,embedding_matrix_1,embedding_matrix_2' model parameters depend on the original corpus (their dimensions follow 'domain_sizes'?).

so, i removed those parameters.

--pretrained_params=${TMP_DIR}/brain_tagger/greedy/${TAGGER_PARAMS}/model \
--pretrained_params_names=bias_0,weights_0,bias_1,weights_1 \
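
The choice between the two flag values used in this thread could be wrapped in a small function. This is a minimal sketch, assuming the decision can be made by comparing the 'domain_sizes' of the original and new runs; `pretrained_names` is a hypothetical helper, not something 'train.sh' or SyntaxNet provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a value for --pretrained_params_names.
# $1 and $2 are the space-separated domain_sizes of the original and new runs.
pretrained_names() {
  local dense="bias_0,weights_0,bias_1,weights_1"
  local embeddings="embedding_matrix_0,embedding_matrix_1,embedding_matrix_2"
  if [ "$1" = "$2" ]; then
    # Same corpus dimensions: restore everything, as in the first experiment.
    echo "${embeddings},${dense}"
  else
    # Different dimensions: skip the embedding matrices.
    echo "${dense}"
  fi
}

pretrained_names "18755 5 4214 5365" "5380 5 2087 2813"
```

The output could then be passed as `--pretrained_params_names=$(pretrained_names "$ORIG_SIZES" "$NEW_SIZES")`, where both variable names are placeholders.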

and then ran again

...
2018-08-02 22:49:23.828902: I syntaxnet/reader_ops.cc:140] Starting epoch 1
2018-08-02 22:49:24.811940: I syntaxnet/reader_ops.cc:140] Starting epoch 2
INFO:tensorflow:Seconds elapsed in evaluation: 1.11, eval metric: 86.80%
INFO:tensorflow:Writing out trained parameters.
INFO:tensorflow:Epochs: 2, num steps: 1100, seconds elapsed: 14.66, avg cost: 0.32,
INFO:tensorflow:Epochs: 2, num steps: 1200, seconds elapsed: 15.81, avg cost: 0.30,
INFO:tensorflow:Epochs: 2, num steps: 1300, seconds elapsed: 16.96, avg cost: 0.29,
INFO:tensorflow:Epochs: 2, num steps: 1400, seconds elapsed: 18.11, avg cost: 0.31,
INFO:tensorflow:Epochs: 2, num steps: 1500, seconds elapsed: 19.25, avg cost: 0.27,
INFO:tensorflow:Epochs: 2, num steps: 1600, seconds elapsed: 20.41, avg cost: 0.27,
INFO:tensorflow:Epochs: 2, num steps: 1700, seconds elapsed: 21.21, avg cost: 0.16,
2018-08-02 22:49:33.039774: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Epochs: 3, num steps: 1800, seconds elapsed: 22.23, avg cost: 0.19,
INFO:tensorflow:Epochs: 3, num steps: 1900, seconds elapsed: 23.35, avg cost: 0.22,
INFO:tensorflow:Epochs: 3, num steps: 2000, seconds elapsed: 24.51, avg cost: 0.23,
INFO:tensorflow:Evaluating training network.
2018-08-02 22:49:37.094506: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Seconds elapsed in evaluation: 1.00, eval metric: 89.33%

here is the original one.

2018-08-02 22:51:34.841066: I syntaxnet/reader_ops.cc:140] Starting epoch 1
2018-08-02 22:51:35.818366: I syntaxnet/reader_ops.cc:140] Starting epoch 2
INFO:tensorflow:Seconds elapsed in evaluation: 1.10, eval metric: 81.04%
INFO:tensorflow:Writing out trained parameters.
INFO:tensorflow:Epochs: 2, num steps: 1100, seconds elapsed: 15.54, avg cost: 0.56,
INFO:tensorflow:Epochs: 2, num steps: 1200, seconds elapsed: 16.76, avg cost: 0.49,
INFO:tensorflow:Epochs: 2, num steps: 1300, seconds elapsed: 18.01, avg cost: 0.44,
INFO:tensorflow:Epochs: 2, num steps: 1400, seconds elapsed: 19.25, avg cost: 0.45,
INFO:tensorflow:Epochs: 2, num steps: 1500, seconds elapsed: 20.49, avg cost: 0.37,
INFO:tensorflow:Epochs: 2, num steps: 1600, seconds elapsed: 21.77, avg cost: 0.35,
INFO:tensorflow:Epochs: 2, num steps: 1700, seconds elapsed: 22.66, avg cost: 0.19,
2018-08-02 22:51:44.737869: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Epochs: 3, num steps: 1800, seconds elapsed: 23.77, avg cost: 0.26,
INFO:tensorflow:Epochs: 3, num steps: 1900, seconds elapsed: 24.99, avg cost: 0.32,
INFO:tensorflow:Epochs: 3, num steps: 2000, seconds elapsed: 26.21, avg cost: 0.34,
INFO:tensorflow:Evaluating training network.
2018-08-02 22:51:49.019258: I syntaxnet/reader_ops.cc:140] Starting epoch 3
INFO:tensorflow:Seconds elapsed in evaluation: 1.00, eval metric: 87.70%

'86.80%' is a somewhat lower starting point, but training continues after restoring the 'bias_0,weights_0,bias_1,weights_1' parameters.