Open MHDBST opened 5 years ago
Hi, do you train successfully? In my experience, the error comes out but the training still works; it is not a fatal error. The placeholders are inside the graph but do not run, so the TPU just complains, but it should still work.
@saberkun Yes, I do. But I'm not sure whether the training runs without any issue. I mean, it doesn't stop training, but I'm not sure whether all parameters are trained after getting this error.
Should be fine. If the placeholders actually ran, the TPU would crash the training program. If you get the desired accuracy through the TF Hub (colab) setup, it should be good.
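For example, you can sanity-check that the hub module's BERT weights are actually registered as trainable variables (a minimal TF 1.x sketch; the `BERT_MODEL_HUB` handle below is just an example, substitute the one you use):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Example handle; substitute the hub URL you actually use.
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

with tf.Graph().as_default():
  # trainable=True registers the module's weights in TRAINABLE_VARIABLES,
  # so the optimizer will update them during fine-tuning.
  bert_module = hub.Module(BERT_MODEL_HUB, tags={"train"}, trainable=True)
  bert_vars = [v for v in tf.trainable_variables() if "bert" in v.name]
  print("Trainable BERT variables:", len(bert_vars))  # should be non-empty
```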
Hi, I am trying to fine-tune BERT on my own dataset using a TPU. To fine-tune BERT I wrote the following code:
```python
def create_model(is_training, input_ids, input_mask, segment_ids, labels,
                 num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  tags = set()
  if is_training:
    tags.add("train")
  bert_module = hub.Module(BERT_MODEL_HUB, tags=tags, trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits)
```
This model simply loads the BERT parameters using hub and tries to fine-tune them by setting the `trainable` parameter to `True`. When I use a TPU to run my code (particularly after adding my `create_model` and not using `bert.run_classifier.create_model`), I get the following ERROR for millions of parameters, and I don't know whether the model is fine-tuned or not:

```
ERROR:tensorflow:Operation of type Placeholder (module_apply_tokens/bert/encoder/layer_5/intermediate/dense/kernel) is not supported on the TPU. Execution will fail if this op is used in the graph.
```
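For reference, the op names in the warning all sit under the hub module's apply subgraph (`module_apply_tokens/...`); a small TF 1.x sketch to list them after the graph is built:

```python
import tensorflow as tf

# Run after the model graph has been built. The ops the TPU warning names
# are of type Placeholder and live inside the hub module's apply subgraph;
# per the discussion above, they are never executed during training, which
# is why the warning is not fatal.
graph = tf.get_default_graph()
placeholder_names = [op.name for op in graph.get_operations()
                     if op.type == "Placeholder"]
print(len(placeholder_names), "Placeholder ops, e.g.:")
for name in placeholder_names[:5]:
  print(" ", name)  # e.g. module_apply_tokens/bert/encoder/layer_5/...
```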
How can I solve this issue? How can I use a TPU to fine-tune BERT?
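In case it is relevant, this is roughly how I wire `create_model` into a `TPUEstimator` `model_fn` (a simplified sketch; the hyperparameter values and `num_labels=2` are placeholders, and the optimizer comes from `bert.optimization`):

```python
import tensorflow as tf
from bert import optimization

# Placeholder hyperparameters, for illustration only.
learning_rate = 2e-5
num_train_steps = 1000
num_warmup_steps = 100

def model_fn(features, labels, mode, params):
  """model_fn for TPUEstimator; calls create_model() defined above."""
  is_training = (mode == tf.estimator.ModeKeys.TRAIN)
  loss, per_example_loss, logits = create_model(
      is_training, features["input_ids"], features["input_mask"],
      features["segment_ids"], features["label_ids"], num_labels=2,
      use_one_hot_embeddings=True)
  # use_tpu=True wraps BERT's optimizer in CrossShardOptimizer for TPUs.
  train_op = optimization.create_optimizer(
      loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=True)
  return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss,
                                         train_op=train_op)
```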
Thanks