MHDBST commented 5 years ago

Hi, I am trying to fine-tune BERT on a TPU with my own dataset. To fine-tune BERT I wrote the following code:

```python
def create_model(is_training, input_ids, input_mask, segment_ids, labels,
                 num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  tags = set()
  if is_training:
    tags.add("train")
  bert_module = hub.Module(BERT_MODEL_HUB, tags=tags, trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  output_layer = bert_outputs["pooled_output"]

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)
```
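For reference, the loss at the end of `create_model` is ordinary softmax cross-entropy: multiplying the one-hot labels by `log_probs` and summing just picks out the negative log-probability of the true class. The sketch below reproduces that arithmetic in plain Python on made-up logits. The helper names and the toy values are illustrative assumptions, not part of the BERT code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def cross_entropy(batch_logits, labels):
    """Return (mean loss, per-example losses).

    Equivalent to -sum(one_hot * log_probs, axis=-1) followed by a mean,
    because the one-hot mask keeps only the true-class log-probability.
    """
    per_example = [
        -log_softmax(row)[label]
        for row, label in zip(batch_logits, labels)
    ]
    return sum(per_example) / len(per_example), per_example

# Toy batch (hypothetical values): two examples, two classes.
toy_logits = [[2.0, 0.0], [0.0, 0.0]]
toy_labels = [0, 1]
loss, per_example_loss = cross_entropy(toy_logits, toy_labels)
print(loss, per_example_loss)
```

The second example has equal logits, so its loss is `log(2)`, which is what an untrained two-class head would give on average.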

This model simply loads the BERT parameters from TF Hub and tries to fine-tune them by setting `trainable=True`. When I run my code on a TPU (specifically after swapping in my own `create_model` instead of `bert.run_classifier.create_model`), I get the following error for millions of parameters, and I can't tell whether the model is actually being fine-tuned:

ERROR:tensorflow:Operation of type Placeholder (module_apply_tokens/bert/encoder/layer_5/intermediate/dense/kernel) is not supported on the TPU. Execution will fail if this op is used in the graph.

How can I solve this issue? How can I use a TPU to fine-tune BERT?

Thanks

saberkun commented 5 years ago

Hi, does training succeed? In my experience the error is printed but training still works, so it is not a fatal error. The placeholders are in the graph but are never executed, so I'd expect the TPU to complain but still work.

MHDBST commented 5 years ago

@saberkun Yes, it does. But I'm not sure whether training completes without any issue. It doesn't stop training, but I'm not sure whether all parameters are actually being trained after this error.

saberkun commented 5 years ago

It should be fine. If the placeholders were actually executed, the TPU would crash the training program. If you reach the desired accuracy with the TF Hub version (Colab), it should be good.
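If you want something more direct than the final accuracy, one option is to snapshot the parameter values before and after a few training steps and check which ones moved. Below is a minimal sketch of that comparison in plain Python. The function name, the snapshot format (name mapped to a flat list of floats), and the toy values are all assumptions for illustration. In a real run the snapshots could be read from two checkpoints, for example with `tf.train.load_checkpoint` and the reader's `get_tensor`.

```python
def changed_parameters(before, after, tol=1e-8):
    """Return the sorted names of parameters whose values changed.

    `before` and `after` map a variable name to a flat list of floats
    (a hypothetical snapshot format chosen for this sketch).
    """
    changed = []
    for name, old_values in before.items():
        new_values = after[name]
        max_delta = max(abs(a - b) for a, b in zip(old_values, new_values))
        if max_delta > tol:
            changed.append(name)
    return sorted(changed)

# Toy snapshots: the variable names are shortened and the numbers invented.
before = {
    "bert/encoder/layer_5/intermediate/dense/kernel": [0.10, -0.20, 0.30],
    "output_weights": [0.01, 0.02],
    "output_bias": [0.0, 0.0],
}
after = {
    "bert/encoder/layer_5/intermediate/dense/kernel": [0.11, -0.19, 0.30],
    "output_weights": [0.03, 0.02],
    "output_bias": [0.0, 0.0],
}

updated = changed_parameters(before, after)
print(updated)
```

If the encoder variables show up in the list, the BERT weights are being updated and not just the classification head. A variable that does not move in a handful of steps is not proof of a problem, since its gradient may simply be tiny.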

google-research / bert

Trainable BERT Using TPU #613
