Maybe you didn't specify the correct parameters; you should add --layers_num 24 to the command, because the default value of layers_num is 12. If you have any questions, please let me know.
Hi @ZhaoxinRuc, here is the command that I used to convert the model:
python convert_bert_from_uer_to_google.py --layers_num=24 --input_model_path=./mixed_large_24_model.bin --output_model_path=./output/model.ckpt
Also, since I am using a CPU-only machine, I added map_location='cpu' to the torch.load call here: https://github.com/dbiir/UER-py/blob/master/scripts/convert_bert_from_uer_to_google.py#L21
I am not sure whether this could result in a bad conversion.
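For reference, the change amounts to this (a minimal sketch; the variable name and path are illustrative, the only real difference is the added map_location argument):

```python
# Minimal sketch of my one-line change in scripts/convert_bert_from_uer_to_google.py
# (CPU-only machine). Only the map_location argument was added; names are illustrative.
import torch

input_model = torch.load("mixed_large_24_model.bin", map_location="cpu")
```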
I am using tensorflow==1.14.0 and Python 3.7.3.
Thank you for your help.
Update:
Using a GPU with the original script (without map_location='cpu') under tf==1.13.1 and Python 3 still gives near-random results (tried on several classification tasks).
We have tested the BERT-large model with Google's TensorFlow BERT code and it performs normally. Could you give us more details, such as the vocabulary (do you use the Chinese version?) and the config file? Here are our commands:
python .\scripts\convert_bert_from_uer_to_google.py --layers_num 24 --input_model_path .\models\mixed_large_24_model.bin --output_model_path .\models\google_model_24.ckpt
TensorFlow: python run_classifier.py --task_name=BKRV --do_train=true --do_eval=true --data_dir=./datasets/book_review --vocab_file=./model/google_zh_vocab.txt --bert_config_file=./model/bert_config.json --init_checkpoint=./output_new/google_model_24.ckpt --max_seq_length=128 --train_batch_size=4 --learning_rate=2e-5 --num_train_epochs=3.0 --output_dir=/run/user/1010/tmp
Config file is as follows:

```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "max_position_embeddings": 512,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "type_vocab_size": 2,
  "vocab_size": 21128
}
```
BKRV is a Chinese dataset and its label and text are separated by \t. Perhaps we can communicate in private. QQ number: 1152543959
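For clarity, a hypothetical example of that format (placeholder rows, not real data from the dataset):

```python
# Hypothetical BKRV-style rows (placeholder text, not taken from the dataset):
# each line is "<label>\t<text>".
rows = [
    "1\t这本书很好，内容很充实",
    "0\t内容太无聊，不推荐",
]
for row in rows:
    label, text = row.split("\t")
    print(label, text)
```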
I used the original Chinese vocabulary (21,128 entries) and the config file officially provided by Google.
I have been using the original roberta-wwm-ext-large and could get an accuracy of ~95.7 on the ChnSentiCorp test set.
But when I ONLY change the weight file to your model, the performance degrades to almost random output. I checked the logs and found that the weights are correctly initialized, so I think the problem must be in the conversion process.
I noticed that you've updated the conversion script, so I regenerated the TensorFlow version of your model, but it didn't help.
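In case it helps, here is a quick way to double-check the converted checkpoint itself (a sketch, assuming TF 1.x and the ./output/model.ckpt path from my conversion command above):

```python
# List the variables in the converted checkpoint (TF 1.x sketch).
import tensorflow as tf

ckpt = "./output/model.ckpt"  # adjust to your converted checkpoint path
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)
# Expect bert/encoder/layer_0 ... layer_23 variables, and
# bert/embeddings/word_embeddings with shape [21128, 1024] for this vocab/config.
```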
Here are the generated files:
Did you change the run_classifier.py file in google-research/bert as follows?
```python
class BkrvProcessor(DataProcessor):
  """Processor for the BKRV (book review) data set."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      if i == 0:
        continue
      guid = "%s-%s" % (set_type, i)
      text_a = tokenization.convert_to_unicode(line[1])
      if set_type == "test":
        label = "0"
      else:
        label = tokenization.convert_to_unicode(line[0])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)

  processors = {
      "cola": ColaProcessor,
      "mnli": MnliProcessor,
      "mrpc": MrpcProcessor,
      "xnli": XnliProcessor,
      "bkrv": BkrvProcessor,
  }
```
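(Note that run_classifier.py lower-cases --task_name before looking it up in processors, so --task_name=BKRV in the command above matches the "bkrv" key.)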
And this is my result on the book_review dataset:

```
INFO:tensorflow: Eval results
INFO:tensorflow: eval_accuracy = 0.8954
INFO:tensorflow: eval_loss = 0.5452643
INFO:tensorflow: global_step = 15000
INFO:tensorflow: loss = 0.5452643
```
Can you give me the config file you used in the experiment? This is my result on the chnsenticorp dataset using google_model_24.ckpt converted from mixed_large_24_model.bin:

```
INFO:tensorflow: Eval results
INFO:tensorflow: eval_accuracy = 0.95
INFO:tensorflow: eval_loss = 0.29745975
INFO:tensorflow: global_step = 7200
INFO:tensorflow: loss = 0.29745975
```
Config file as @zhezhaoa provided:

```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "max_position_embeddings": 512,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "type_vocab_size": 2,
  "vocab_size": 21128
}
```
I assumed your model has the same architecture and format as roberta-wwm-ext-large, so I just swapped in your model (converted by your script) and DID NOT change any other configuration (config file, vocabulary, hyper-parameters, etc.).
It seems that you are using the correct vocabulary and config file. We downloaded the latest Google BERT (TensorFlow) code and achieved normal results by loading our model. Perhaps we can communicate via email, since we don't think this is a general problem: 1152543959@qq.com
Hi, thank you for providing mixed_large_24_model.bin. I tried to convert this model using scripts/convert_bert_from_uer_to_google.py. Everything went fine without any warnings. However, when I tried to fine-tune it on a simple binary text classification task, it gave random results (i.e. acc = 50%). I used Google's vocabulary and the BERT-large config file and did not change any other settings. I suspect the conversion was not successful, which results in the bad performance. Any ideas?
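For reference, a sanity check I plan to try next (only a sketch: I would print the PyTorch-side parameter names first rather than guess them, and the TF variable name below is the standard Google BERT one):

```python
# Sketch: compare the original .bin with the converted checkpoint.
# Assumes the .bin is a plain PyTorch state dict; adjust paths as needed.
import tensorflow as tf
import torch

state_dict = torch.load("mixed_large_24_model.bin", map_location="cpu")
print(list(state_dict.keys())[:10])  # inspect UER-side parameter names first

# "bert/embeddings/word_embeddings" is the standard Google BERT variable name.
emb = tf.train.load_variable("./output/model.ckpt", "bert/embeddings/word_embeddings")
print(emb.shape)  # should be (21128, 1024) with the Chinese vocab and large config
```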