dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

Chinese whole-word-masking BERT (中文全词覆盖BERT) unsupported #786

Open rongruosong opened 5 years ago

rongruosong commented 5 years ago

Error Message

INFO:root:converting to Gluon checkpoint ...
Traceback (most recent call last):
  File "convert_tf_model.py", line 159, in
    assert len(tf_config) == len(tf_config_names_to_gluon_config_names)
AssertionError

eric-haibin-lin commented 5 years ago

@rongruosong the model should already be available at http://gluon-nlp.mxnet.io/master/model_zoo/bert/index.html — why are you converting it yourself?

@leezu could you help take a look for this assertion?

rongruosong commented 5 years ago

> @rongruosong the model should already be available http://gluon-nlp.mxnet.io/master/model_zoo/bert/index.html why are you converting it yourself?
>
> @leezu could you help take a look for this assertion?

I want to use convert_tf_model.py to convert Chinese-BERT-wwm (from the Joint Laboratory of HIT and iFLYTEK) to Gluon, so I'd like to know whether this script works for it.

fierceX commented 5 years ago

@rongruosong You can comment out the following:

with open(os.path.join(args.tf_checkpoint_dir, args.tf_config_name), 'r') as f:
    tf_config = json.load(f)
    assert len(tf_config) == len(tf_config_names_to_gluon_config_names)
    for tf_name, gluon_name in tf_config_names_to_gluon_config_names.items():
        if tf_name is None or gluon_name is None:
            continue
        assert tf_config[tf_name] == predefined_args[gluon_name]

I tried it: with these lines commented out, you can still convert the TF model to Gluon, and testing with compare_tf_gluon_model.py gives stdev = 7.2654996e-07.
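Rather than deleting the check entirely, a gentler option is to validate only the keys the converter knows about and warn on the rest. This is a sketch of a hypothetical helper, not the script's actual code; the mapping dict is copied from convert_tf_model.py:

```python
# Sketch of a relaxed config check (hypothetical helper, not part of
# convert_tf_model.py): instead of asserting that the TF config has exactly
# as many entries as the mapping, validate only the known keys and report
# the rest, so configs with extra options (e.g. pooler_* keys) still convert.
tf_config_names_to_gluon_config_names = {
    'attention_probs_dropout_prob': 'embed_dropout',
    'hidden_act': None,
    'hidden_dropout_prob': 'dropout',
    'hidden_size': 'units',
    'initializer_range': None,
    'intermediate_size': 'hidden_size',
    'max_position_embeddings': 'max_length',
    'num_attention_heads': 'num_heads',
    'num_hidden_layers': 'num_layers',
    'type_vocab_size': 'token_type_vocab_size',
    'vocab_size': None,
}

def check_tf_config(tf_config, predefined_args):
    """Return the set of unsupported keys; assert only on the known ones."""
    unknown = set(tf_config) - set(tf_config_names_to_gluon_config_names)
    if unknown:
        print('Ignoring unsupported TF config keys:', sorted(unknown))
    for tf_name, gluon_name in tf_config_names_to_gluon_config_names.items():
        if gluon_name is None or tf_name not in tf_config:
            continue
        # Known key: its value must match the corresponding Gluon argument.
        assert tf_config[tf_name] == predefined_args[gluon_name], tf_name
    return unknown
```

This keeps the safety net for the hyperparameters the Gluon model actually consumes while merely warning about the ones it ignores.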

leezu commented 5 years ago

Hi @rongruosong, the reason for the failure is that the bert_config.json of https://github.com/ymcui/Chinese-BERT-wwm configures some hyperparameters that are currently unsupported by the BERTEncoder API. You'd need to extend the API first.

In particular, the bert_config.json you want to use defines

{
  "attention_probs_dropout_prob": 0.1, 
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.1, 
  "hidden_size": 768, 
  "initializer_range": 0.02, 
  "intermediate_size": 3072, 
  "max_position_embeddings": 512, 
  "num_attention_heads": 12, 
  "num_hidden_layers": 12, 
  "pooler_fc_size": 768, 
  "pooler_num_attention_heads": 12, 
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128
}

out of which only the options appearing as keys in the following dict are currently supported:

tf_config_names_to_gluon_config_names = {
    'attention_probs_dropout_prob': 'embed_dropout',
    'hidden_act': None,
    'hidden_dropout_prob': 'dropout',
    'hidden_size': 'units',
    'initializer_range': None,
    'intermediate_size': 'hidden_size',
    'max_position_embeddings': 'max_length',
    'num_attention_heads': 'num_heads',
    'num_hidden_layers': 'num_layers',
    'type_vocab_size': 'token_type_vocab_size',
    'vocab_size': None
}
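Diffing the two key sets makes the failure concrete: the Chinese-BERT-wwm config has 17 entries against the 11 the mapping expects, so the length assertion fires. A quick sketch, with both key sets copied from the snippets above:

```python
# Sketch: diff the keys of the Chinese-BERT-wwm bert_config.json against the
# mapping in convert_tf_model.py to see which options are unsupported.
supported = {
    'attention_probs_dropout_prob', 'hidden_act', 'hidden_dropout_prob',
    'hidden_size', 'initializer_range', 'intermediate_size',
    'max_position_embeddings', 'num_attention_heads', 'num_hidden_layers',
    'type_vocab_size', 'vocab_size',
}
config_keys = {
    'attention_probs_dropout_prob', 'directionality', 'hidden_act',
    'hidden_dropout_prob', 'hidden_size', 'initializer_range',
    'intermediate_size', 'max_position_embeddings', 'num_attention_heads',
    'num_hidden_layers', 'pooler_fc_size', 'pooler_num_attention_heads',
    'pooler_num_fc_layers', 'pooler_size_per_head', 'pooler_type',
    'type_vocab_size', 'vocab_size',
}
extra = sorted(config_keys - supported)
# extra -> ['directionality', 'pooler_fc_size', 'pooler_num_attention_heads',
#           'pooler_num_fc_layers', 'pooler_size_per_head', 'pooler_type']
```

These six extra keys are exactly the pooler and directionality options that the BERTEncoder API would need to learn about before the strict check could pass.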