Kashgari is a production-level NLP transfer-learning framework built on top of tf.keras for text labeling and text classification; it includes Word2Vec, BERT, and GPT-2 language embeddings.
You must follow the issue template and provide as much information as possible; otherwise, this issue will be closed.
Check List
Thanks for considering opening an issue. Before you submit your issue, please confirm these boxes are checked.
You can post pictures, but if specific text or code is required to reproduce the issue, please provide the text in a plain text format for easy copy/paste.
[√] I have searched the existing issues but did not find the same one.
When fitting the model, it raises: layer_crf does not support masking, but was passed an input_mask: Tensor("non_masking_layer/Identity_1:0"
This is strange, because none of the Kashgari code has been modified.
As far as I can see, BertEmbedding contains a custom NonMaskingLayer. I commented it out and used BERT's embed_model.output directly, but model fit still fails with: layer_crf does not support masking, but was passed an input_mask: Tensor("Encoder-Output/All:0"
From `kashgari.embedding.bert_embedding.py`:

```python
def _build_model(self, **kwargs):
    if self.embed_model is None:
        seq_len = self.sequence_length
        if isinstance(seq_len, tuple):
            seq_len = seq_len[0]
        if isinstance(seq_len, str):
            logging.warning(f"Model will be built until sequence length is determined")
            return
        config_path = os.path.join(self.model_folder, 'bert_config.json')
        check_point_path = os.path.join(self.model_folder, 'bert_model.ckpt')
        bert_model = keras_bert.load_trained_model_from_checkpoint(config_path,
                                                                   check_point_path,
                                                                   seq_len=seq_len,
                                                                   output_layer_num=self.layer_nums,
                                                                   training=self.training,
                                                                   trainable=self.trainable)
```
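For context, the error means a Keras mask tensor is still reaching the CRF layer. A layer like Kashgari's NonMaskingLayer is supposed to stop mask propagation by returning `None` from `compute_mask`. This is a minimal sketch of that pattern (my own illustration, not the library's exact code):

```python
import tensorflow as tf


class NonMaskingLayer(tf.keras.layers.Layer):
    """Identity layer that consumes an incoming Keras mask.

    Downstream layers (e.g. a CRF that does not support masking)
    will receive mask=None instead of the upstream mask tensor.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Accept a mask from upstream without raising an error.
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        # Drop the mask so it does not propagate further.
        return None

    def call(self, inputs, mask=None):
        # Pass the data through unchanged.
        return inputs
```

If the CRF layer still reports an input_mask after such a layer, it suggests the mask is being re-introduced (or the non-masking layer is bypassed) somewhere between the embedding output and the CRF.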
Environment
The versions of all the packages should be fine.
Issue Description
I am just running a simple Chinese sequence-labeling test: calling the BiLSTM_CRF model and using BertEmbedding to load bert_chinese_base.
Does anyone know what is wrong? Even this simple, direct use of the model will not run. T_T