Just fixed it.
Thanks, Su. data_utils.py now passes my tests.
But running pretraining.py throws an error:
Traceback (most recent call last):
File "pretraining.py", line 212, in
It looks like a TensorFlow version issue: after upgrading TensorFlow from 1.13.2 to 1.14.0 the optimizer error no longer appears, but the following error shows up instead:
mlm_loss (Lambda)            ()            0       token_ids[0][0]
                                                    MLM-Proba[0][0]
                                                    is_masked[0][0]
Total params: 325,545,608
Trainable params: 325,545,608
Non-trainable params: 0
2020-03-12 15:04:21.711097: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
Traceback (most recent call last):
  File "/root/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/root/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/anaconda3/envs/tf2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: buffer_size must be greater than zero.
	 [[{{node ShuffleDataset_1}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pretraining.py", line 233, in
Errors may have originated from an input operation.
Input Source operations connected to node ShuffleDataset_1:
 seed (defined at /data/sfang/Pretrain/pretraining/data_utils.py:142)
Original stack trace for 'ShuffleDataset_1':
File "pretraining.py", line 233, in
I'm not sure whether this is a version issue. Current versions: tensorflow-gpu 1.14.0, Python 3.6, Keras 2.3.1.
Not sure; I use this same TF version myself.
What Keras version are you using? I'll set up an identical environment and debug again, to rule out version problems.
Pretraining doesn't use keras at all; it uses tf.keras throughout.
It's working now and runs successfully. I really should have read the code more carefully, otherwise I'd have wasted even more time. Thanks.
Congratulations, well done.
How do I use the model files saved after continued pretraining?
At the moment I replace the official ckpt files with the newly trained model files, keeping the json and vocab files,
but I get an error when using them.
Do I need to modify the part of the code that saves the model parameters?
2020-03-19 13:29:24.795751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10470 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
==> searching: bert/embeddings/token_type_embeddings, found name: layer_with_weights-1/embeddings/.ATTRIBUTES/VARIABLE_VALUE
==> searching: bert/embeddings/position_embeddings, found name: layer_with_weights-2/embeddings/.ATTRIBUTES/VARIABLE_VALUE
==> searching: bert/embeddings/LayerNorm/gamma, found name: layer_with_weights-0/embeddings/.OPTIMIZER_SLOT/optimizer/m/.ATTRIBUTES/VARIABLE_VALUE
==> searching: bert/embeddings/LayerNorm/beta, found name: layer_with_weights-0/embeddings/.OPTIMIZER_SLOT/optimizer/v/.ATTRIBUTES/VARIABLE_VALUE
Traceback (most recent call last):
File "train_googloe_bert.py", line 75, in
Does that mean that, when loading, I can only use keras.models.load_model() and can no longer use build_transformer_model?
You can rebuild the model, then load the model weights, and finally use the save_weights_as_checkpoint method to export a ckpt in the same format as the official weights.
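For reference, a minimal sketch of that flow with placeholder file names; the build_transformer_model arguments (e.g. with_mlm) are assumptions and must match whatever pretraining.py used when the weights were saved:

    from bert4keras.models import build_transformer_model

    config_path = 'bert_config.json'               # config from the original release
    saved_weights = 'bert_model.weights'           # weights saved during pretraining
    new_ckpt_path = 'bert_model.pretrained.ckpt'   # where to export the converted ckpt

    # 1. Rebuild the same model; return_keras_model=False returns the Transformer
    #    object, which owns save_weights_as_checkpoint.
    bert = build_transformer_model(
        config_path,
        with_mlm=True,               # assumption: must match the pretraining setup
        return_keras_model=False,
    )

    # 2. Load the weights you saved earlier.
    bert.model.load_weights(saved_weights)

    # 3. Export them as a TensorFlow checkpoint in the official format.
    bert.save_weights_as_checkpoint(new_ckpt_path)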
Got it, that worked. One more question: during pretraining, MLM accuracy climbed steadily from 46% to over 90%, but when I then use the further-pretrained model for downstream tasks the results are very poor. What could cause this?
Normally MLM accuracy is only around 50-60%; reaching 90% is a bit abnormal...
save_weights_as_checkpoint
    class ModelCheckpoint(keras.callbacks.Callback):
        """Automatically save the latest model."""
        def on_epoch_end(self, epoch, logs=None):
            self.model.save_weights_as_checkpoint(model_saved_path)
Su, I'd like to ask again about how to load the model after this continued pretraining. I've finished the continued pretraining now, but I can't load the result. Regarding the save_weights_as_checkpoint you mentioned: can I make the change inside this method in pretraining.py, so that the generated ckpt can then be loaded with build_transformer_model? If not, could you explain in a bit more detail how to convert and load it? Many thanks.
1. Use build_transformer_model to build the same model;
2. use bert.model.load_weights to load the weights you saved earlier;
3. use bert.save_weights_as_checkpoint to save them as a new checkpoint.
If you still can't figure it out, spend more time reading the source of bert4keras/models.py ~
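Once the new checkpoint has been exported, it should load the usual way for downstream tasks; a minimal sketch with placeholder file names:

    import numpy as np
    from bert4keras.models import build_transformer_model
    from bert4keras.tokenizers import Tokenizer

    config_path = 'bert_config.json'
    checkpoint_path = 'bert_model.pretrained.ckpt'  # the ckpt exported above
    dict_path = 'vocab.txt'

    tokenizer = Tokenizer(dict_path, do_lower_case=True)
    model = build_transformer_model(config_path, checkpoint_path)  # plain encoder

    token_ids, segment_ids = tokenizer.encode(u'测试文本')
    features = model.predict([np.array([token_ids]), np.array([segment_ids])])
    print(features.shape)  # (1, seq_len, hidden_size)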
Thanks, it loads now.
May I ask how you eventually solved this? I just can't track down the error.
File "/root/anaconda3/envs/tf2/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap self.run() File "/root/anaconda3/envs/tf2/lib/python3.6/multiprocessing/process.py", line 93, in run self._target(*self._args, **self._kwargs)
python 3.6 keras 2.3.1 tensorflow 1.13.2 运行data_utiils.py报错 File "/root/anaconda3/envs/tf2/lib/python3.6/multiprocessing/pool.py", line 103, in worker initializer(*initargs) File "/data/sfang/Pretrain/bert4keras/snippets.py", line 166, in worker_step r = func(d) File "/data/sfang/Pretrain/pretraining/data_utils.py", line 117, in paragraph_process instances = self.paragraph_process(texts) File "/data/sfang/Pretrain/pretraining/data_utils.py", line 209, in paragraph_process return super(TrainingDatasetRoBERTa, self).paragraph_process(texts, starts, ends, paddings) File "/data/sfang/Pretrain/pretraining/data_utils.py", line 53, in paragraph_process sub_instance = self.sentence_process(text) File "/data/sfang/Pretrain/pretraining/data_utils.py", line 188, in sentence_process add_sep=False) TypeError: tokenize() got an unexpected keyword argument 'add_cls'