bojone / bert4keras

keras implement of transformers for humans
https://kexue.fm/archives/6915
Apache License 2.0

Abnormal exit when running an example after installing the environment and code #149

Closed luoqishuai closed 4 years ago

luoqishuai commented 4 years ago

Basic information

Operating system: Windows 10
Python version: 3.6.5
TensorFlow version: tensorflow-gpu 2.0.0
Keras version: 2.3.1
bert4keras version: 0.81.0
Pure keras or tf.keras: I did not change any imports, so it appears to be pure keras
Pretrained models loaded: BERT-wwm-ext, Chinese (https://github.com/ymcui/Chinese-BERT-wwm) and albert_tiny_zh (https://github.com/brightmart/albert_zh)
CUDA: 10.0
GPU: RTX 2070 SUPER

Core code

Unzipped sentiment.zip.
Changed only three paths: config_path, checkpoint_path, dict_path.
Ran task_conditional_language_model.py.

Output

C:\public\study\python\python.exe C:/public/study/workplace/github/bert4keras/examples/task_conditional_language_model.py
2020-05-11 21:28:20.484372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Using TensorFlow backend.
2020-05-11 21:28:23.516770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-11 21:28:23.580023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:01:00.0
2020-05-11 21:28:23.580140: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-05-11 21:28:23.580420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-11 21:28:23.580635: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-05-11 21:28:23.582299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:01:00.0
2020-05-11 21:28:23.582562: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-05-11 21:28:23.582894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-11 21:28:24.006381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-11 21:28:24.006464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-05-11 21:28:24.006510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-05-11 21:28:24.006935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6284 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)

Process finished with exit code -1073741819 (0xC0000005)

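Screenshot (for easier viewing): https://sm.ms/image/kA6MY3TR1UxHvql If it had at least raised an error, that would be one thing, but there is nothing at all: the process just exits and shows an exit code.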
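
For reference, the exit code in the log can be decoded with a few lines of Python. Read as an unsigned 32-bit value, -1073741819 is 0xC0000005, which Windows defines as STATUS_ACCESS_VIOLATION: the crash happened in native code, which is why no Python exception appears. The sketch below also shows the standard-library faulthandler module, which may print a Python traceback when such a fault occurs. The helper name to_ntstatus is made up for illustration, and the faulthandler call is assumed to sit at the very top of the example script, before keras or tensorflow are imported.

```python
import faulthandler
import sys


def to_ntstatus(exit_code: int) -> str:
    """Format a signed 32-bit process exit code as unsigned hex (hypothetical helper)."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


# The exit code reported in the log above.
print(to_ntstatus(-1073741819))  # prints 0xC0000005

if __name__ == "__main__":
    # Assumption: this runs before any keras/tensorflow import in the script.
    # On a fatal fault, faulthandler tries to dump the Python stack to stderr,
    # which can show which Python call was active when the native crash occurred.
    faulthandler.enable(file=sys.stderr, all_threads=True)
```

Whether a traceback actually appears depends on how the process dies, so this is a diagnostic aid rather than a fix.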

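What I tried

The process exits without raising any exception.
At first I suspected a memory problem, so I reduced maxlen and batch_size. That did not help.
I first tried the HIT (Harbin Institute of Technology) pretrained model, then switched to chinese_L-12_H-768_A-12 and albert_tiny. That did not help.
I reread the README, which says the Google versions of the models are the primary target, so I also tried your albert_tiny_google_zh_489k.zip. Still no luck, so it does not look like a model problem.
I rechecked the keras and TensorFlow versions, confirmed they are 2.3.1 and 2.0.0, and reinstalled both. That did not help.
Then I suspected the TensorFlow version, so I created a virtual environment with TensorFlow 1.14 and keras 2.3.1. Still the same.
Screenshot: https://sm.ms/image/3gR2ncypGEqNu8L

Any help would be greatly appreciated.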

bojone commented 4 years ago

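Your output shows that initialization has not even finished; it never reached the point of running the model.

Can you check whether the following code runs and prints a result?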

import keras.backend as K

print(K.eval(K.zeros(1)))

luoqishuai commented 4 years ago

https://sm.ms/image/qWxisH1e2ZIf37k It runs correctly on tensorflow-gpu 2.0.0 and keras 2.3.1.

luoqishuai commented 4 years ago

I stepped through it in the debugger: the process exits while build_transformer_model is running.

bojone commented 4 years ago

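I have really never run into this kind of error, so I'm afraid I can't be of much help. My feeling is still that the environment itself is the problem. Could something be wrong with the GPU driver?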

luoqishuai commented 4 years ago

The call chain down to the point of exit is:

build_transformer_model
transformer.load_weights_from_checkpoint
values = [self.load_variable(checkpoint, v) for v in variables]
tf.train.load_variable(checkpoint, name)

At tf.train.load_variable, PyCharm flags my TensorFlow usage as incorrect: Cannot find reference 'load_variable' in '__init__.py'

Is tensorflow-gpu 2.0.0 the wrong version?
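
One way to narrow this down is to read the checkpoint directly, outside bert4keras. The PyCharm message is an IDE inspection warning rather than a runtime error, so on its own it probably does not show that the call is missing. The sketch below uses tf.train.list_variables and tf.train.load_variable, printing each variable name before loading it, so the last name printed is the one being read if the process dies. The function name probe_checkpoint and its two optional parameters are made up for illustration (they let the loop be tried with stand-in functions when TensorFlow is not installed), and the path in the usage comment is a placeholder.

```python
def probe_checkpoint(checkpoint_path, list_variables=None, load_variable=None):
    """Load every variable in a checkpoint one at a time (hypothetical helper).

    Each name is printed and flushed before it is loaded, so if the process
    crashes natively, the last printed line identifies the variable involved.
    """
    if list_variables is None or load_variable is None:
        import tensorflow as tf  # imported lazily so the loop can be tested without it
        list_variables = list_variables or tf.train.list_variables
        load_variable = load_variable or tf.train.load_variable

    loaded = []
    for name, shape in list_variables(checkpoint_path):
        print("loading", name, shape, flush=True)
        load_variable(checkpoint_path, name)
        loaded.append(name)
    return loaded


# Usage (placeholder path, replace with your own checkpoint_path):
# probe_checkpoint("path/to/bert_model.ckpt")
```

If this loop completes, checkpoint reading is probably not where the crash originates; if it dies partway, the problem is reproducible without bert4keras.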

bojone commented 4 years ago

Cannot find reference 'load_variable' in '__init__.py'

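bert4keras works with tf2, and you also said you tried switching to tf 1.14 and it still failed. So I think the GPU environment was not set up properly; it has nothing to do with which tf version you use.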
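
A quick way to test the GPU-environment hypothesis is to rerun the same script with the GPU hidden. Setting the environment variable CUDA_VISIBLE_DEVICES to "-1" before the process starts hides all CUDA devices, so TensorFlow falls back to the CPU. The sketch below does this through a child process; the helper name run_without_gpu is made up for illustration, and the script path in the usage comment is the example from this issue.

```python
import os
import subprocess
import sys


def run_without_gpu(python_args):
    """Run `python <python_args...>` with all CUDA devices hidden (hypothetical helper).

    Returns the child's exit code so a CPU-only run can be compared with a GPU run.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="-1")
    completed = subprocess.run([sys.executable] + list(python_args), env=env)
    return completed.returncode


# Usage:
# run_without_gpu(["examples/task_conditional_language_model.py"])
```

If the script gets past build_transformer_model on the CPU, that points toward the CUDA, cuDNN, or driver installation; if it still exits with the same code, the cause likely lies elsewhere in the environment.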

luoqishuai commented 4 years ago

OK, I'll set up the environment again from scratch.