brightmart / albert_zh

A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS, large-scale Chinese pre-trained ALBERT models
https://arxiv.org/pdf/1909.11942.pdf
3.92k stars 755 forks

Hello, I have a question about freeze_graph #92

Open hamanfang opened 4 years ago

hamanfang commented 4 years ago

Hello,

After fine-tuning the model on a TPU in Colab, I wanted to deploy it to a device, so I first converted it to pb format using the method from README.md:

    !freeze_graph --input_checkpoint=./model.ckpt-39999 \
      --output_graph=./albert_tiny_zh.pb \
      --output_node_names=cls/predictions/truediv \
      --checkpoint_version=1 --input_meta_graph=./model.ckpt-39999.meta --input_binary=true

But I ran into the following problem:

Loaded meta graph file './model.ckpt-39999.meta
W1223 10:15:35.218359 140525732943744 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-12-23 10:15:35.380655: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-23 10:15:35.434037: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-12-23 10:15:35.434126: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (1e621329c4ed): /proc/driver/nvidia/version does not exist
2019-12-23 10:15:35.468834: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-12-23 10:15:35.470360: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5601f6c119c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-12-23 10:15:35.470414: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
I1223 10:15:36.139345 140525732943744 saver.py:1284] Restoring parameters from ./model.ckpt-39999
Traceback (most recent call last):
  File "/usr/local/bin/freeze_graph", line 8, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/freeze_graph.py", line 487, in run_main
    app.run(main=my_main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/freeze_graph.py", line 486, in <lambda>
    my_main = lambda unused_args: main(unused_args, flags)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/freeze_graph.py", line 378, in main
    flags.saved_model_tags, checkpoint_version)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/freeze_graph.py", line 361, in freeze_graph
    checkpoint_version=checkpoint_version)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_core/python/tools/freeze_graph.py", line 155, in freeze_graph_with_def_protos
    restorer.restore(sess, input_checkpoint)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_core/python/training/saver.py", line 1326, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'InfeedEnqueueTuple' used by node input_pipeline_task0/while/InfeedQueue/enqueue/0 (defined at /lib/python2.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [_class=["loc:@input_pipeline_task0/while/IteratorGetNext"], shapes=[[8,128], [8,128], [8], [8], [8,128]], device_ordinal=0, layouts=[], dtypes=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32]]
Registered devices: [CPU, XLA_CPU]
Registered kernels:

[[input_pipeline_task0/while/InfeedQueue/enqueue/0]]

Is there any way to solve this problem? Thank you.

652994331 commented 4 years ago

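@hamanfang I have a question: when I used this script, it reported that there is no node named cls/prediction/... It looks like you did not hit that problem. Have you checked graph.pbtxt? Does it contain a node named cls/predictions/...?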

hamanfang commented 4 years ago

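@652994331 The cls/predictions node seems to exist only in the ckpt of the original pre-trained model; the ckpt produced by training with run_classification does not have it. I later switched to exporting the model as a saved_model, which is simpler to use.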
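
The check discussed in this thread (does graph.pbtxt contain a node under cls/predictions?) can be sketched with the Python standard library alone. This is a minimal sketch, not part of the repository: it assumes graph.pbtxt is a text-format GraphDef in the usual layout, where each node's own `name:` field is indented by exactly two spaces inside a top-level `node { ... }` block. The helper names `node_names` and `names_with_prefix`, and the contents of `SAMPLE`, are hypothetical and only for illustration.

```python
import re

# Assumption: node-level fields are indented by exactly two spaces, so a
# `name:` nested deeper (for example inside an attr block) is not matched.
_NODE_NAME = re.compile(r'^  name: "([^"]*)"[ \t]*$', re.MULTILINE)


def node_names(pbtxt_text):
    """Return node names in the order they appear in a text-format GraphDef."""
    return _NODE_NAME.findall(pbtxt_text)


def names_with_prefix(pbtxt_text, prefix):
    """Return only the node names that start with the given prefix."""
    return [name for name in node_names(pbtxt_text) if name.startswith(prefix)]


# Hand-written stand-in for a real graph.pbtxt (hypothetical contents).
SAMPLE = '''node {
  name: "bert/embeddings/word_embeddings"
  op: "VariableV2"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "loss/Softmax"
  op: "Softmax"
  input: "loss/BiasAdd"
}
'''

if __name__ == "__main__":
    # An empty list means no node under that prefix exists in the graph text.
    print(names_with_prefix(SAMPLE, "cls/predictions"))
    print(names_with_prefix(SAMPLE, "loss/"))
```

For a real model, the text would come from reading the graph.pbtxt file in the model directory instead of `SAMPLE`. If the prefix passed to `--output_node_names` yields an empty list, freeze_graph cannot find that output node, and a different node name (or a different export path such as saved_model) is needed.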