JerryJiaGit / facenet_trt

NVIDIA TensorRT implementation for facenet with pre-trained Inception-ResNet v1 SavedModel/Ckpt and MTCNN Networks
MIT License
102 stars · 28 forks

Unresolved reference 'frozen_graph' #1

Closed FloatingLifeDream closed 5 years ago

FloatingLifeDream commented 5 years ago

Replacing the facenet project's files directly with yours should let TensorRT speed up facenet inference, right? I just started learning, so I'm not sure if that's right. Also, there is an error at line 401 of facenet.py: Unresolved reference 'frozen_graph'. The same error appears at line 410. How can I solve this problem? Thanks in advance.

JerryJiaGit commented 5 years ago

> Replacing the facenet project's files directly with yours should let TensorRT speed up facenet inference, right? I just started learning, so I'm not sure if that's right. Also, there is an error at line 401 of facenet.py: Unresolved reference 'frozen_graph'. The same error appears at line 410. How can I solve this problem? Thanks in advance.

Yes, just replace those files and you can get ~30% speedup on facenet inference. The improvement comes from the face embedding; so far I see no improvement on MTCNN from the network conversion, but I am still working on it, because MTCNN's time cost is too high when the image resolution is high.

Let me check your findings. May I know your setup? TensorRT version? TensorFlow version? GPU?

JerryJiaGit commented 5 years ago

> Replacing the facenet project's files directly with yours should let TensorRT speed up facenet inference, right? I just started learning, so I'm not sure if that's right. Also, there is an error at line 401 of facenet.py: Unresolved reference 'frozen_graph'. The same error appears at line 410. How can I solve this problem? Thanks in advance.

You are right, I got the issue fixed now. It was caused by a code cleanup mistake in the ckpt function; you just need to add the `frozen_graph` lines below back:

```python
frozen_graph = tf.graph_util.convert_variables_to_constants(
    tf.get_default_session(),
    tf.get_default_graph().as_graph_def(),
    output_node_names=["embeddings"])
```

Please have a try. Thanks for your feedback.

FloatingLifeDream commented 5 years ago

> Replacing the facenet project's files directly with yours should let TensorRT speed up facenet inference, right? I just started learning, so I'm not sure if that's right. Also, there is an error at line 401 of facenet.py: Unresolved reference 'frozen_graph'. The same error appears at line 410. How can I solve this problem? Thanks in advance.

> You are right, I got the issue fixed now. It was caused by a code cleanup mistake in the ckpt function; you just need to add the `frozen_graph` lines below back:
>
> ```python
> frozen_graph = tf.graph_util.convert_variables_to_constants(
>     tf.get_default_session(),
>     tf.get_default_graph().as_graph_def(),
>     output_node_names=["embeddings"])
> ```

> Please have a try. Thanks for your feedback.

Thank you for your reply. I added these lines of code and it solved the previous problem, but a new problem has arisen. Here is my operating environment:

- OS: Ubuntu 16.04
- GPU: Tesla P100-PCIE
- CUDA: 9.0.176
- cuDNN: 7.4.1.5
- TensorRT: 4.0.1.6
- TensorFlow: tensorflow-gpu 1.10

I replaced your files in facenet. I used the model 20180402-114759 from the facenet project and ran predict.py. Here is the output: information.txt. The output is rather verbose and I don't know which parts you need, so I saved and uploaded all of it. I'm a beginner, so I don't know what's wrong. Thanks again.

JerryJiaGit commented 5 years ago

Glad to see the original issue is fixed. So can we close this issue? You can open another one for your new findings. After a quick check of your information.txt, the major problem is: "WARNING:tensorflow:TensorRT mismatch. Compiled against version 3.0.4, but loaded 4.0.1. Things may not work". That means you need to update your TensorFlow to support TensorRT 4.0.1; your current TensorFlow build was compiled against TensorRT 3.0.4.

Please try cuDNN 7.3 and the official tensorflow-gpu 1.12, which supports TensorRT 4 instead of TensorRT 3.

Another suggestion: I can see "failed to allocate 15.89G (17066885120 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY". So I suggest you try the settings below to let your network use only 40% of GPU memory (your P100 has 16 GB), which leaves enough workspace memory for TensorRT (2 GB is assigned for Inception-ResNet and 1 GB each for PNet, RNet, and ONet):

```python
gpu_memory_fraction = 0.4
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_memory_fraction)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options,
                                        log_device_placement=False))
```
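As a rough back-of-the-envelope check (a sketch only; the 16 GB total and the per-network workspace sizes are taken from the advice above), a 40% fraction on a 16 GiB P100 reserves about 6.4 GiB for the TensorFlow session and leaves roughly 9.6 GiB, which comfortably covers the 5 GiB of TensorRT workspaces:

```python
# Rough memory-budget check for the 40% fraction suggested above.
GIB = 1024 ** 3

total_mem = 16 * GIB      # Tesla P100 device memory
fraction = 0.4            # per_process_gpu_memory_fraction
session_mem = total_mem * fraction

# TensorRT workspaces mentioned above:
# 2 GiB for Inception-ResNet, 1 GiB each for PNet, RNet, ONet
workspace_mem = (2 + 1 + 1 + 1) * GIB

free_for_trt = total_mem - session_mem
print('session uses %.1f GiB, %.1f GiB left, workspaces need %.1f GiB'
      % (session_mem / GIB, free_for_trt / GIB, workspace_mem / GIB))
assert workspace_mem <= free_for_trt  # budget fits
```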

JerryJiaGit commented 5 years ago

Also uploaded one log for your reference: https://github.com/JerryJiaGit/facenet_trt/blob/master/log.txt

FloatingLifeDream commented 5 years ago

Thank you for your reply and help. I updated tensorflow-gpu to 1.12 and it works. Thank you again for your patience.

JerryJiaGit commented 5 years ago

That's good news. Could you share your timing results on the P100? My understanding is that on Pascal GPUs TensorRT gives less improvement than on Volta GPUs, because Pascal has no Tensor Cores. So let's see how much improvement we can get on a Pascal GPU.

FloatingLifeDream commented 5 years ago

Of course, I'm glad to share my test results. I found that the performance improvement is not very obvious. I used the model 20180402-114759 from the facenet project and ran predict.py.

I made the following changes to predict.py:

```python
# warm up before timing
for i in range(10):
    emb = sess.run(embeddings, feed_dict=feed_dict)

start_time_total = time.process_time()
for i in range(1000):
    start_time = time.process_time()
    emb = sess.run(embeddings, feed_dict=feed_dict)
    stop_time = time.process_time()
    print('result time:{:.2f} milliseconds'.format((stop_time - start_time) * 1000))
stop_time_total = time.process_time()
print('total time:{:.2f} milliseconds'.format((stop_time_total - start_time_total) * 1000))
```

Before replacement: total time: 61652.17 milliseconds
After replacement: total time: 61944.16 milliseconds

I put more detailed output in these two files: before.txt and after.txt. Should I open a new issue? Thank you for your attention.
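The measurement pattern above can be reproduced as a standalone sketch (here `run_inference` and `benchmark` are hypothetical helpers; `run_inference` stands in for the `sess.run(embeddings, feed_dict=feed_dict)` call in predict.py), warming up first and then timing both each iteration and the total:

```python
import time

def run_inference():
    # Hypothetical stand-in for: sess.run(embeddings, feed_dict=feed_dict)
    sum(i * i for i in range(1000))

def benchmark(fn, warmup=10, iters=100):
    """Time fn() per call and in total, excluding warm-up runs."""
    for _ in range(warmup):
        fn()
    per_call_ms = []
    start_total = time.process_time()
    for _ in range(iters):
        start = time.process_time()
        fn()
        per_call_ms.append((time.process_time() - start) * 1000.0)
    total_ms = (time.process_time() - start_total) * 1000.0
    return total_ms, per_call_ms

total_ms, per_call_ms = benchmark(run_inference)
print('total time:{:.2f} milliseconds'.format(total_ms))
print('mean per call:{:.4f} milliseconds'.format(total_ms / len(per_call_ms)))
```

Note that `time.process_time()` measures CPU time of the Python process, so for GPU inference it mostly captures host-side overhead; wall-clock timing (`time.perf_counter()`) is another reasonable choice.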

JerryJiaGit commented 5 years ago

Thanks for sharing.

I don't think it is an issue; this is expected on your Pascal GPU because it has no Tensor Cores.

Try Volta or Turing GPU if you get a chance.

FloatingLifeDream commented 5 years ago

Okay, I see. Thanks again.

JerryJiaGit commented 5 years ago

> Okay, I see. Thanks again.

Hi, I found something strange with the checkpoint graph: there is no runtime improvement with that graph. I am not sure whether your performance report is based on the checkpoint files.

Anyway, if you want, please try the .pb graph for your runtime comparison. You just need to change the model file in face.py to:

```python
facenet_model_checkpoint = os.path.dirname(__file__) + "//..//model//20180402-114759-CASIA-WebFace//20180402-114759.pb"
```

I will investigate this issue and share with you later. (I have already filed a new issue entry.)

Thanks,
Jerry J