NVlabs / contact_graspnet

Efficient 6-DoF Grasp Generation in Cluttered Scenes

First call to sess.run() at inference time is slow #9

Closed thomasweng15 closed 2 years ago

thomasweng15 commented 2 years ago

Hi, have you encountered an issue where the first call to sess.run() in contact_grasp_estimator.py is slow? I am running the inference example in the readme, and when I time sess.run() the first call takes much longer than subsequent calls:

Run inference 1162.3998165130615
Preprocess pc for inference 0.0007269382476806641
Run inference 0.2754530906677246
Preprocess pc for inference 0.0006759166717529297

I found a thread on what seems to be a similar issue, but the simple fixes suggested there have not worked, and I have not tried compiling TensorFlow from source yet. I am running on an RTX 3090 with CUDA 11.1 and tensorflow-gpu==2.2. Have you encountered this issue before? Thanks for your help.
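For reference, a minimal sketch of this kind of timing measurement with an explicit warm-up pass; the tiny placeholder graph and tensor names below are illustrative only, not the actual contact_grasp_estimator API:

```python
# Minimal, self-contained sketch of timing sess.run(). The placeholder graph
# below is NOT the grasp network; it only illustrates why the first call
# (CUDA context creation, memory allocation, op/kernel initialization) is
# slower than steady-state calls.
import time
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

pc_input = tf.placeholder(tf.float32, shape=[None, 3], name='pc_input')
output = tf.reduce_mean(tf.square(pc_input))  # stand-in for the network forward pass

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    dummy_pc = np.random.rand(20000, 3).astype(np.float32)

    for i in range(3):
        start = time.time()
        sess.run(output, feed_dict={pc_input: dummy_pc})
        # Run 0 acts as a warm-up; later runs reflect steady-state latency.
        print(f'Run {i}: {time.time() - start:.4f} s')
```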

thomasweng15 commented 2 years ago

The quality of the grasps is also much worse than expected:

[screenshot: predicted grasps showing poor quality]

I have tried recompiling the pointnet tf ops using this script, https://github.com/NVlabs/contact_graspnet/blob/main/compile_pointnet_tfops.sh, but the problem persists. I did the same setup on another, brand-new machine, also with an RTX 3090 but with CUDA 11.2, and encountered the same problem and the same performance.
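For completeness, here is a small sketch of checking that the recompiled custom ops actually load under the current setup; the paths are assumptions based on the usual pointnet2 layout, so adjust them to wherever compile_pointnet_tfops.sh writes the .so files:

```python
# Sketch: verify that the recompiled pointnet2 custom-op libraries load
# under the current TF/CUDA setup. The paths are assumptions, not taken
# from the repo; adjust as needed.
import tensorflow as tf

candidate_libs = [
    'pointnet2/tf_ops/sampling/tf_sampling_so.so',
    'pointnet2/tf_ops/grouping/tf_grouping_so.so',
    'pointnet2/tf_ops/3d_interpolation/tf_interpolate_so.so',
]

for lib in candidate_libs:
    try:
        tf.load_op_library(lib)
        print(f'OK: {lib}')
    except (tf.errors.NotFoundError, OSError) as err:
        print(f'FAILED to load {lib}: {err}')
```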

arsalan-mousavian commented 2 years ago

Regarding inference speed: on the desktops I have tried it on, the first inference may take 2-3 seconds, but not 1162 seconds. I am not sure why it takes so much longer on your machine.

Regarding the inference quality: something is terribly wrong here. I assume you have already checked git status and nothing is changed in the repo. I have tested this code with CUDA 11.1 on multiple machines with no problem. Can you try CUDA 11.1 with tensorflow-gpu 2.2.0? In other projects with custom CUDA ops (in PyTorch), I have seen discrepancies between CUDA versions cause issues like this (I know it is surprising, but I have seen it).
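A quick way to sanity-check which CUDA/cuDNN the installed TensorFlow wheel was built against and whether the GPU is visible at all is sketched below; note that get_build_info() needs TF >= 2.3, while the other calls also work on TF 2.2:

```python
# Sketch: sanity-check the TF build and GPU visibility before suspecting
# the custom ops. get_build_info() requires TF >= 2.3; the other calls
# also work on TF 2.2.
import tensorflow as tf

print('TF version:', tf.__version__)
print('Built with CUDA:', tf.test.is_built_with_cuda())
print('Visible GPUs:', tf.config.list_physical_devices('GPU'))

try:
    build = tf.sysconfig.get_build_info()
    print('Built against CUDA', build.get('cuda_version'),
          'and cuDNN', build.get('cudnn_version'))
except AttributeError:
    print('tf.sysconfig.get_build_info() is not available in this TF version')
```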

arsalan-mousavian commented 2 years ago

@thomasweng15 let me know if setting up CUDA 11.1 fixes the issue for you.

thomasweng15 commented 2 years ago

I switched to CUDA 11.1 and ran it with tensorflow-gpu 2.2, but had the same issue. I then upgraded to tensorflow-gpu 2.5, reasoning that the 3080 and 3090 GPUs are too new for earlier tensorflow-gpu versions, and knowing that my labmate uses 2.5. I had to recompile the pointnet tf_ops and install cudnn 8.1 and cudatoolkit 11.0 from conda-forge. The problem is now fixed: the first inference runs in 2 seconds, and the predictions look much better:

[screenshot: improved grasp predictions]

So the takeaway is that users with newer 30xx GPUs should upgrade to tensorflow-gpu==2.5.
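One way to confirm that the upgraded install is actually picking up the Ampere card is sketched below; get_device_details() requires TF >= 2.3, so this assumes the tensorflow-gpu 2.5 setup, and 8.6 is the compute capability of the RTX 3080/3090:

```python
# Sketch: confirm TensorFlow sees the Ampere GPU and report its compute
# capability (8.6 for the RTX 3080/3090). Requires TF >= 2.3.
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if not gpus:
    print('No GPU visible to TensorFlow')
for gpu in gpus:
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name,
          details.get('device_name'),
          'compute capability:', details.get('compute_capability'))
```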

xlim1996 commented 2 years ago

@thomasweng15 Hi, do you have a yml file for the new environment (tf 2.5, cuda 11.0, and cudnn 8.1)?