AaronJackson / vrn

:man: Code for "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression"
http://aaronsplace.co.uk/papers/jackson2017recon/
MIT License

performance #86

Closed bogdan112 closed 6 years ago

bogdan112 commented 6 years ago

I'm using CentOS 7, CUDA 7.5, and cuDNN 5.1, with 32 CPU cores, 54 GB of RAM, and three GeForce GTX 1080 Ti cards (11 GB each). However, it still takes at least 1 hour to produce output. No errors are reported; please help. Also, can I give it more than one picture by modifying this code?

AaronJackson commented 6 years ago

Something is VERY wrong if it is taking an hour to process. It should take about 200-300 ms. What command exactly are you using to run the code? I assume ./run.sh

bogdan112 commented 6 years ago

Yes, I'm using ./run.sh. I also tried running it without MATLAB, and as root. Always the same result; at one point it ran in 10 minutes, but only once.

AaronJackson commented 6 years ago

Was this a fresh install of CentOS?

bogdan112 commented 6 years ago

Yup, wiped the Ubuntu just because I wanted to try your algorithm :)

bogdan112 commented 6 years ago

@AaronJackson, I used MATLAB to time run.m; apparently the retval function is the one to blame for the long execution time. It relates to the pushd step in run.sh: it makes a call to a dos function, which takes 100% of the time.

AaronJackson commented 6 years ago

Haha, well it should work fine on Ubuntu too. What dos function?

bogdan112 commented 6 years ago

@AaronJackson The problem seems to come from CUDA overhead. How did you get around this to reach the 300 ms run times?

AaronJackson commented 6 years ago

I have not done anything; it's worked fine on every CentOS machine I've used. This is also the first time I've heard about it running slowly for someone.

bogdan112 commented 6 years ago

I'm assuming it comes from CUDA overhead because I downloaded the Torch speed test: it takes 9-10 minutes to go from the CPU tests to the GPU tests, and then it's about 4 times slower on the GPU than on the CPU. I'll keep trying and post here if I find a solution to the overhead.
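
For anyone trying to narrow down a similar slowdown, it can help to separate a one-off start-up cost from the per-call cost. The following is only a minimal sketch in Python, not part of vrn or Torch: `time_first_and_steady` and `make_fake_workload` are hypothetical helpers, and the fake workload merely simulates a call that is slow the first time and fast afterwards. In practice you would pass in your own callable that runs one forward pass.

```python
import statistics
import time


def time_first_and_steady(fn, repeats=5):
    """Return (first_call_seconds, median_later_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, statistics.median(later)


def make_fake_workload(init_seconds=0.2, step_seconds=0.01):
    """Hypothetical stand-in for a GPU call: slow on first use, fast afterwards."""
    state = {"initialised": False}

    def workload():
        if not state["initialised"]:
            time.sleep(init_seconds)  # simulates a one-off set-up cost
            state["initialised"] = True
        time.sleep(step_seconds)  # simulates the per-call work

    return workload


if __name__ == "__main__":
    first, steady = time_first_and_steady(make_fake_workload())
    print(f"first call: {first:.3f}s, steady state: {steady:.3f}s")
```

If the first call dominates, the time is going into initialisation rather than the computation itself; if every call is slow, the problem lies elsewhere.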

AaronJackson commented 6 years ago

Is the speed consistent across all GPUs? You might have a crap motherboard.

bogdan112 commented 6 years ago

I'm using an ASUS Z10PE-D8 WS, which is nowhere near crap. However, one of the PCIe ports burnt out while I was working on a photogrammetry project that drew a lot of power with 4G decoding enabled, so I changed some of its settings to get it to work without that PCIe port, with 3 GPUs instead of 4. I thought the slowdown was because of the way Torch runs CUDA asynchronously, so I took out 2 of them, leaving just 1 GPU in. The Volatile GPU utilisation is 1% and it's still taking a long time (it hasn't finished yet, but it's safe to assume it will take as long as before).

AaronJackson commented 6 years ago

Do you have another machine you can try? As I am pretty sure this is not an issue with vrn, I am going to close this issue and go to bed. Hope you manage to get things working properly.

bogdan112 commented 6 years ago

I'll try to get another machine. Thank you for your time and quick replies! Have a restful night!

bogdan112 commented 6 years ago

So, for anyone using a 1080 Ti, TITAN, or any of the higher-performance NVIDIA GPUs: if you are using CUDA 7.5, every GPU process will be incredibly slow. CUDA 8.0 gave pretty good results for me.
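
As a rough illustration of the check this implies, here is a small Python sketch that compares an installed CUDA version against the first toolkit release with native support for a GPU generation. The table is an assumption recalled from NVIDIA's release notes (Pascal cards such as the 1080 Ti first appear in CUDA 8.0), so verify it against the official documentation. `MIN_CUDA`, `parse_version`, and `has_native_support` are hypothetical names, not part of any NVIDIA tool. The thread does not confirm the exact cause of the slowdown; a toolkit that predates the GPU is simply consistent with what was observed.

```python
# First CUDA toolkit release with native support for each GPU generation.
# Assumption: recalled from NVIDIA release notes; check before relying on it.
MIN_CUDA = {
    "pascal": (8, 0),   # e.g. GeForce GTX 1080 Ti
    "volta": (9, 0),
    "turing": (10, 0),
    "ampere": (11, 0),
}


def parse_version(text):
    """Turn a version string such as '7.5' into a comparable tuple (7, 5)."""
    return tuple(int(part) for part in text.split("."))


def has_native_support(architecture, cuda_version):
    """True if the given CUDA version is at least the first one listed for the architecture."""
    return parse_version(cuda_version) >= MIN_CUDA[architecture.lower()]


if __name__ == "__main__":
    for version in ("7.5", "8.0"):
        supported = has_native_support("pascal", version)
        print(f"CUDA {version} on Pascal: native support = {supported}")
```

The installed toolkit version can be read from `nvcc --version`; the lookup only tells you whether an upgrade is worth trying first.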