Closed alisonrclarke closed 4 years ago
I tried adding -DGPU_MODE=CPU_ONLY and running without GPUs, but the build failed on the first attempt.
Building with -DUSE_MKL=OFF seems to fix it, although the installation guide then tells you to specify the path to your own Caffe version (which I don't).
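For reference, the configuration that eventually built can be sketched roughly as below. The build directory and the `make` invocation are illustrative; only the two `-D` flags come from the attempts described here.

```shell
# Rough CPU-only configuration sketch (run from an OpenPose build directory).
# GPU_MODE=CPU_ONLY disables the CUDA code path; USE_MKL=OFF skips the
# MKL-enabled Caffe build that otherwise expects a custom Caffe path.
cmake .. -DGPU_MODE=CPU_ONLY -DUSE_MKL=OFF
make -j"$(nproc)"
```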
I still have to test it properly, but the first tests look good.
I ran the example video from the Colab script with CUDA, which (for the processing code section) took 40.542s. Running it with the CPU-only configuration took 5842.492s (I already thought something had gone wrong when it hadn't finished after an hour). That is roughly a factor of 144 difference (5842.492s / 40.542s).
In the OpenPose installation guide it says (https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/installation.md#cpu-version): "The default CPU version takes ~0.2 images per second on Ubuntu (~50x slower than GPU) while the MKL version provides a roughly 2x speedup at ~0.4 images per second." and "Accuracy of the CPU version is ~1% higher than CUDA version".
The previous example ran on only 5 seconds of the example video. Running on 60 seconds, the CPU version ran for several hours and then ended unsuccessfully (a time-out?).
Given the "50x slower" statement from the OpenPose documentation and these tests, I suggest closing this issue and concentrating on the GPU versions. @alisonrclarke, are you OK with this?
Yes, I guess this issue should be closed, though before we go much further with Colab it might be worth checking that we can process a whole video using the GPU version without it taking hours. If Colab is significantly slower than running the Windows binaries, we might have to rethink.
I ran the GPU version on 1 minute of the video and it took 422.74s for the detection section. Running it on the whole video (which is 57 minutes 25 seconds long) took 3042.195s (which is still shorter than running the CPU version on 5 seconds).
Did you try and run the whole video on your computer? How long did that take?
Martin said:
for the body model, it runs at about 1/3 speed (8 frames per second) on my laptop; I just tried and I don’t have the memory to cope with the face and hand models – now I know why one might need 64GB – but on the desktop they make the process much slower, e.g. 1/10 of real time.
I'm assuming that when he says 1/3 speed he means 1 minute of video takes 3 minutes to process (assuming the videos are 24 fps), so an hour of video would take 3 hours to process. So Colab taking 50 minutes to process a 57-minute video is a win :)
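The conversion I'm assuming can be checked with a quick back-of-envelope calculation (the 24 fps source rate is my assumption, not something Martin stated):

```shell
# 8 frames processed per second against an assumed 24 fps source
# means each minute of video needs 24/8 = 3 minutes of processing.
awk 'BEGIN {
  video_fps = 24   # assumed source frame rate
  proc_fps  = 8    # reported processing rate on the laptop
  printf "%.0f min of processing per min of video\n", video_fps / proc_fps
}'
# → 3 min of processing per min of video
```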
Hm, thinking about it I would have said it took longer than an hour. Maybe I should rerun the test to make sure that I didn't make a mistake when checking the time.
It now says 24299.941s, which is ~6.75h. I don't know; I might have missed a digit when I wrote it down last time.
I'm trying to run the same command locally now. Looking at the commands doc Martin sent over, it's not clear whether the 1/3 speed he mentioned was with or without writing the output video, as he did say that that adds to the time.
I suspect running it as a Colab script does add significant overhead.
I tried running the long video on my local machine and gave up after 24h!
Martin said:
I just checked with the same clip (Chiranjib_raga_1a.mp4) and it reduced the time from 717 to 666 secs on my laptop (7% improvement). How does CoLab compare to that?
Video Chiranjib_raga_1a.mp4 is 182s long. On Colab, OpenPose took 336s when creating the output video (not including converting from mp4 back to avi), and 258s without the output video.
That seems worthwhile. The previous speeds he mentioned may have been on the lab desktop (which may have a GPU, I don't know).
Looking at the files in the project folder, most videos seem to be just a few (~3) minutes long, so the runtime might be feasible after all. The 1h test video had started to make me a bit concerned...
Sorry, not sure why I picked that one at random from Dropbox in the first place! I think this is probably OK to close now we've confirmed with Martin our plan of action?
I will have one last look at running it with different accuracy settings, as you mentioned in your first comment here.
Hm, was it this that he was referring to? https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/quick_start.md#maximum-accuracy-configuration
I tried that one (setting --net_resolution "1312x736" --scale_number 4 --scale_gap 0.25) on 5 seconds and on a 3.04min video. For the 5 seconds, the detection took 105s (versus 53s without the additional parameters). For the whole video, the non-parametrised version took 1778s (just under 30min), while the parametrised version took 3541s. So we seem to have a factor of ~2 in time. However, I cannot really say whether the result improves; that is probably for Martin to judge. I suspect there is no numeric metric for this?
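For reference, a sketch of how those flags were appended; the binary path and input filename are illustrative, and only the last three flags are the "maximum accuracy" settings from the quick-start section linked above.

```shell
# Illustrative invocation; binary path and input file are placeholders.
# The last three flags are the "maximum accuracy" configuration from
# the OpenPose quick-start guide.
./build/examples/openpose/openpose.bin --video examples/media/video.avi \
    --net_resolution "1312x736" --scale_number 4 --scale_gap 0.25
```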
Anyway, as you said, we know what we are doing for now, so I'll close this.
MC: Could include exploring OpenPose's 'maximum accuracy' setting, i.e. how much performance improves with processing power.