Memory issue with dlib cnn extractor

Kirin-kun commented 6 years ago

Adapter: GTX 1060 6GB, Windows 7, dlib-19.13.1

I suspect there's a memory leak in the dlib-cnn extractor code. I attempted to extract about 100 "big" pictures and the process crashed with OOM whenever it came across a picture with no detectable face and rotation was on.

Once it has stopped, it never recovers, and the rest of the images aren't extracted.

I experimented and I narrowed it down to a single type of image: a 2000x3000 image with no detectable face.

I created a single blank image of 2000x3000 and I got this:

(faceenv) C:\Users\Kirin\face>python c:\users\kirin\faceswap\faceswap.py extract -i H:\Fakes\lili -o H:\Fakes\lili\aligned.dlib-cnn -D dlib-cnn -r on -ae -v
Using TensorFlow backend.
Output Directory: H:\Fakes\lili\aligned.dlib-cnn
Input Directory: H:\Fakes\lili
Loading Extract from Extract_Align plugin...
Using json serializer
Alignments filepath: H:\Fakes\lili\alignments.json
Starting, this may take a while...
  0%|                                                    | 0/1 [00:00<?, ?it/s]
----- Initial GPU Stats -----
GPU Driver:       398.36
GPU Device count: 1
GPU Devices:      ['GeForce GTX 1060 6GB']
GPU VRAM:         [6144.0]
GPU VRAM free:    6026.6875
-----------------------------

Initializing keras model...
2018-07-07 20:18:31.132680: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 5.78GiB
2018-07-07 20:18:31.141680: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-07 20:18:31.749715: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-07 20:18:31.756716: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2018-07-07 20:18:31.759716: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2018-07-07 20:18:31.763716: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2304 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Adding DLib - CNN detector
GPU VRAM free:    3433.41796875
Warning: No faces were detected.
Failed to extract from image: H:\Fakes\lili\test.png. Reason: Error while calling cudaMalloc(&data, n) in file C:\Users\Kirin\AppData\Local\Temp\pip-install-08loid12\dlib\dlib\cuda\cuda_data_ptr.cpp:28. code: 2, reason: out of memory
100%|████████████████████████████████████████████| 1/1 [00:17<00:00, 17.99s/it]
Writing alignments to: H:\Fakes\lili\alignments.json
-------------------------
Images found:        1
Faces detected:      0
-------------------------
Done!

If the parameter "-r on" isn't present, there's no out-of-memory message, but python.exe crashes at the end (the same type of issue as the previous one I opened).

The issue isn't present with the mtcnn extractor. In the verbose output, I can clearly see it trying to extract 4 times by rotating the image, instead of crashing after the first attempt.
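For readers unfamiliar with the rotation option, the retry behaviour described above can be sketched roughly as follows. This is only an illustration of the idea, not the faceswap implementation: `rotate90`, `detect_with_rotation` and the `detector` callable are hypothetical names, and the image is a plain nested list rather than a real image array.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]


def rotate90(image: Image) -> Image:
    """Rotate a row-major 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def detect_with_rotation(
    image: Image, detector: Callable[[Image], list]
) -> Tuple[Optional[int], list]:
    """Try the detector at 0, 90, 180 and 270 degrees.

    Returns (angle, faces) for the first orientation that yields faces,
    or (None, []) if all four attempts find nothing.
    """
    current = image
    for step in range(4):
        if step:
            # Rotate the previous attempt by a further 90 degrees.
            current = rotate90(current)
        faces = detector(current)
        if faces:
            return step * 90, faces
    return None, []
```

With a blank image, all four orientations are tried and nothing is returned, which matches the four "No faces were detected" warnings seen with mtcnn.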

I suspect the memory isn't freed when no face is found? So, when it reloads the rotated image, there's no more memory and so it gives the OOM message.

When rotation is off, python.exe crashes because, again, the memory wasn't freed.

I haven't read the code, so I'm just guessing.

torzdf commented 6 years ago

Can you provide your test image, please?

Kirin-kun commented 6 years ago

No need to provide a test image. You can easily create a blank image (thus, no face to be detected) of dimensions 2000x3000. Or you can use any image of these dimensions in which dlib can't detect a face. I reproduced the crash 100% of the time with a single image of this type.

torzdf commented 6 years ago

This is stumping me at the moment. I'll continue to look into it, but it only seems to occur if the image starts as portrait and rotates to landscape. If the image goes in as 3000x2000 then it works fine.

Portrait images seem to take an additional 2-4MB of VRAM, but this wouldn't push it over the edge, and it certainly doesn't look like it is over-allocating VRAM.

This is with a 3500x2500 image as input. It allocates the VRAM on the first pass, leaving 1943MB free. When it rotates, it uses another 2MB but never uses more.

Adding DLib - CNN detector
GPU VRAM free:    5470.5625
[5470.5625]
Warning: No faces were detected.
[1942.5625]
Warning: No faces were detected.
[1940.5625]
Warning: No faces were detected.
[1940.5625]
Warning: No faces were detected.

Putting it in as 2500x3500, I get:

GPU VRAM free:    5470.5625
[5470.5625]
Warning: No faces were detected.
[1944.5625]
Failed to extract from image: /home/matt/fake/test/dlib_test/Untitled.jpg. Reason: Error while calling cudaMalloc(&data, n) in file /tmp/pip-install-teoe75q2/dlib/dlib/cuda/cuda_data_ptr.cpp:28. code: 2, reason: out of memory

So for some reason it is using up the available 1945MB of VRAM rotating from portrait to landscape.

Like I say, I will continue to investigate.

Kirin-kun commented 6 years ago

Meaning you reproduced the issue?

Phew, I was afraid it would be something in my configuration.

I had 1067x1600 pictures in this set and they were rotated fine, so I guess there's a threshold at which the problem occurs.

Kirin-kun commented 6 years ago

I made some further tests and I'm a bit puzzled.

python.exe crashes at the end ("program ceased to function...") with a single 3000x2000 image and rotation on. It also crashes at the end with a second one. But if I add a third one (or more), it doesn't crash and the process exits cleanly!

If I don't add "-r on", it becomes even stranger: if there are between 1 and 7 images, it crashes at the end. If I add an 8th one, it exits cleanly...

WTF?

If I use 2000x3000 images with rotation, it stops with an OOM immediately.

Also, for the hell of it, I resized the blank image to 3000x3000. I got this:

Adding DLib - CNN detector
GPU VRAM free:    3433.41796875
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
100%|████████████████████████████████████████████| 1/1 [00:20<00:00, 20.06s/it]
Writing alignments to: H:\Fakes\lili\alignments.json
-------------------------
Images found:        1
Faces detected:      0
-------------------------
Done!

So... no "out of memory", but it got resized on the fly (because it wouldn't fit in memory?).

torzdf commented 6 years ago

I'm going to look at extract again. I'm thinking of running 2 passes on the data (one for detection, one for landmarks) rather than the single pass we currently use, as I am having to tread a fine line on VRAM allocation. Hopefully that will fix this issue, but I need to look at timings.

> So... no "out of memory", but it got resized on the fly (because it wouldn't fit in memory?).

Yes. DLib CNN is fairly linear in terms of VRAM required vs pixels in image, so when it hits a threshold, dictated by available vram, it will resize the image down.
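As a rough illustration of that threshold logic (not the actual faceswap code), the sketch below shrinks an image so its pixel count stays within a budget. The function name and the `max_pixels` parameter are assumptions for the example; how the budget would be derived from free VRAM is not shown, since the thread doesn't specify it.

```python
import math
from typing import Tuple


def fit_to_pixel_budget(width: int, height: int, max_pixels: int) -> Tuple[int, int]:
    """Scale (width, height) down, keeping aspect ratio, to at most max_pixels.

    Images already within the budget are returned unchanged.
    """
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    # Pixel count scales with the square of the linear scale factor.
    scale = math.sqrt(max_pixels / pixels)
    # Round down so the result does not exceed the budget.
    return max(1, int(width * scale)), max(1, int(height * scale))
```

Note that a budget expressed purely in pixels treats 2000x3000 and 3000x2000 identically, so on its own it would not account for the portrait-to-landscape behaviour described in this thread.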

What is strange in my example is that I still have 1.9GB free after it processes the first pass of the image. Rotating the image from portrait to landscape immediately gobbles this up for no discernible reason, but landscape to portrait is fine.

If this persists after splitting out images it will need to be factored into the code.

torzdf commented 6 years ago

@Kirin-kun I have updated the way that DLib detects faces. It now scales all images to fit a square based on available VRAM. This should mitigate the rotating issues. This is currently in the staging branch pending testing.
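To make the "fit a square" idea concrete, here is a minimal sketch, assuming the square's side length has already been chosen from available VRAM (how that is done isn't covered here). The function name is hypothetical and the integer rounding is an assumption, not taken from the staging code.

```python
from typing import Tuple


def scale_to_square(width: int, height: int, side: int) -> Tuple[int, int]:
    """Scale an image, up or down, so its longest edge equals `side`.

    The aspect ratio is preserved, so both portrait and landscape
    versions of an image fit inside the same side x side square.
    """
    longest = max(width, height)
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = max(1, width * side // longest)
    new_height = max(1, height * side // longest)
    return new_width, new_height
```

Because the bounding square is the same whichever way the image is rotated, the working size no longer depends on orientation, which is why this should sidestep the rotation problem.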

It should also detect more faces, as DLib does not have an option to set the threshold for a positive match, so enlarging the source image is the only way to increase the potential for positives.

My main concern is that enlarging all the images will slow down extraction, so I need to get some real world testing put through, to see whether I will need to add it as an option rather than as default.

If you get a chance, please could you check out the staging branch and see if it works as expected/fixes your issue.

Kirin-kun commented 6 years ago

Fail... it extracts a single one at the start, then nothing. Something is not initialized correctly.

Adding DLib - CNN detector
Resizing image from 2000x3000 to 1802x2703.
  1%|?                                         | 1/170 [00:18<51:12, 18.18s/it]
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
  1%|?                                         | 2/170 [00:18<26:21,  9.41s/it]
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free:    388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
  2%|?                                         | 3/170 [00:19<18:03,  6.49s/it]G

It resizes the image to a ridiculously small size then doesn't detect anything.

torzdf commented 6 years ago

Yay!

Will check.

torzdf commented 6 years ago

Just an update. I know the issue here. Unfortunately my server has just died, so I'm having to fix that before I can upload a fix.

torzdf commented 6 years ago

@Kirin-kun. Sorry for the delay. This should now be fixed in staging. Please could you test if you have a chance.

Kirin-kun commented 6 years ago

Will do when I'm home.

Kirin-kun commented 6 years ago

It works now, so I guess I'll close this one. It extracts faces normally.

Still, it doesn't explain how rotating a picture busts VRAM with dlib.

Nor why it doesn't happen with mtcnn.

torzdf commented 6 years ago

Without being able to dig into the dlib code, I guess we'll never know. Mtcnn and dlib work differently.

Thanks for the feedback, I'll push to master.

deepfakes / faceswap

Memory issue with dlib cnn extractor #460