Closed Kirin-kun closed 6 years ago
Can you provide your test image please
No need to provide a test image. You can easily create a blank image (thus, no face to be detected) of dimensions 2000x3000. Or you can use any image of these dimensions in which dlib detects no face. I reproduced the crash 100% of the time with a single image of this type.
This is stumping me at the moment. I'll continue to look into it, but it only seems to occur if the image starts as portrait and rotates to landscape. If the image goes in as 3000x2000 then it works fine.
Portrait images seem to take an additional 2-4MB of VRAM, but this wouldn't push it over the edge, and it certainly looks like it isn't over allocating VRAM.
This is putting in a 3500*2500 image. It allocates the VRAM on first pass, leaving 1943MB free. When it rotates it uses another 2MB but never uses more.
Adding DLib - CNN detector
GPU VRAM free: 5470.5625
[5470.5625]
Warning: No faces were detected.
[1942.5625]
Warning: No faces were detected.
[1940.5625]
Warning: No faces were detected.
[1940.5625]
Warning: No faces were detected.
Putting it in as 2500*3500 and I get:
GPU VRAM free: 5470.5625
[5470.5625]
Warning: No faces were detected.
[1944.5625]
Failed to extract from image: /home/matt/fake/test/dlib_test/Untitled.jpg. Reason: Error while calling cudaMalloc(&data, n) in file /tmp/pip-install-teoe75q2/dlib/dlib/cuda/cuda_data_ptr.cpp:28. code: 2, reason: out of memory
So for some reason it is using up the available 1945MB of VRAM rotating from portrait to landscape.
Like I say, I will continue to investigate.
Meaning you reproduced the issue?
Phew, I was afraid it would be something in my configuration.
I had 1067x1600 pictures in this set and they were rotated fine, so I guess there's a threshold at which the problem occurs.
I made some further tests and I'm a bit puzzled.
python.exe crashes at the end ("program ceased to function...") with a single 3000x2000 image and rotation on. It also crashes at the end with a second one. But if I add a third one (or more), it doesn't crash and the process exits cleanly!
If I don't add "-r on", it becomes even stranger: if there are between 1 and 7 images, it crashes at the end. If I add an 8th one, it exits cleanly...
WTF?
If I use 2000x3000 images with rotation, it stops with an OOM immediately.
Also, for the hell of it, I resized the blank image to 3000x3000. I got this:
Adding DLib - CNN detector
GPU VRAM free: 3433.41796875
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
Resizing image from 3000x3000 to 2512x2512.
Warning: No faces were detected.
100%|████████████████████████████████████████████| 1/1 [00:20<00:00, 20.06s/it]
Writing alignments to: H:\Fakes\lili\alignments.json
-------------------------
Images found: 1
Faces detected: 0
-------------------------
Done!
So... no "out of memory", but it got resized on the fly (because it wouldn't fit in memory?).
I'm going to look at extract again. I'm thinking of running 2 passes on the data (one for detection, one for landmarks) rather than the single pass we currently use, as I am having to tread a fine line on VRAM allocation. Hopefully that will fix this issue, but I need to look at timings.
So... no "out of memory", but it got resized on the fly (because it wouldn't fit in memory?).
Yes. DLib CNN is fairly linear in terms of VRAM required vs pixels in image, so when it hits a threshold, dictated by available vram, it will resize the image down.
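To illustrate the resize-on-threshold behaviour described above, here is a minimal sketch, assuming VRAM use really is linear in pixel count. The function names and the `mb_per_megapixel` constant are hypothetical; they are not taken from the extractor code, and the constant would have to be measured for a given GPU and model.

```python
import math


def scale_for_vram(width, height, free_vram_mb, mb_per_megapixel):
    """Return a linear scale factor (<= 1.0) that keeps the image in budget.

    Assumption: VRAM needed = pixels * constant. mb_per_megapixel is a
    hypothetical, measured constant, not a value from dlib or the extractor.
    """
    max_pixels = free_vram_mb / mb_per_megapixel * 1_000_000
    pixels = width * height
    if pixels <= max_pixels:
        return 1.0  # the image already fits, leave it alone
    # Pixel count scales with the square of the linear factor.
    return math.sqrt(max_pixels / pixels)


def scaled_size(width, height, scale):
    """Apply the scale factor to both dimensions, truncating to integers."""
    return int(width * scale), int(height * scale)
```

Under this model a square image and a portrait image with the same pixel count get the same scale factor, which is why the portrait-only failure is surprising.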
What is strange in my example is that I still have 1.9GB free after it processes the first pass of the image. Rotating the image from portrait to landscape immediately gobbles this up for no discernible reason, but landscape to portrait is fine.
If this persists after splitting out images it will need to be factored into the code.
@Kirin-kun I have updated the way that DLib detects faces. It now scales all images to fit a square based on available VRAM. This should mitigate the rotating issues. This is currently in the staging branch pending testing.
It also means it should detect more faces too, as DLib does not have an option to set the threshold for a positive match, so enlarging the source image is the only way to increase the potential for positives.
My main concern is that enlarging all the images will slow down extraction, so I need to get some real world testing put through, to see whether I will need to add it as an option rather than as default.
If you get a chance, please could you checkout the staging branch and see if it works as expected/fixes your issue.
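For anyone testing, the "scale to fit a square" idea above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the code in staging: `fit_to_square` is a hypothetical name, and how the square's side is derived from free VRAM is not shown here.

```python
def fit_to_square(width, height, side):
    """Scale (width, height) so the longest edge equals `side`.

    The aspect ratio is kept, so the frame fits inside a side x side
    square in either orientation. Images smaller than the square are
    enlarged, larger ones are shrunk.
    """
    longest = max(width, height)
    return round(width * side / longest), round(height * side / longest)
```

Because the square has the same side for portrait and landscape, the detector sees the same frame size no matter how the image is rotated.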
Fail... it extracts a single one at the start, then nothing. Something is not initialized correctly.
Adding DLib - CNN detector
Resizing image from 2000x3000 to 1802x2703.
1%|? | 1/170 [00:18<51:12, 18.18s/it]
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
1%|? | 2/170 [00:18<26:21, 9.41s/it]
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 2000x3000 to 138x207.
Warning: No faces were detected.
GPU VRAM free: 388.5390625
Initializing DLib for frame size 207x207
Resizing image from 3000x2000 to 207x138.
Warning: No faces were detected.
2%|? | 3/170 [00:19<18:03, 6.49s/it]
It resizes the image to a ridiculously small size then doesn't detect anything.
Yay!
Will check.
Just an update. I know the issue here. Unfortunately my server has just died, so I'm having to fix that before I can upload a fix.
@Kirin-kun. Sorry for the delay. This should now be fixed in staging. Please could you test if you have a chance.
Will do when I'm home.
It works now, so I guess I'll close this one. It extracts faces normally.
Still, it doesn't explain how rotating a picture busts VRAM with dlib.
And that it doesn't happen with mtcnn.
Without being able to dig into the dlib code, I guess we'll never know. Mtcnn and dlib work differently.
Thanks for the feedback, I'll push to master
Adapter: GTX 1060 6GB, Windows 7, dlib-19.13.1
I suspect there's a memory leak in the dlib-cnn extractor code. I attempted to extract about 100 "big" pictures and the process crashed with OOM whenever it came across a picture with no detectable face and rotation was on.
And once it stops, it never recovers and the rest of the images aren't extracted.
I experimented and I narrowed it down to a single type of image: a 2000x3000 image with no detectable face.
I created a single blank image of 2000x3000 and I got this:
If the parameter "-r on" isn't present, there's no out of memory message, but python.exe crashes at the end (the same type of issue as in the previous issue I opened).
The issue isn't present with the mtcnn extractor. In the verbose output, I can clearly see it trying to extract 4 times by rotating the image, instead of crashing after the first attempt.
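To make clear what I mean by "trying 4 times", here is a rough sketch of the behaviour I would expect with rotation on. It is a guess at the general pattern, not the actual extractor code: `detect` is a hypothetical callable standing in for the detector, and the image is a plain list of rows.

```python
def rotate_cw(image):
    """Rotate a row-major image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def detect_with_rotation(image, detect, attempts=4):
    """Try detection at 0, 90, 180 and 270 degrees until faces are found.

    `detect` is a hypothetical stand-in for the real detector: it takes
    an image and returns a (possibly empty) list of faces.
    """
    current = image
    for step in range(attempts):
        if step:
            current = rotate_cw(current)
        faces = detect(current)
        if faces:
            return step * 90, faces
    return None, []
```

With a faceless image the detector runs on all four orientations, so a portrait frame is followed by a landscape one, which is exactly the step where dlib-cnn dies for me.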
I suspect the memory isn't freed when no face is found? So, when it reloads the rotated image, there's no more memory and so it gives the OOM message.
When rotation is off, python.exe crashes because, again, the memory wasn't freed.
I haven't read the code, so I'm only guessing.