deepfakes / faceswap

Deepfakes Software For All
https://www.faceswap.dev
GNU General Public License v3.0

out of memory when extract after today update #234

Closed 3xtr3m3d closed 6 years ago

3xtr3m3d commented 6 years ago

I extracted faces from 2000+ images after today's update and it went great. Then I tried to extract another set of 100 images and the program fails, saying

Reason: Error while calling cudaMalloc(&data, n) in file D:\FAPP\dlib-master\dlib\dnn\cuda_data_ptr.cpp:28. code: 2, reason: out of memory

(images are same size)

Any idea why this is happening?

NagashSzarekh commented 6 years ago

I am seeing the same on a few of my image sets. I ran an extract on 200 images last night with cnn, and when I tried to run those same images today to test the new extractor it gave the OOM error. I assume this is because the new extractor uses more video memory.

oatssss commented 6 years ago

does hog still work for you?

3xtr3m3d commented 6 years ago

Yes, hog works. face_alignment from pytorch also works.

NagashSzarekh commented 6 years ago

Yes, hog works. I wonder if it would be worthwhile, when cnn hits an OOM error, to fall back to hog for that file instead of just erroring out, the same way it currently falls back to hog when cnn does not find a face.
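The fallback idea could look something like this. A minimal sketch in plain Python: `primary` and `fallback` stand in for the cnn and hog detectors, and the names are hypothetical, not faceswap's actual API.

```python
def detect_with_fallback(image, primary, fallback):
    """Try the primary (e.g. cnn) detector; on a CUDA out-of-memory
    error, retry the same image with the fallback (e.g. hog) detector."""
    try:
        return primary(image)
    except RuntimeError as err:
        # dlib surfaces cudaMalloc failures as RuntimeError with
        # "out of memory" in the message; re-raise anything else.
        if "out of memory" not in str(err):
            raise
        return fallback(image)
```

The OOM check is deliberately narrow so that genuine bugs in the primary detector still propagate instead of being silently retried.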

oatssss commented 6 years ago

@DLSauron The OOM is an all or nothing thing I believe. Have you been able to extract part-way through and then an OOM occurs?

@3xtr3m3d I don't understand, isn't the face-alignment port from pytorch the "today's update" that you're referring to? For what extractor are you getting an OOM?

NagashSzarekh commented 6 years ago

Yes, it appears to depend on the resolution of the image. As you can see below, it was able to get through 5 images, but when it hit one that was 857 x 1280 it errored out. The funny thing is I had already resized that image yesterday because the original (2678 x 4000) was too big for the original cnn extractor.

Fun fact: file size does not appear to matter to cnn, only resolution. The original image was 580 KB, but the smaller one is 1.90 MB. With the original cnn extractor the 580 KB image would give an OOM, but the 1.90 MB one would extract just fine.

2%|██▊ | 5/282 [00:17<15:54, 3.44s/it]
Failed to extract from image: D:\Fakes\Data\Test_A\1363877250141_resized.jpg. Reason: Error while calling cudaMalloc(&data, new_size*sizeof(float)) in file C:\Users\DLSauron\AppData\Local\Temp\pip-build-8u_e3rm6\dlib\dlib\dnn\gpu_data.cpp:195. code: 2, reason: out of memory
Writing alignments to: D:\Fakes\Data\Test_A\alignments.json

Images found: 282 Faces detected: 5

Done!

babilio commented 6 years ago

@oatssss the face-alignment from today's update is the port to Keras, I believe, no longer pytorch. This commit: 232d931

torzdf commented 6 years ago

My experience with face-alignment (the pytorch version) is that it has issues with any images over 720p. No rigorous testing, mind, just that I have problems with bigger images, but if I resize down, all problems go away. I'd guess that this 'issue' has carried over into the port. Try resizing down a bit and trying again.

oatssss commented 6 years ago

@DLSauron ah, I assumed you were converting images that were all the same size. Yeah, I think this is just a limitation of the library/resources. If you weren't having problems with face_recognition (not the new face-alignment), maybe we can have all 3 (hog, face_recognition, and face-alignment) available as options. Did face_recognition's cnn work well for you? Last I remember, it was extremely slow for me.

We could also work in a technique where images are scaled down before being passed to the extractor. The alignment coords that are found can then simply be rescaled up to match the original.
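The down-scale/up-scale bookkeeping is just a couple of pure functions. A sketch, not faceswap code; the 1280 max side below is only an example value:

```python
def compute_scale(width, height, max_side):
    """Factor that shrinks an image so its longest side fits max_side
    (1.0 if it already fits -- we never upscale)."""
    return min(1.0, max_side / max(width, height))

def rescale_landmarks(points, scale):
    """Map (x, y) landmark coords found on the scaled-down image
    back to original-image coordinates."""
    return [(x / scale, y / scale) for x, y in points]
```

For example, a 2678 x 4000 frame with max_side=1280 gets scale 0.32: the extractor runs on the roughly 857 x 1280 copy, and every landmark found there is divided by 0.32 to land back on the original frame.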

Jack29913 commented 6 years ago

I think scaling down first, then extracting, and cropping the face from the original image is the ideal solution. Just be careful about cropping the face from the scaled-down version; the final image would be lower quality. This doesn't extract, only crops, but I think it scales down first.

NagashSzarekh commented 6 years ago

I did not really have any problems with face_recognition, at least none that I noticed. Both face_recognition and face-alignment appear to run at the same speed for me, but that may be a limitation of my hardware (GTX 980).

I just figured that if it is possible to try hog when cnn does not find a face, it should also be possible to try hog when cnn errors out, instead of just exiting. But I am not a python programmer, and you all have a better idea of what is possible in the code.

iperov commented 6 years ago

@DLSauron

I did not really have any problems with the face_recognition

face_recognition has a zooming problem on some footage. [Image Removed]

babilio commented 6 years ago

@iperov @oatssss I tried extracting the same set of images with the face-alignment on pytorch and the face-alignment on keras. It seems the keras implementation is much more demanding on GPU memory. I have 7,000 images of 480, 720, and 1080 resolution. Pytorch went through all of them fine; keras went through 800 images of 480 resolution and threw the error

Reason: Error while calling cudaMalloc(&data, new_size*sizeof(float)) in file C:\packages\dlib-19.9\dlib\dnn\gpu_data.cpp:195. code: 2, reason: out of memory

and couldn't even handle the others.

iperov commented 6 years ago

@babilio Actually this is a dlib OOM. dlib conflicts with Keras over memory usage.

What is the picture size of the first image of the sequence in your folder? As a test, try making the first image of the sequence 1080p (the highest resolution in the whole sequence) and report back here.

babilio commented 6 years ago

@iperov the picture I had first in the folder was 480p.

I did what you asked and put the 1080p pics first, and it went through all 1,000 and moved on to the 720p ones without any issue. So the problem does seem to occur when it starts with a low resolution and then switches to a higher one.
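The workaround amounts to making sure the largest frame is seen first, so dlib's initial allocation is already big enough for everything that follows. One way to sketch that ordering as a helper (hypothetical, not faceswap code):

```python
def order_largest_first(images):
    """images: list of (filename, (width, height)) pairs.
    Return filenames with the largest-area image first and the rest in
    name order, so dlib's first allocation covers the whole set."""
    largest, _ = max(images, key=lambda item: item[1][0] * item[1][1])
    rest = sorted(name for name, _ in images if name != largest)
    return [largest] + rest
```

Renaming the biggest file so it sorts first (as done later in this thread with 0.jpg) achieves the same effect without touching the code.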

iperov commented 6 years ago

@DLSauron

Funny thing is I had already resized that image yesterday because the original was too big for the original cnn extractor (2678 x 4000).

A 1080p picture eats ~3.5GB of video RAM via dlib cnn BEFORE Keras even loads; in any case, large pictures cannot be handled by dlib cnn.

NagashSzarekh commented 6 years ago

I can confirm that after I created a pure white JPG with dimensions of 1280 x 1280 and named it 0.jpg, it was able to process the entire folder.

iperov commented 6 years ago

working on fix

3xtr3m3d commented 6 years ago

@DLSauron

I can confirm that after I created a pure white JPG with dimensions of 1280 x 1280 and named it 0.jpg, it was able to process the entire folder.

I tried this and it processed my folder of 100 images, which it couldn't process before.

3xtr3m3d commented 6 years ago

@oatssss

@3xtr3m3d I don't understand, isn't the face-alignment port from pytorch the "today's update" that you're referring to? For what extractor are you getting an OOM?

The face_alignment I was referring to is the original one from pytorch, which I integrated into the code as described in LordVulkan's comment on issue https://github.com/deepfakes/faceswap/issues/187

That was able to process the folder, but the ported version was not.

iperov commented 6 years ago

I made the image scaling for the cnns controlled by a max_res_side param. This scaling affects only the input image to the cnns; the output points are scaled back to the original size.

Also, I first call dlib_cnn_face_detector with a max_res_side x max_res_side x 3 dummy image, so dlib consumes all the VRAM it needs for its work up front:

import dlib
import numpy as np

# dlib_cnn_face_detector_path and max_res_side are defined elsewhere
dlib_cnn_face_detector = dlib.cnn_face_detection_model_v1(dlib_cnn_face_detector_path)
# Warm-up call on a blank max-size image (dlib expects uint8 pixel data),
# so dlib grabs all the VRAM it needs before TensorFlow loads
dlib_cnn_face_detector(np.zeros((max_res_side, max_res_side, 3), dtype=np.uint8), 1)

I have ~5.53GB free before the program starts, and with max_res_side=1850 I got: totalMemory: 6.00GiB freeMemory: 133.42MiB

But even with 133MB, keras works without problems, though with a warning:

Allocator (GPU_0_bfc) ran out of memory trying to allocate 134.44MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

Yes DLIB sucks, but we have no alternative.

SO

max_res_side=1280 consumes ~2.77GB. This will fail for people who have only 3GB VRAM, because they have only ~2.2GB free on Windows 10.

max_res_side=1100 consumes ~2.06GB and will work for people with 3GB VRAM.

But decreasing max_res_side may cause imprecise landmark detection.

SO what do we choose? @Clorr

iperov commented 6 years ago

Sorry for the bad English, I will try to explain.

@3xtr3m3d
face_alignment i was refering to is the original one from pytorch which i integrate to code as describe in the comment of LordVulkan on issue https://github.com/deepfakes/faceswap/issues/187

that able to process the folder but ported version did not.

Because Torch frees VRAM after each call. TensorFlow doesn't free it; it consumes all available VRAM for caching, which is why TF is ~2x faster than Torch.

The problem is that dlib and TF compete for VRAM. The difference is that TF can work with very little memory, but it grabs all freed memory again.

For example, if TF is called first, it consumes all the VRAM, and when dlib is then called there is none left for it. If we call dlib first at 1280x1280, ~2GB stays free; TF then eats all the remaining VRAM, so a later dlib call at 1920x1920 has no VRAM and hits an OOM error; only 1280x1280 will still work.

So I suggested the fix in my previous post.

3xtr3m3d commented 6 years ago

@iperov Great.. Thanks for the explanation.

deepfakesclub commented 6 years ago

I think in the other thread someone mentioned having a plugin to choose between face-alignment and face_recognition. I would recommend that route, as face_recognition has some advantages in edge-case scenarios.

3xtr3m3d commented 6 years ago

@iperov since the problem is tensorflow allocating all the VRAM, I tried limiting tensorflow's memory. Now I don't see the OOM error, but it looks like the GPU is only using 2.4GB of 4GB. Maybe the problem can be resolved this way? I'm not familiar with tensorflow, keras, etc.

The thing I tried is setting the memory limit before importing keras:

FaceLandmarksExtractor.py

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Cap TF at 30% of GPU memory before keras creates its session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))

import keras
from keras import backend as K

iperov commented 6 years ago

@3xtr3m3d what about the performance with reduced memory?

iperov commented 6 years ago

And I don't like controlling the tf session inside a child lib. The FaceSwap architecture is crap already :D

3xtr3m3d commented 6 years ago

@iperov

https://imgur.com/mN4RD1x

iperov commented 6 years ago

PR with the fix: https://github.com/deepfakes/faceswap/pull/235