I am seeing the same on a few of my image sets. I ran an extract on 200 images last night with cnn, and when I tried to run those same images today to test the new extractor, it gave the OOM error. I assume this is because the new extractor uses more video memory.
Does hog still work for you?
Yes, hog works. The face_alignment from pytorch also works.
Yes, hog works. I wonder if it would be worthwhile, instead of just erroring out on an OOM with cnn, to switch over to hog for that file, just like it currently falls back to hog when cnn does not find a face.
@DLSauron The OOM is an all-or-nothing thing, I believe. Have you been able to extract partway through before an OOM occurs?
@3xtr3m3d I don't understand, isn't the face-alignment port from pytorch the "today's update" you're referring to? Which extractor is giving you an OOM?
Yes, it appears to depend on the resolution of the image. As you can see below, it was able to get through 5 images, but when it hit one that was 857 x 1280 it errored out. The funny thing is I had already resized that image yesterday, because the original (2678 x 4000) was too big for the original cnn extractor.
Fun fact: file size does not appear to matter to cnn, only resolution. The original was 580 KB, but the smaller one is 1.90 MB. With the original cnn extractor, the 580 KB file would give an OOM, but the 1.90 MB file would extract just fine.
Done!
@oatssss the face-alignment from today is the port to Keras, I believe, no longer pytorch. See this commit: 232d931
My experience with face-alignment (the pytorch version) is that it has issues with any images over 720p. No rigorous testing, mind, just that I have problems with bigger images, but if I resize down, all problems go away. I'd guess this 'issue' has carried through. Try resizing down a bit and trying again.
@DLSauron Ah, I assumed you were converting images all of the same size. Yeah, I think this is just a limitation of the library/resources. If you weren't having problems with face_recognition (not the new face-alignment), maybe we can have all 3 (hog, face_recognition, and face-alignment) available as options. Did face_recognition's cnn work well for you? Last I remember, it was extremely slow for me.
We can also use a technique where images are scaled down before being passed to the extractor. Then the alignment coords that are found can simply be rescaled up to match the original.
I think scaling down first, then extracting, and cropping the face from the original image is the ideal solution. Just be careful about cropping the face from the scaled-down version; the final image would be lower quality. This doesn't extract, only crops, but I think it scales down first.
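Roughly like this, as a minimal sketch; detector here is a placeholder for whichever extractor is used, and is assumed to return (x, y) landmark tuples:

import cv2

def detect_scaled(image, detector, max_side=1280):
    # downscale only the copy that goes to the detector
    h, w = image.shape[:2]
    scale = min(1.0, max_side / float(max(h, w)))
    small = cv2.resize(image, (int(w * scale), int(h * scale))) if scale < 1.0 else image
    # rescale the found coords back up, so the face can be
    # cropped from the original full-quality image
    return [(x / scale, y / scale) for (x, y) in detector(small)]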
I did not really have any problems with the face_recognition, at least not that I noticed. Both face_recognition and face-alignment appear to run at the same speed for me, but that may be a limitation of my hardware (GTX 980).
I just figured that if it was possible to fall back to hog when cnn does not find a face, it should also be possible to fall back to hog when cnn errors out, instead of just exiting. But I am not a Python programmer, and you all would have a better idea of what is possible in the code.
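Something like this rough sketch of the idea, assuming the face_recognition API is underneath (dlib surfaces its CUDA OOM as a RuntimeError in Python):

import face_recognition

def locate_with_fallback(image):
    # try the GPU cnn detector first; on OOM fall back to hog instead of exiting
    try:
        return face_recognition.face_locations(image, model="cnn")
    except RuntimeError:
        return face_recognition.face_locations(image, model="hog")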
@DLSauron
I did not really have any problems with the face_recognition
face_recognition has a zooming problem on some footage. [Image Removed]
@iperov @oatssss I tried extracting the same set of images with the face-alignment on pytorch and the face-alignment on keras. It seems the keras implementation is much more demanding on GPU memory. I have 7,000 images of 480, 720, and 1080 resolution. Pytorch went through all of them fine; keras went through 800 images of 480 resolution and threw this error:
Reason: Error while calling cudaMalloc(&data, new_size*sizeof(float)) in file C:\packages\dlib-19.9\dlib\dnn\gpu_data.cpp:195. code: 2, reason: out of memory
and couldn't even handle the others
@babilio Actually this is a dlib OOM. dlib conflicts with Keras in memory usage.
What is the picture size of the first image of the sequence in your folder? As a test, try making the first image of the sequence 1080p (the highest in the whole sequence) and report back here.
@iperov The picture I had first in the folder was 480p.
I did what you asked and put the 1080p pics first, and it went through all 1,000 and moved on to the 720p without any issue. So the problem does seem to occur when it starts with a low resolution and switches to a higher one.
@DLSauron
The funny thing is I had already resized that image yesterday, because the original (2678 x 4000) was too big for the original cnn extractor.
A 1080p picture eats ~3.5 GB of video RAM with dlib cnn BEFORE Keras even loads. Anyway, large pictures cannot be handled by dlib cnn.
I can confirm that after I created a pure white JPG with dimensions of 1280 x 1280 and named it 0.jpg, it was able to process the entire folder.
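For anyone who wants to reproduce the workaround, a minimal sketch with OpenCV and NumPy (the name 0.jpg just makes it sort first in the folder):

import cv2
import numpy as np

# pure white 1280 x 1280 frame; dlib sizes its VRAM allocation to the largest
# image it has seen, so processing this first reserves enough memory
cv2.imwrite("0.jpg", np.full((1280, 1280, 3), 255, dtype=np.uint8))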
working on fix
@DLSauron
I can confirm that after I created a pure white JPG with dimensions of 1280 x 1280 and named it 0.jpg, it was able to process the entire folder.
I tried this, and it processed my folder of 100 images which it couldn't process before.
@oatssss
@3xtr3m3d I don't understand, isn't the face-alignment port from pytorch the "today's update" you're referring to? Which extractor is giving you an OOM?
The face_alignment I was referring to is the original one from pytorch, which I integrated into the code as described in LordVulkan's comment on issue https://github.com/deepfakes/faceswap/issues/187. That one was able to process the folder, but the ported version did not.
I made the image scale down for the CNNs, with a max_res_side parameter. This scaling affects only the input image to the CNNs; the output points are scaled back to the original size. Also, I first call dlib_cnn_face_detector with a max_res_side x max_res_side x 3 dummy image, so that dlib grabs all the VRAM necessary for its work up front:
dlib_cnn_face_detector = dlib.cnn_face_detection_model_v1(dlib_cnn_face_detector_path)
# prime with a dummy max-size frame; dlib expects 8-bit images, hence uint8
dlib_cnn_face_detector(np.zeros((max_res_side, max_res_side, 3), dtype=np.uint8), 1)
I have ~5.53 GB free before the program starts, and with the parameter max_res_side=1850 I got:
totalMemory: 6.00GiB freeMemory: 133.42MiB
But even with 133 MB, Keras works without problems, only with this warning:
Allocator (GPU_0_bfc) ran out of memory trying to allocate 134.44MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
Yes DLIB sucks, but we have no alternative.
SO:
max_res_side=1280 consumes ~2.77 GB. This will fail for people who have only 3 GB of VRAM, because they have only ~2.2 GB free on Windows 10.
max_res_side=1100 consumes ~2.06 GB and will work for people with 3 GB of VRAM.
But decreasing max_res_side may cause imprecise landmark detection.
So what do we choose? @Clorr
Sorry for my bad English, I will try to explain.
@3xtr3m3d
The face_alignment I was referring to is the original one from pytorch, which I integrated into the code as described in LordVulkan's comment on issue https://github.com/deepfakes/faceswap/issues/187. That one was able to process the folder, but the ported version did not.
Because Torch frees VRAM after each call. But TensorFlow doesn't free it; it consumes all possible VRAM for caching, which is why TF is ~2x faster than Torch.
The problem is that dlib and TF compete for VRAM. The difference is that TF can work with super low memory, but it eats all freed memory again.
For example: if we call TF first, it consumes all the RAM, and then when we call dlib there is no RAM left for dlib. If we call dlib first with 1280x1280, free memory is 2 GB; then we call TF and it eats all the remaining RAM; then a dlib call with 1920x1920 has no RAM and hits an OOM error, and only 1280x1280 will work fine.
So I suggested the fix in my previous post.
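In other words, the initialization order matters. A rough sketch of the idea (the model path and size here are placeholders):

import dlib
import numpy as np

# 1. let dlib grab VRAM first, sized for the largest frame it will ever see
detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
detector(np.zeros((1280, 1280, 3), dtype=np.uint8), 1)

# 2. only now import Keras, so TensorFlow caches whatever memory is left
import keras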
@iperov Great.. Thanks for the explanation.
I think in the other thread someone mentioned having a plugin to choose face-alignment or face_recognition... I would recommend that route, as face_recognition has some advantages in edge-case scenarios.
@iperov Since the problem is TensorFlow allocating all the VRAM, I tried limiting TensorFlow's memory. Now I don't see the OOM error, but it looks like the GPU is using only 2.4 GB of 4 GB. Maybe the problem can be resolved this way? I'm not familiar with TensorFlow, Keras, etc.
What I tried is setting the memory limit before importing Keras, in FaceLandmarksExtractor.py:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# cap TensorFlow at 30% of GPU memory before Keras creates its session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))

import keras
from keras import backend as K
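An alternative worth trying is config.gpu_options.allow_growth = True, which lets TensorFlow allocate memory on demand instead of reserving a fixed fraction up front; the trade-off is that TensorFlow still never releases memory once it has grown.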
@3xtr3m3d What about performance with the reduced memory?
And I don't like controlling the TF session inside a child lib. The FaceSwap architecture is crap already :D
@iperov
I have extracted faces from 2000+ images after today's update and it went great. Then I tried to extract another set of 100 images and the program fails, saying:
Reason: Error while calling cudaMalloc(&data, n) in file D:\FAPP\dlib-master\dlib\dnn\cuda_data_ptr.cpp:28. code: 2, reason: out of memory
(The images are all the same size.)
Any idea why this is happening?