I did some braindead profiling and the issue seems to stem from the line new_face = self.encoder(face / 255.0)[0] in get_new_face() in Convert_Masked.py. I don't know enough about how the software is structured to say whether it's meant to use GPU acceleration or not, but it often just hangs for no apparent reason, regardless of how many faces (if any) were detected and had landmarks extracted. Face detection and landmark extraction are fast, on the order of milliseconds. My bet is there's some infinite-loop shenanigans going on in one of the yield statements.
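For reference, a timing wrapper of this kind is enough to localize a stall like that (a sketch, not the project's code; the timed decorator name is hypothetical):

import time
from functools import wraps

def timed(fn):
    # Print wall-clock time per call; crude, but enough to see which call hangs.
    @wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.time()
        result = fn(*args, **kwargs)
        print("%s took %.3f s" % (getattr(fn, "__name__", "call"), time.time() - t0))
        return result
    return wrapper

# Hypothetical usage inside Convert_Masked before converting:
#   self.encoder = timed(self.encoder)
# Every encoder call then reports its duration, so a hang shows up immediately.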
50 min for 3000 images is faster than my GTX 1060. My speed is 30 min for 1500.
When I use CNN it does not use my GPU, only the CPU. I have a GTX 1060 with 6 GB and it takes me 30 min for 100 images. But when I train it uses all my GPU memory. There is some error related to CNN.
@ByFede it's just that your dlib wasn't built with DLIB_USE_CUDA
@iperov Yes it is; I installed it manually and checked with tensorflow, but I'm not clear on how to check it with CNN. I don't know what's wrong.
Test dlib:
cmake -G "Visual Studio 14 2015 Win64"
-- Selecting Windows SDK version to target Windows 10.0.16299.
-- The C compiler identification is MSVC 19.0.24215.1
-- The CXX compiler identification is MSVC 19.0.24215.1
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- Enabling SSE2 instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- A library with BLAS API not found. Please specify library location.
-- LAPACK requires BLAS
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0 (found suitable version "8.0", minimum required is "7.5")
-- Looking for cuDNN install...
-- Found cuDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudnn.lib
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- Checking if you have the right version of cuDNN installed.
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: D:/fy/fs/faceswap_env/dlib-master
Also I have checked in tensorflow whether the GPU is enabled:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2018-02-11 03:27:20.363564: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2018-02-11 03:27:20.747300: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:02:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-02-11 03:27:20.747460: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1
2018-02-11 03:27:20.933639: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1
Check: run python,
import dlib
dlib.DLIB_USE_CUDA
what output?
@iperov It said False, but when I run the training it uses my 6 GB of GPU memory at 40-50% usage.
What's wrong with my installation? I've done everything exactly as described, twice.
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlib
>>> dlib.DLIB_USE_CUDA
False
>>>
@ByFede
Extract uses dlib; train uses tensorflow; convert uses dlib and tensorflow.
In the latest commit, convert no longer uses dlib because it uses the alignments obtained from extract, so you have to run extract again. Also, your dlib was built without CUDA.
pip uninstall dlib
go to the dlib dir
python setup.py install --yes DLIB_USE_CUDA
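After reinstalling, a quick way to confirm both halves from inside the same env (a sketch, not project code; tf.test.gpu_device_name() returns an empty string when no GPU is visible):

import dlib
import tensorflow as tf

print("dlib built with CUDA:", dlib.DLIB_USE_CUDA)            # want True
print("TF GPU device:", tf.test.gpu_device_name() or "none")  # e.g. /device:GPU:0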
@iperov
(base) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py
Traceback (most recent call last):
File "cuda.py", line 2, in <module>
dlib.DLIB_USE_CUDA
AttributeError: module 'dlib' has no attribute 'DLIB_USE_CUDA'
@iperov Is it wrong to use setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA?
nothing wrong
@iperov Well, I used that to install dlib, but convert doesn't use the GPU.
Can you tell me how to run that .py check you mentioned before? I made a new .py file called cuda.py and put this code inside:
import dlib
dlib.DLIB_USE_CUDA
Then I did cd C:\Users\ZeroCool22\Desktop\Nueva carpeta (2) and executed python cuda.py.
I executed it in my BASE env and it said this:
(base) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py
Traceback (most recent call last):
File "cuda.py", line 2, in <module>
dlib.DLIB_USE_CUDA
AttributeError: module 'dlib' has no attribute 'DLIB_USE_CUDA'
If I execute it in the faceswap env, it doesn't show me anything.
(deepfakes) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py
(deepfakes) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>
conda list
packages in environment at C:\ProgramData\Anaconda3\envs\deepfakes:
Name Version Build Channel
absl-py 0.1.10 <pip>
backports 1.0 py36h81696a8_1
backports.weakref 1.0rc1 py36_0
bleach 1.5.0 py36_0 conda-forge
boost 1.64.0 py36_vc14_4 [vc14] conda-forge
boost-cpp 1.64.0 vc14_1 [vc14] conda-forge
bzip2 1.0.6 vc14_1 [vc14] conda-forge
ca-certificates 2017.08.26 h94faf87_0
certifi 2018.1.18 py36_0
click 6.7 <pip>
cmake 3.9.4 h4b83b1b_0 anaconda
cudatoolkit 8.0 3 anaconda
cudnn 6.0 0 anaconda
decorator 4.0.11 py36_0 conda-forge
dlib 19.9.99 <pip>
face-recognition 1.2.1 <pip>
face-recognition-models 0.3.0 <pip>
ffmpeg 3.4.1 1 conda-forge
freetype 2.8.1 vc14_0 [vc14] conda-forge
h5py 2.7.1 py36_2 conda-forge
hdf5 1.10.1 vc14_1 [vc14] conda-forge
html5lib 0.9999999 py36_0 conda-forge
icc_rt 2017.0.4 h97af966_0
icu 58.2 vc14_0 [vc14] conda-forge
imageio 2.1.2 py36_0 conda-forge
intel-openmp 2018.0.0 hd92c6cd_8
jpeg 9b vc14_2 [vc14] conda-forge
keras 2.0.9 py36_0 conda-forge
libgpuarray 0.7.5 vc14_0 [vc14] conda-forge
libiconv 1.14 vc14_4 [vc14] conda-forge
libpng 1.6.34 vc14_0 [vc14] conda-forge
libprotobuf 3.2.0 vc14_0 [vc14] anaconda
libtiff 4.0.9 vc14_0 [vc14] conda-forge
libwebp 0.5.2 vc14_7 [vc14] conda-forge
libxml2 2.9.3 vc14_9 [vc14] conda-forge
mako 1.0.7 py36_0 conda-forge
markdown 2.6.9 py36_0 conda-forge
Markdown 2.6.11 <pip>
markupsafe 1.0 py36_0 conda-forge
mkl 2018.0.1 h2108138_4
moviepy 0.2.3.2 py36_0 conda-forge
numpy 1.12.1 py36hf30b8aa_1 anaconda
numpy 1.14.0 <pip>
olefile 0.44 py36_0 conda-forge
opencv 3.3.0 py36_200 conda-forge
openssl 1.0.2n h74b6da3_0
pillow 5.0.0 py36_0 conda-forge
pip 9.0.1 py36_1 conda-forge
protobuf 3.5.1 py36_vc14_3 [vc14] conda-forge
protobuf 3.5.1 <pip>
pygpu 0.7.5 py36_0 conda-forge
python 3.6.4 0 conda-forge
pyyaml 3.12 py36_1 conda-forge
qt 5.6.2 vc14_1 [vc14] conda-forge
scandir 1.6 py36_0 conda-forge
scipy 1.0.0 py36h1260518_0
setuptools 38.5.1 <pip>
setuptools 38.4.0 py36_0 conda-forge
six 1.11.0 py36_1 conda-forge
six 1.11.0 <pip>
sqlite 3.20.1 vc14_2 [vc14] conda-forge
tensorflow-gpu 1.5.0 <pip>
tensorflow-tensorboard 1.5.1 <pip>
theano 1.0.1 py36_1 conda-forge
tk 8.6.7 vc14_0 [vc14] conda-forge
tqdm 4.11.2 py36_0 conda-forge
vc 14 0 conda-forge
vs2015_runtime 14.0.25420 0 conda-forge
webencodings 0.5 py36_0 conda-forge
Werkzeug 0.14.1 <pip>
werkzeug 0.14.1 py_0 conda-forge
wheel 0.30.0 <pip>
wheel 0.30.0 py36_2 conda-forge
wincertstore 0.2 py36_0 conda-forge
yaml 0.1.7 vc14_0 [vc14] conda-forge
zlib 1.2.11 vc14_0 [vc14] conda-forge
@ZeroCool22 As the interpreter: run python.exe, then
import dlib
dlib.DLIB_USE_CUDA
From a .py:
print(dlib.DLIB_USE_CUDA)
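In other words, a bare dlib.DLIB_USE_CUDA expression is echoed by the interactive interpreter but silently discarded in a script, which is why cuda.py printed nothing in the deepfakes env. A fixed cuda.py is just:

import dlib
print(dlib.DLIB_USE_CUDA)  # a script must print the value explicitly; True means the CUDA build is active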
@iperov It's working! Thanks, dude.
In all my tries I never did a pip uninstall dlib inside the environment before installing dlib with the CUDA flag, but now, with the old dlib uninstalled, it works perfectly: 500 images in a few seconds :).
I was a little confused about the environment, so this is what I did, in case it helps someone:
pip uninstall dlib
-- Download the latest dlib version from https://github.com/davisking/dlib/archive/master.zip
-- Unzip the dlib dir and run python setup.py install --yes DLIB_USE_CUDA
dlib check (inside the environment):
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlib
>>> dlib.DLIB_USE_CUDA
True
>>>
tensorflow check (inside the environment):
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2018-02-12 03:42:56.921157: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2018-02-12 03:42:57.291935: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:02:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-02-12 03:42:57.292165: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1
2018-02-12 03:42:57.624743: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1
I built with cuDNN 6 ok.
My trick for building dlib with both VS 2015 and 2017 installed is to edit dlib\setup.py:
. . .
cmake_args = ['-DCMAKE_GENERATOR=Visual Studio 14 2015',
. . .
@iperov
(deepfakes) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py
Traceback (most recent call last):
File "cuda.py", line 1, in
@iperov Now yes:
@iperov Found the FAIL while installing dlib. How can I fix it?
full log
@iperov If you want the cuda_test from C:\Users\ZeroCool22\Desktop\dlib ultimo chico\dlib-master\build\temp.win-amd64-3.6\Release\dlib_build\cuda_test_build, I can give it to you.
But the log from installing dlib doesn't show up in full on the Anaconda terminal...
I don't know what to say without a log.
DLIB INSTALLATION LOG: https://drive.google.com/open?id=15TKqWINR8twQH7LiPSapT89JmGBGtj9-
CUDA TEST: https://drive.google.com/open?id=1NdzfFpC5KsOZCnI2M4nRGltq-Q-kB36z
lol, sorry fixed.
I don't know if you needed that; that's the path that appears in the Anaconda prompt: C:\Users\ZeroCool22\Desktop\DLIB ULTIMO CHICO\dlib-master\build\temp.win-amd64-3.6\Release\dlib_build\cuda_test_build
CUDA TEST: that's a CMakeCache?? Wrong file in Google Drive.
Hmm I'm also having this problem.
I reinstalled dlib using the methods above, and my dlib.DLIB_USE_CUDA returns True. My tensorflow can also detect the GPU, so that isn't the problem. Using Python 3.5.
So when I run the convert (-D hog), it doesn't use the GPU at all. Or is -D cnn the only one that uses the GPU?
I tried running the -D cnn option but then dlib fails with
Failed to convert image: Reason: Error while calling cudaMalloc(&data, new_size*sizeof(float)) in file: reason: out of memory
I have a GTX 1080 GPU.
Any hints on what is causing this?
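One possible workaround (an assumption, not something confirmed in this thread): dlib's CNN detector allocates GPU memory in proportion to image size, so downscaling frames before detection can keep the cudaMalloc within the card's memory. A sketch using dlib's stock mmod detector, with hypothetical file names:

import cv2
import dlib

# Run the CNN face detector on a downscaled copy of the frame to shrink the
# GPU allocation, then map the boxes back to full resolution.
detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

frame = cv2.imread("frame.png")
scale = 0.5  # assumption: half resolution is still enough for detection
small = cv2.cvtColor(cv2.resize(frame, None, fx=scale, fy=scale),
                     cv2.COLOR_BGR2RGB)  # dlib expects RGB

for det in detector(small, 0):  # 0 = no upsampling, keeps memory use down
    r = det.rect  # scale coordinates back to the original frame
    print(int(r.left() / scale), int(r.top() / scale),
          int(r.right() / scale), int(r.bottom() / scale))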
@Arthil Are you using the latest faceswap repo with the serializer?
@ZeroCool22 try latest dlib with
python setup.py install --yes DLIB_USE_CUDA -G "Visual Studio 14 2015"
@iperov Ahh, looks like I forgot to pull the latest updates!
It's complaining about an alignments.json file now; how do I generate this file?
restart extract
OK, so I did
import dlib
dlib.DLIB_USE_CUDA
and it says True. I created an alignments.json after pulling the update and extracting again. I tried converting and I get slightly faster speed, but my GPU usage is still zero (except for the VRAM). Now I'm getting 50 minutes for 5402 images, so it was an improvement. I'm just not sure whether these are normal speeds.
dlib version 19.9.99, tensorflow-gpu 1.5, CUDA 9, cudnn 7
@salre9501 5402 in 50 min????? A superior result. I get only ~3000 per hour. I also have low GPU usage on convert; I think it's normal.
OK, I think I fixed it. Removing seamless with -D cnn is giving me speeds of 5.5 it/s. Now the same 5400 images take about 15 minutes.
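Those rates are roughly self-consistent, assuming one iteration corresponds to one frame:

# Back-of-the-envelope check (assumption: 1 "it" = 1 frame)
print(5402 / (50 * 60))  # ~1.8 it/s, the earlier rate over 50 minutes
print(5400 / 5.5 / 60)   # ~16 minutes at 5.5 it/s, close to the ~15 reported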
@iperov
(deepfakes) C:\Users\ZeroCool22\Desktop\dlib ultimo chico\dlib-master>python setup.py install --yes DLIB_USE_CUDA -G "Visual Studio 14 2015"
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: option -G not recognized
(deepfakes) C:\Users\ZeroCool22\Desktop\dlib ultimo chico\dlib-master>
I experience the same behaviour as mentioned here: https://github.com/deepfakes/faceswap/issues/184#issuecomment-364935845 When "seamless" is used, there's no GPU usage and the conversion runs extremely slowly. With "seamless" disabled, it works like a charm.
dlib self-compiled with "--yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA". Face recognition with cnn works great (that didn't work well without the self-compiled dlib).
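A plausible explanation, though not confirmed in this thread: the seamless option does Poisson blending via OpenCV's cv2.seamlessClone, which runs entirely on the CPU, so the GPU sits idle while each frame grinds through it. A minimal sketch of that call, with hypothetical file names:

import cv2
import numpy as np

# cv2.seamlessClone is a CPU routine; its per-frame cost grows with image
# size, which would match the slowdown reported when "seamless" is enabled.
src = cv2.imread("new_face.png")   # swapped face patch
dst = cv2.imread("frame.png")      # original frame
mask = np.full(src.shape[:2], 255, dtype=np.uint8)  # blend the whole patch
center = (dst.shape[1] // 2, dst.shape[0] // 2)     # paste position (x, y)

blended = cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("blended.png", blended)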
Results are good, but the conversion is really slow compared to FakeApp. dlib 19.9.99 compiled with CUDA, Visual Studio reinstalled, tensorflow 1.5, CUDA 9, and cuDNN 7. I tried with CUDA 8, cuDNN 6/5.1, and tensorflow 1.4 and had the same problem. Extraction took 8 minutes on the same images, and training uses 60% of my GPU, so I know the GPU is detected. Setting -D cnn on the conversion script gives me an out-of-memory error. I'm getting a conversion speed of 1 - 1.8 it/s on a GTX 1070.
Images are 1280 x 720.