Getting 50 minutes conversion time for 3000 images, 0% GPU usage but high vram usage. Is this normal?

salre9501 commented 6 years ago

Results are good, but the conversion is really slow compared to FakeApp. I have dlib 19.9.99 compiled with CUDA, Visual Studio reinstalled, tensorflow 1.5, CUDA 9, and cudnn 7. I tried with CUDA 8, cudnn 6/5.1, and tensorflow 1.4 and had the same problem. Extraction took 8 minutes on the same images, and training uses 60% of my GPU, so I know the GPU is detected. Setting -D cnn on the conversion script gives me an out of memory error. I'm getting a conversion speed of 1 - 1.8 it/s on a GTX 1070.

Images are 1280 x 720

ByFede commented 6 years ago

Tyrannosaurus1234 commented 6 years ago

I did some braindead profiling, and the issue seems to stem from the line new_face = self.encoder( face / 255.0 )[0] in get_new_face() in Convert_Masked.py. I don't know enough about how the software is structured to say whether it's meant to use GPU acceleration or not, but it often seems to just hang for no reason, regardless of how many faces (if any) were detected and had landmarks extracted. Face detection and landmark extraction are fast, on the order of milliseconds. My bet is there's some infinite loop shenanigans going on in one of the yield statements.
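
A minimal sketch of this kind of rough profiling, assuming nothing about faceswap's internals: wrap each stage in a timer and compare the totals. StageTimer and the stage names are hypothetical, and the sleeps only stand in for the real detection and self.encoder(...) calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage (hypothetical helper)."""

    def __init__(self):
        self.seconds = defaultdict(float)

    @contextmanager
    def stage(self, name):
        """Time the body of a with-block and add it to the stage's total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[name] += time.perf_counter() - start

    def slowest_first(self):
        """Return stage names ordered from most to least total time."""
        return sorted(self.seconds, key=self.seconds.get, reverse=True)


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):  # pretend to convert three frames
        with timer.stage("detect"):
            time.sleep(0.001)  # stand-in for face detection
        with timer.stage("encoder"):
            time.sleep(0.02)  # stand-in for self.encoder(face / 255.0)
    for name in timer.slowest_first():
        print(f"{name}: {timer.seconds[name]:.3f} s")
```

In a real run the two with-blocks would wrap the actual calls inside the convert loop, which should show whether the time goes to the encoder or to something around it.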

iperov commented 6 years ago

50 min for 3000 - faster than my gtx1060. My speed is 30 min for 1500

ByFede commented 6 years ago

When I use CNN it does not use my GPU, only the CPU. I have a GTX 1060 with 6 GB and it takes me 30 min for 100 images. But when I train, it uses all my GPU memory. There is some error related to CNN.

iperov commented 6 years ago

@ByFede it's just that your dlib was not built with DLIB_USE_CUDA

ByFede commented 6 years ago

@iperov Yes it is. I installed it manually and checked with tensorflow, but I'm not clear on how to check it with CNN. I don't know what's wrong.

Test dlib: cmake -G "Visual Studio 14 2015 Win64"

-- Selecting Windows SDK version to target Windows 10.0.16299.
-- The C compiler identification is MSVC 19.0.24215.1
-- The CXX compiler identification is MSVC 19.0.24215.1
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void
-- Check size of void - done
-- Enabling SSE2 instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- A library with BLAS API not found. Please specify library location.
-- LAPACK requires BLAS
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0 (found suitable version "8.0", minimum required is "7.5")
-- Looking for cuDNN install...
-- Found cuDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudnn.lib
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- Checking if you have the right version of cuDNN installed.
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: D:/fy/fs/faceswap_env/dlib-master

I have also checked in tensorflow whether the GPU is enabled: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

2018-02-11 03:27:20.363564: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2018-02-11 03:27:20.747300: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:02:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-02-11 03:27:20.747460: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1
2018-02-11 03:27:20.933639: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1

iperov commented 6 years ago

To check, run python and enter:

import dlib
dlib.DLIB_USE_CUDA

What is the output?

ByFede commented 6 years ago

@iperov It says False, but when I run training it uses my 6 GB of GPU memory at 40/50% usage.

What's wrong with my installation? I've done everything exactly as described, twice.

Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlib
>>> dlib.DLIB_USE_CUDA
False
>>>

iperov commented 6 years ago

@ByFede

extract uses dlib
train uses tensorflow
convert uses dlib and tensorflow

In the latest commit, convert no longer uses dlib because it uses the alignments produced by extract, so you have to run extract again. Also, your dlib was built without CUDA.

pip uninstall dlib
go to the dlib dir
python setup.py install --yes DLIB_USE_CUDA
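
The same two steps could be scripted from Python, as a rough sketch. The commands are the ones quoted above; the -y flag (which skips pip's confirmation prompt), the function names, and the directory passed to run_all are assumptions added for illustration.

```python
import subprocess
import sys


def reinstall_commands(python=sys.executable):
    """Return the uninstall and rebuild commands as argument lists."""
    return [
        # -y is an addition here: it answers pip's "Proceed?" prompt.
        [python, "-m", "pip", "uninstall", "-y", "dlib"],
        [python, "setup.py", "install", "--yes", "DLIB_USE_CUDA"],
    ]


def run_all(dlib_source_dir):
    """Run both commands inside the dlib source dir, stopping on failure."""
    for command in reinstall_commands():
        subprocess.run(command, cwd=dlib_source_dir, check=True)
```

Calling run_all with the path of the unzipped dlib sources, from inside the activated environment, should remove the old dlib from that environment and then rebuild it with the CUDA flag.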

ZeroCool22 commented 6 years ago

@iperov

(base) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py
Traceback (most recent call last):
  File "cuda.py", line 2, in <module>
    dlib.DLIB_USE_CUDA
AttributeError: module 'dlib' has no attribute 'DLIB_USE_CUDA'

ZeroCool22 commented 6 years ago

@iperov Is it wrong to use setup.py install --yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA?

iperov commented 6 years ago

nothing wrong

ZeroCool22 commented 6 years ago

@iperov Well, I used that to install dlib, but convert doesn't use the GPU.

Can you tell me how to run that .py check you mentioned before?

I made a new .py file called cuda and put this code inside:

import dlib
dlib.DLIB_USE_CUDA

Then I did cd C:\Users\ZeroCool22\Desktop\Nueva carpeta (2) and executed python cuda.py.

I executed it in my BASE env and it said this:

(base) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py
Traceback (most recent call last):
  File "cuda.py", line 2, in <module>
    dlib.DLIB_USE_CUDA
AttributeError: module 'dlib' has no attribute 'DLIB_USE_CUDA'

If I execute it in the faceswap env, it doesn't show me anything.

(deepfakes) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py

(deepfakes) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>
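
The (base) and (deepfakes) prompts above are separate environments, and each can have its own dlib. A small sketch for confirming which interpreter and which dlib install a script actually picks up; describe_module is a hypothetical helper name, and the "unknown" fallbacks are an assumption for builds that do not report a file or version.

```python
import importlib
import sys


def describe_module(module):
    """Report which interpreter is running and where a module was loaded from."""
    return {
        "interpreter": sys.executable,
        "module_file": getattr(module, "__file__", "unknown"),
        "version": getattr(module, "__version__", "unknown"),
    }


if __name__ == "__main__":
    try:
        dlib = importlib.import_module("dlib")
    except ImportError:
        print(f"dlib is not importable from {sys.executable}")
    else:
        for key, value in describe_module(dlib).items():
            print(f"{key}: {value}")
```

Running it once per environment shows whether the two prompts resolve to the same dlib files or to different installs.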

ZeroCool22 commented 6 years ago

conda list

# packages in environment at C:\ProgramData\Anaconda3\envs\deepfakes:
#
# Name                    Version                   Build  Channel
absl-py                   0.1.10                    <pip>
backports                 1.0              py36h81696a8_1
backports.weakref         1.0rc1                   py36_0
bleach                    1.5.0                    py36_0    conda-forge
boost                     1.64.0              py36_vc14_4  [vc14]  conda-forge
boost-cpp                 1.64.0                   vc14_1  [vc14]  conda-forge
bzip2                     1.0.6                    vc14_1  [vc14]  conda-forge
ca-certificates           2017.08.26           h94faf87_0
certifi                   2018.1.18                py36_0
click                     6.7                       <pip>
cmake                     3.9.4                h4b83b1b_0    anaconda
cudatoolkit               8.0                           3    anaconda
cudnn                     6.0                           0    anaconda
decorator                 4.0.11                   py36_0    conda-forge
**dlib                      19.9.99                   <pip>**
face-recognition          1.2.1                     <pip>
face-recognition-models   0.3.0                     <pip>
ffmpeg                    3.4.1                         1    conda-forge
freetype                  2.8.1                    vc14_0  [vc14]  conda-forge
h5py                      2.7.1                    py36_2    conda-forge
hdf5                      1.10.1                   vc14_1  [vc14]  conda-forge
html5lib                  0.9999999                py36_0    conda-forge
icc_rt                    2017.0.4             h97af966_0
icu                       58.2                     vc14_0  [vc14]  conda-forge
imageio                   2.1.2                    py36_0    conda-forge
intel-openmp              2018.0.0             hd92c6cd_8
jpeg                      9b                       vc14_2  [vc14]  conda-forge
keras                     2.0.9                    py36_0    conda-forge
libgpuarray               0.7.5                    vc14_0  [vc14]  conda-forge
libiconv                  1.14                     vc14_4  [vc14]  conda-forge
libpng                    1.6.34                   vc14_0  [vc14]  conda-forge
libprotobuf               3.2.0                    vc14_0  [vc14]  anaconda
libtiff                   4.0.9                    vc14_0  [vc14]  conda-forge
libwebp                   0.5.2                    vc14_7  [vc14]  conda-forge
libxml2                   2.9.3                    vc14_9  [vc14]  conda-forge
mako                      1.0.7                    py36_0    conda-forge
markdown                  2.6.9                    py36_0    conda-forge
Markdown                  2.6.11                    <pip>
markupsafe                1.0                      py36_0    conda-forge
mkl                       2018.0.1             h2108138_4
moviepy                   0.2.3.2                  py36_0    conda-forge
numpy                     1.12.1           py36hf30b8aa_1    anaconda
numpy                     1.14.0                    <pip>
olefile                   0.44                     py36_0    conda-forge
opencv                    3.3.0                  py36_200    conda-forge
openssl                   1.0.2n               h74b6da3_0
pillow                    5.0.0                    py36_0    conda-forge
pip                       9.0.1                    py36_1    conda-forge
protobuf                  3.5.1               py36_vc14_3  [vc14]  conda-forge
protobuf                  3.5.1                     <pip>
pygpu                     0.7.5                    py36_0    conda-forge
python                    3.6.4                         0    conda-forge
pyyaml                    3.12                     py36_1    conda-forge
qt                        5.6.2                    vc14_1  [vc14]  conda-forge
scandir                   1.6                      py36_0    conda-forge
scipy                     1.0.0            py36h1260518_0
setuptools                38.5.1                    <pip>
setuptools                38.4.0                   py36_0    conda-forge
six                       1.11.0                   py36_1    conda-forge
six                       1.11.0                    <pip>
sqlite                    3.20.1                   vc14_2  [vc14]  conda-forge
tensorflow-gpu            1.5.0                     <pip>
tensorflow-tensorboard    1.5.1                     <pip>
theano                    1.0.1                    py36_1    conda-forge
tk                        8.6.7                    vc14_0  [vc14]  conda-forge
tqdm                      4.11.2                   py36_0    conda-forge
vc                        14                            0    conda-forge
vs2015_runtime            14.0.25420                    0    conda-forge
webencodings              0.5                      py36_0    conda-forge
Werkzeug                  0.14.1                    <pip>
werkzeug                  0.14.1                     py_0    conda-forge
wheel                     0.30.0                    <pip>
wheel                     0.30.0                   py36_2    conda-forge
wincertstore              0.2                      py36_0    conda-forge
yaml                      0.1.7                    vc14_0  [vc14]  conda-forge
zlib                      1.2.11                   vc14_0  [vc14]  conda-forge

iperov commented 6 years ago

@ZeroCool22 in the interpreter: run python.exe, then enter

import dlib
dlib.DLIB_USE_CUDA

from a .py file:

print(dlib.DLIB_USE_CUDA)
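
Outside the interactive interpreter, a bare expression such as dlib.DLIB_USE_CUDA is evaluated but not echoed, so a script needs both the import and the print. A minimal sketch of a complete cuda.py follows; cuda_status is a hypothetical helper, and the getattr fallback is an assumption to cover dlib builds that do not expose the flag at all.

```python
import importlib


def cuda_status(module):
    """Describe the DLIB_USE_CUDA flag of an already-imported dlib module."""
    flag = getattr(module, "DLIB_USE_CUDA", None)
    if flag is None:
        return "this dlib build has no DLIB_USE_CUDA attribute"
    return "dlib was built with CUDA" if flag else "dlib was built WITHOUT CUDA"


if __name__ == "__main__":
    try:
        dlib = importlib.import_module("dlib")
    except ImportError:
        print("dlib is not installed in this environment")
    else:
        print(cuda_status(dlib))
```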

ByFede commented 6 years ago

@iperov It's working! Thanks dude.

In all my attempts I never ran pip uninstall dlib inside the environment before installing dlib with the CUDA flag. Now, with the old dlib uninstalled first, it works perfectly: 500 images in a few seconds :).

I was a little confused about the environment. This is what I did, in case it helps someone:

Uninstall Visual Studio 2017
Install Visual Studio 2015
Downgrade from cuDNN 6 to cuDNN 5.1 (With CUDA 8 installed)
Inside the environment:
-- pip uninstall dlib
-- Download the latest dlib version from https://github.com/davisking/dlib/archive/master.zip
-- Unzip it, go to the dlib dir and run python setup.py install --yes DLIB_USE_CUDA

dlib check (inside the environment):

Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlib
>>> dlib.DLIB_USE_CUDA
True
>>>

tensorflow check (inside the environment):

>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

2018-02-12 03:42:56.921157: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2018-02-12 03:42:57.291935: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:02:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-02-12 03:42:57.292165: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1
2018-02-12 03:42:57.624743: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:02:00.0, compute capability: 6.1

iperov commented 6 years ago

I built with cuDNN 6 ok.

My trick for building dlib with both 2015 and 2017 installed is to edit dlib\setup.py:

. . .
cmake_args = ['-DCMAKE_GENERATOR=Visual Studio 14 2015',
. . .

ZeroCool22 commented 6 years ago

@iperov

(deepfakes) C:\Users\ZeroCool22\Desktop\Nueva carpeta (2)>python cuda.py
Traceback (most recent call last):
  File "cuda.py", line 1, in <module>
    print(dlib.DLIB_USE_CUDA)
NameError: name 'dlib' is not defined

iperov commented 6 years ago

[image: notepad _2018-02-12_10-47-33]

ZeroCool22 commented 6 years ago

@iperov Now yes:

[image: screenshot_9]

ZeroCool22 commented 6 years ago

@iperov I found the failure while installing dlib:

[image: screenshot_11]

How can I fix it?

iperov commented 6 years ago

full log

ZeroCool22 commented 6 years ago

@iperov If you want the cuda_test from C:\Users\ZeroCool22\Desktop\dlib ultimo chico\dlib-master\build\temp.win-amd64-3.6\Release\dlib_build\cuda_test_build, I can give it to you.

But the log from installing dlib doesn't show up in full in the Anaconda terminal...

iperov commented 6 years ago

I don't know what to say without the log.

ZeroCool22 commented 6 years ago

DLIB INSTALLATION LOG: https://drive.google.com/open?id=15TKqWINR8twQH7LiPSapT89JmGBGtj9-

CUDA TEST: https://drive.google.com/open?id=1NdzfFpC5KsOZCnI2M4nRGltq-Q-kB36z

lol, sorry fixed.

ZeroCool22 commented 6 years ago

I don't know if you needed that; this is the path that appears in the Anaconda prompt: C:\Users\ZeroCool22\Desktop\DLIB ULTIMO CHICO\dlib-master\build\temp.win-amd64-3.6\Release\dlib_build\cuda_test_build

iperov commented 6 years ago

CUDA TEST: - cmakecache ?? wrong file in google drive

Arthil commented 6 years ago

Hmm I'm also having this problem.

I reinstalled dlib using the methods above, and my dlib.DLIB_USE_CUDA returns true. My tensorflow also can detect the gpu, so that isn't the problem. Using python 3.5

So when I run convert (-D hog), it doesn't use the GPU at all. Or is (-D cnn) the only one that uses the GPU?

I tried running the -D cnn option but then the dlib fails with

Failed to convert image: Reason: Error while calling cudaMalloc(&data, new_size*sizeof(float)) in file: reason: out of memory

I have a gtx 1080 gpu.

Any hints on what is causing this?

iperov commented 6 years ago

@Arthil are you using the latest faceswap repo with the serializer?

iperov commented 6 years ago

@ZeroCool22 try latest dlib with

python setup.py install --yes DLIB_USE_CUDA -G "Visual Studio 14 2015"

Arthil commented 6 years ago

@iperov ahh looks like I forgot to pull the latest updates!

It's complaining about an alignment.json file now. How do I generate this file?

iperov commented 6 years ago

restart extract

salre9501 commented 6 years ago

Ok so I did

import dlib
dlib.DLIB_USE_CUDA

and it says true. I created an alignments.json after pulling the update and extracting again. I tried converting and I get a slightly faster speed, but my GPU usage is still zero (except for the VRAM). Now I'm getting 50 minutes for 5402 images, so it was an improvement. I'm just not sure if these are normal speeds.

dlib version 19.9.99, tensorflow-gpu 1.5, CUDA 9, cudnn 7

iperov commented 6 years ago

@salre9501 5402 for 50 min ????? superior result. I have only ~3000 per hour. I also have low GPU usage on convert, I think its normal.

salre9501 commented 6 years ago

Ok I think I fixed it. With seamless removed, -D cnn is giving me speeds of 5.5 it/s. Now the same 5400 images take 15 minutes.
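
To compare the figures quoted in this thread on one scale, here is a small sketch that converts between images, minutes and it/s. The function names are made up for illustration; the numbers are the ones reported above.

```python
def its_per_second(images, minutes):
    """Average conversion rate in iterations (images) per second."""
    return images / (minutes * 60)


def minutes_for(images, rate):
    """Minutes needed to convert `images` at `rate` iterations per second."""
    return images / rate / 60


if __name__ == "__main__":
    reports = {
        "3000 images in 50 min": its_per_second(3000, 50),
        "1500 images in 30 min": its_per_second(1500, 30),
        "5402 images in 50 min": its_per_second(5402, 50),
    }
    for label, rate in reports.items():
        print(f"{label}: {rate:.2f} it/s")
    print(f"5400 images at 5.5 it/s: {minutes_for(5400, 5.5):.1f} min")
```

At 5.5 it/s, 5400 images work out to a little over 16 minutes, which is in line with the roughly 15 minutes reported.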

ZeroCool22 commented 6 years ago

@iperov


(deepfakes) C:\Users\ZeroCool22\Desktop\dlib ultimo chico\dlib-master>python setup.py install --yes DLIB_USE_CUDA -G "Visual Studio 14 2015"
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: option -G not recognized

(deepfakes) C:\Users\ZeroCool22\Desktop\dlib ultimo chico\dlib-master>

spackofatzo commented 6 years ago

I see the same behaviour as mentioned here: https://github.com/deepfakes/faceswap/issues/184#issuecomment-364935845 When "seamless" is used, there's no GPU usage and the conversion runs extremely slowly. With "seamless" disabled it works like a charm.

Dlib self-compiled with "--yes USE_AVX_INSTRUCTIONS --yes DLIB_USE_CUDA". Face recognition with cnn works great (that didn't work well without a self-compiled dlib).

deepfakes / faceswap

Getting 50 minutes conversion time for 3000 images, 0% GPU usage but high vram usage. Is this normal? #184