csxmli2016 / DFDNet

Blind Face Restoration via Deep Multi-scale Component Dictionaries (ECCV 2020)

Way to Use GPU for Steps 1-2 and CPU for 3-4 (Link to Google Drive of .pys is included below) #53

Open zarc70 opened 3 years ago

zarc70 commented 3 years ago

First, I love this program. It is excellent at what it does and works better than any other I have seen. Because of issues with running DFDNet on a GPU with only 6 GB of memory (or perhaps something else), other users and I get the same out-of-memory (OOM) error when running Step 3 on the GPU. The GPU is not actually much faster for Steps 3-4, but for Steps 1-2 it is orders of magnitude faster, so total run time can be reduced dramatically by using the GPU for Steps 1-2 and the CPU for Steps 3-4. The only change I made to the original code was to duplicate test_FaceDict.py, delete Steps 3-4 from the first copy, and delete the actual processing parts of Steps 1-2 from the second copy (keeping their variables, which the final steps need). This has sped up the process significantly compared with running everything on the CPU, which was my only option because of the Step 3 OOM error.
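Instead of maintaining two trimmed copies of test_FaceDict.py, one could gate the steps behind a command-line option in a single script. The sketch below is hypothetical: DFDNet has no `--steps` flag, and the loop body is only a placeholder for the real step code. Whatever state Steps 3-4 need from Steps 1-2 would still have to be available, as noted above.

```python
import argparse


def parse_steps(spec: str) -> set[int]:
    """Turn a spec like "1-2", "3-4" or "1,3" into a set of step numbers (1-4)."""
    steps: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            steps.update(range(start, end + 1))
        else:
            steps.add(int(part))
    if not steps or not steps <= {1, 2, 3, 4}:
        raise ValueError(f"steps must be within 1-4, got {spec!r}")
    return steps


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options: only --gpu_ids mirrors a flag used in this thread.
    parser = argparse.ArgumentParser()
    parser.add_argument("--steps", default="1-4")
    parser.add_argument("--gpu_ids", default="0")
    return parser


def run_pipeline(argv: list[str]) -> list[int]:
    """Run only the selected steps, in order; returns the steps executed."""
    opt = build_parser().parse_args(argv)
    selected = parse_steps(opt.steps)
    executed = []
    for step in (1, 2, 3, 4):
        if step in selected:
            # Placeholder: the real processing code for this step would go here.
            executed.append(step)
    return executed
```

For example, `run_pipeline(["--steps", "1-2", "--gpu_ids", "0"])` would run only Steps 1-2, and a second call with `["--steps", "3-4", "--gpu_ids", "-1"]` would run the rest.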

Please, csxmli2016, I would really appreciate it if you could add my .py files to your program or find a way to get rid of the OOM error, to reduce the frustration of users of this incredible algorithm (my hundreds of attempts to use no_grad in different places and to clear caches didn't help a bit). Either way, here is a link to my Google Drive folder with the two .py files and a .bat that runs both of them in order:

https://drive.google.com/drive/folders/1AxcFs_qJeLXcS3IARqOMRl6uAgiMub7T?usp=sharing


Here's what you need to run them in the Anaconda Prompt, with a suitable DFDNet environment already created. To run both .py files in order with the .bat, just enter dfdnet.bat (without putting python before it).

To run each .py in order without the batch file:
python test_FaceDict1-2GPUorCPU.py --test_path ./TestData/TestWhole --results_dir ./Results/TestWholeResults --upscale_factor 4 --gpu_ids 0
python test_FaceDict3-4CPUOnly.py --test_path ./TestData/TestWhole --results_dir ./Results/TestWholeResults --upscale_factor 4 --gpu_ids -1
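As a cross-platform alternative to dfdnet.bat, a small Python launcher can run the two scripts in order and stop if the first one fails. This is a sketch that assumes both files sit in the current directory and that the launcher itself is started with the DFDNet environment's Python; the script names, paths, and flags are the ones used in this thread.

```python
import subprocess
import sys

# Options shared by both runs, copied from the commands in this thread.
COMMON = [
    "--test_path", "./TestData/TestWhole",
    "--results_dir", "./Results/TestWholeResults",
    "--upscale_factor", "4",
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Steps 1-2 on GPU 0, then Steps 3-4 on the CPU (gpu_ids -1)."""
    return [
        [python, "test_FaceDict1-2GPUorCPU.py", *COMMON, "--gpu_ids", "0"],
        [python, "test_FaceDict3-4CPUOnly.py", *COMMON, "--gpu_ids", "-1"],
    ]


def main() -> None:
    for cmd in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so Steps 3-4 never start if Steps 1-2 failed.
        subprocess.run(cmd, check=True)
```

Call `main()` to launch both runs, for example from an `if __name__ == "__main__":` guard at the bottom of the file.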

Thank you for this incredible bit of code, which, combined with DeOldify, has let me restore hundreds of family photos!
Zac

kelkun01 commented 3 years ago

Thank you for the files.

I tested how much your method speeds up the whole process. Though I only used 5 images, the results were pretty much the same. I'm not sure it was really running on the GPU, since my GPU usage was only 2-3% while I tested it.

zarc70 commented 3 years ago

I tried a few different conda environments, and they tended to install PyTorch as CPU-only. The conda environment where it did work had PyTorch 1.6.0 (CUDA 10.2, cuDNN 7), NumPy 1.19.2, and Pillow 8.0.1. If that doesn't work, let me know and I will troubleshoot more.
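To tell whether conda installed a CPU-only PyTorch build, a quick check like the one below can help. It relies only on standard PyTorch calls (`torch.cuda.is_available()`, `torch.version.cuda`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`); the function name and report layout are just for illustration.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Report whether the installed PyTorch build can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "torch_version": None,
                "cuda_build": None, "cuda_available": False, "devices": []}
    import torch

    available = torch.cuda.is_available()
    devices = ([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
               if available else [])
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

If `cuda_build` is None, the environment has a CPU-only PyTorch, and `--gpu_ids 0` cannot use the GPU no matter how the scripts are split.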

kelkun01 commented 3 years ago

@zarc70 So I installed the CUDA toolkit, NumPy 1.19.2, and Pillow 8.0.1. Even if I set the GPU ID to 0, for some reason I feel like it still runs on the CPU. I'm not sure, but maybe that's because dlib is running on the CPU and not the GPU. I would appreciate any tips.
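Whether dlib can use the GPU depends on how it was built; a prebuilt pip or conda dlib package may well be CPU-only. A minimal sketch to check, assuming the `DLIB_USE_CUDA` flag and `dlib.cuda.get_num_devices()` that CUDA-enabled dlib builds expose (both read defensively with `getattr`, in case a build lacks them):

```python
import importlib.util


def dlib_cuda_report() -> dict:
    """Report whether the installed dlib was compiled with CUDA support."""
    if importlib.util.find_spec("dlib") is None:
        return {"dlib_installed": False, "use_cuda": False, "num_devices": 0}
    import dlib

    use_cuda = bool(getattr(dlib, "DLIB_USE_CUDA", False))
    num_devices = 0
    cuda_module = getattr(dlib, "cuda", None)
    if use_cuda and cuda_module is not None:
        num_devices = cuda_module.get_num_devices()
    return {"dlib_installed": True, "use_cuda": use_cuda, "num_devices": num_devices}
```

Even with CUDA enabled, only dlib's DNN-based parts (such as its CNN face detector) run on the GPU; the HOG face detector and the shape predictor always run on the CPU. If `use_cuda` is False, any dlib work in Step 1 stays on the CPU regardless of `--gpu_ids`, which would be consistent with the low GPU usage reported above.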

zarc70 commented 3 years ago

Question 1: What system are you using?
Question 2: Are you using conda?
Question 3: Is it using the GPU for Step 2 but not Step 1?

kelkun01 commented 3 years ago

Windows 10, running on an AMD 3600X and a 1060 6GB. Yes, I'm using conda. It runs on the CPU for both. When I test your test_FaceDict1-2GPUorCPU with both 0 and -1, it runs just fine, but it doesn't really use my GPU: Task Manager shows it running only on the CPU. I even timed both (CPU and GPU) and they finished at the same time. Five pictures took almost 3 minutes to finish the cropping.
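Setting 0 or -1 only changes anything if the option actually reaches a device choice. Many PyTorch projects derived from the pix2pix/CycleGAN codebase parse `--gpu_ids` as a comma-separated list and treat negative IDs as "use the CPU". The sketch below assumes that convention (check test_FaceDict.py and its options module to confirm) and shows why 0 and -1 can behave identically.

```python
def parse_gpu_ids(spec: str) -> list[int]:
    """'0' -> [0], '0,1' -> [0, 1], '-1' -> [] (meaning: CPU)."""
    return [int(s) for s in spec.split(",") if s.strip() and int(s) >= 0]


def pick_device(spec: str, cuda_available: bool) -> str:
    """Return a device string such as 'cuda:0' or 'cpu'."""
    ids = parse_gpu_ids(spec)
    if ids and cuda_available:
        return f"cuda:{ids[0]}"
    # Either the CPU was requested or CUDA is unavailable (e.g. a CPU-only
    # PyTorch build); both cases end up here, so 0 and -1 look the same.
    return "cpu"
```

In PyTorch, the returned string can be passed to `torch.device(...)`, with `cuda_available` taken from `torch.cuda.is_available()`.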

zarc70 commented 3 years ago

OK, I'm not getting any GPU usage in Step 1, but the GPU did activate in Step 2 for me. Interestingly, for the first time I did not get an OOM error, but that may just be because the pictures were smaller. Give me 48 hours before my next post and I'll see if I can work something out for Step 1, but I can't guarantee it. To see the GPU kick in during Step 2, try it with more or bigger pictures. I'm also checking whether the simpler facecropper program can do half of Step 1's work, but I'm not sure.
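Task Manager's default GPU graphs (3D, Copy, Video) may not reflect CUDA work, so a low reading there does not necessarily mean the GPU is idle. `nvidia-smi` reports utilization and memory directly; the sketch below queries it with its documented `--query-gpu` and `--format=csv,noheader,nounits` options and parses one line per GPU. The function names are illustrative.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]


def parse_smi_line(line: str) -> tuple[int, int]:
    """'37, 2210' -> (37, 2210): GPU utilization in %, memory used in MiB."""
    util, mem = (int(field.strip()) for field in line.split(","))
    return util, mem


def sample_gpus() -> list[tuple[int, int]]:
    """Take one sample per GPU; returns [] if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_smi_line(line) for line in out.splitlines() if line.strip()]
```

Calling `sample_gpus()` once a second (e.g. with `time.sleep(1)`) while Step 2 runs should show whether utilization and memory actually rise.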

kelkun01 commented 3 years ago

Alright. I'm gonna poke around and see if I can fix this as well.

FlowDownTheRiver commented 3 years ago

@zarc70 I have no problem running on the GPU, but I want to contribute to what you made by running it with Anaconda. Change USERNAME to your username and save the .bat, or make the necessary changes to the paths if yours are different.

@echo on
call C:\Users\USERNAME\anaconda3\Scripts\activate.bat
call conda activate DFDNet
python test_FaceDict1-2GPUorCPU.py --test_path ./TestData/TestWhole --results_dir ./Results/TestWholeResults --upscale_factor 4 --gpu_ids 0
python test_FaceDict3-4CPUOnly.py --test_path ./TestData/TestWhole --results_dir ./Results/TestWholeResults --upscale_factor 4 --gpu_ids -1
pause


In my tests it is slower than the full-GPU process, but for those who are getting OOM errors, this may at least let them run it without problems. Of course, if the images are too large, you will get an OOM error no matter what.
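Because very large inputs can run out of memory on any device, one workaround is to downscale images before running the pipeline. A sketch that computes a new size capping the longer side; the 2048-pixel default is an arbitrary placeholder, not a value taken from DFDNet.

```python
def capped_size(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Scale (width, height) down, keeping the aspect ratio, so the longer side <= max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough: leave unchanged
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, `img.resize(capped_size(*img.size))` would apply it, since `Image.size` is `(width, height)`.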

zarc70 commented 3 years ago

Thank you sooo much, FlowDownTheRiver!

Running it as a .bat like that made the GPU work for Step 1. Interestingly, I was also able to run a .bat without activating the conda environment; I'm not sure why that worked.
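One possible (unverified) reason is that the `python` the .bat picked up already belonged to an environment with the right packages, for example because it came first on PATH. The snippet below shows which interpreter and environment a script is actually running in; `CONDA_DEFAULT_ENV` is the variable conda sets on activation and is simply absent otherwise.

```python
import os
import sys


def interpreter_info() -> dict:
    """Describe the Python interpreter and environment currently running."""
    return {
        "executable": sys.executable,  # full path of the running python.exe
        "prefix": sys.prefix,  # root directory of the active environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if not activated
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```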

daobahan commented 3 years ago

This is really interesting. I wonder how long it takes you guys to run the first step. For me, it's taking up to 15 minutes per image just for the first step, which is really slow. I've been using DeepFaceLab for a while, which can do face alignment/cropping at 2-5 images per second depending on the hardware, though I'm not sure it's the same process.
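To compare Step 1 times across machines, it helps to time each image separately rather than the whole batch, since one very large photo can dominate the total. A minimal sketch with `time.perf_counter`; `process` is a hypothetical stand-in for whatever per-image work you want to measure.

```python
import time
from typing import Callable, Iterable


def time_per_item(process: Callable[[str], object], items: Iterable[str]) -> dict[str, float]:
    """Return the seconds spent on each item, keyed by item (e.g. an image path)."""
    timings: dict[str, float] = {}
    for item in items:
        start = time.perf_counter()
        process(item)
        timings[item] = time.perf_counter() - start
    return timings
```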