deepfakes / faceswap-playground

User dedicated repo for the faceswap project

Deepfakes on AMD graphics/Ubuntu #203

Closed jslurm closed 5 years ago

jslurm commented 6 years ago

Several months ago, following the AMD tutorial on deepfakes.club and after lots of trial and error, I managed to get Deepfakes working on my PC with an RX480 and Ubuntu 16.04. Since then things got buggy and unstable, so I decided it would be best to reinstall. I did a fresh install of Ubuntu 18.04 and followed more or less the same instructions as last time (the tutorial, which hasn't been updated since, uses a modified requirements.txt file). I didn't hit any errors while installing and compiling TensorFlow and the various dependencies, but I can't get it working.

When I try to extract, I get the following error, after the line "Starting, this may take a while..."

terminate called after throwing an instance of 'cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)7, cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)6, cl::sycl::exception> >'
Aborted (core dumped)

And when I try to train (using datasets backed up from the previous installation):

Loading Model from Model_Original plugin...
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/username/faceswap/scripts/train.py", line 97, in process_thread
    raise err
  File "/home/username/faceswap/scripts/train.py", line 86, in process_thread
    model = self.load_model()
  File "/home/username/faceswap/scripts/train.py", line 102, in load_model
    model = PluginLoader.get_model(self.trainer_name)(model_dir, self.args.gpus)
  File "/home/username/faceswap/plugins/Model_Original/AutoEncoder.py", line 17, in __init__
    self.encoder = self.Encoder()
  File "/home/username/faceswap/plugins/Model_Original/Model.py", line 61, in Encoder
    x = self.upscale(512)(x)
  File "/home/username/faceswap/plugins/Model_Original/Model.py", line 47, in block
    x = PixelShuffler()(x)
  File "/home/username/faceswap/lib/PixelShuffler.py", line 13, in __init__
    self.data_format = conv_utils.normalize_data_format(data_format)
AttributeError: module 'keras.utils.conv_utils' has no attribute 'normalize_data_format'

Hoping someone out there more knowledgeable than me might have an idea to get this working? Any advice is greatly appreciated.

gessyoo commented 6 years ago

I have this code running under Ubuntu 18.04 with an Nvidia card. I suggest that you try re-running sudo pip3 install -r requirements.txt, since the dependency requirements were updated 10 days ago. If you still get the error, try sudo pip3 install keras --upgrade, since the error message points to an issue with Keras.

jslurm commented 6 years ago

Thanks for the tips. I tried both of those suggestions but I'm still getting the same errors for both extraction & training. I should clarify, I'm going off of this tutorial. I got it working about six months ago, but even then it required a lot of trial and error (mostly in the configuring/compiling Tensorflow part).

It's probably slightly out of date at this point, but since it's for AMD cards it seems to need additional or different dependencies in the requirements.txt file. On my first try I added the ones listed in the tutorial to the current requirements file (deleting any redundancies from the old list), and now I've tried the unedited requirements.txt, but it doesn't seem to make a difference.

torzdf commented 6 years ago

Extract looks like an OpenCL issue, which is out of scope. It might be VRAM related (as VRAM calculations are performed based on NVIDIA cards), but equally it may not. It might be worth trying a different extractor (DLIB or MTCNN). Also, Extract now has a prerequisite of nvidia-ml-py3, which will almost certainly lead to issues with AMD cards.

For training, it looks like you're running the wrong version of Keras. Faceswap uses Keras==2.2.0.

It may be that you have to roll back to https://github.com/deepfakes/faceswap/tree/d60e9bba074fbf485183501226ff1783b3a0fc8c
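The AttributeError in the training traceback can be checked without starting a training run. Below is a minimal sketch, assuming only the module and attribute names shown in that traceback; `probe_attribute` is a hypothetical helper for illustration, not part of faceswap or Keras.

```python
import importlib


def probe_attribute(module_name, attribute):
    """Report whether `module_name` imports and exposes `attribute`.

    Returns one of "missing module", "missing attribute" or "ok".
    `probe_attribute` is a hypothetical helper, not a faceswap or Keras API.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package (or one of its parents) is not installed
        return "missing module"
    # The module imported; check for the name the caller expects
    return "ok" if hasattr(module, attribute) else "missing attribute"
```

Run inside the faceswap environment, `probe_attribute("keras.utils.conv_utils", "normalize_data_format")` returning "missing attribute" would suggest that the installed Keras does not match the version lib/PixelShuffler.py was written against.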

jslurm commented 6 years ago

Thanks! I had a feeling the code might have changed to the point where it would cause more problems with AMD cards. An earlier update, around the time the GUI was added, also caused issues, and rolling back helped. I switched to Keras==2.2.0 and rolled back to the commit you linked. On the first attempt, running anything with faceswap.py (even just -h) returned this error:

Traceback (most recent call last):
  File "faceswap.py", line 13, in <module>
    from scripts.gui import TKGui
  File "/home/username/faceswap/scripts/gui.py", line 17, in <module>
    import matplotlib
  File "/home/username/faceswap/faceswap_env/lib/python3.6/site-packages/matplotlib/__init__.py", line 127, in <module>
    from . import cbook
  File "/home/username/faceswap/faceswap_env/lib/python3.6/site-packages/matplotlib/cbook/__init__.py", line 13, in <module>
    import bz2
  File "/usr/local/lib/python3.6/bz2.py", line 23, in <module>
    from _bz2 import BZ2Compressor, BZ2Decompressor
ModuleNotFoundError: No module named '_bz2'

When I had the earlier problems with the GUI addition, I believe I commented out the GUI-related lines in faceswap.py and it worked after that. If I try that here, extraction runs but doesn't detect any faces. If I try training, it returns the same 'cl::sycl::detail::...' error I first encountered when extracting.

torzdf commented 6 years ago

This is a matplotlib issue. Most likely you don't have bz2 installed on your machine (assuming Ubuntu):

sudo apt install bz2 libbz2-dev

Alternative solution discussed here: https://github.com/matplotlib/matplotlib/issues/10866
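Whether the interpreter can see its compiled bz2 extension can be checked directly, before matplotlib gets involved. Below is a minimal sketch; `has_module` is a hypothetical helper name. One assumption worth stating: the /usr/local/lib/python3.6 path in the traceback suggests a Python built from source, and such a build typically only includes `_bz2` if the bzip2 development headers were present at compile time, so Python may need rebuilding after installing them.

```python
import importlib.util


def has_module(name):
    """Return True if the current interpreter can locate the module `name`.

    `has_module` is a hypothetical helper for illustration only.
    """
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(name) is not None
```

Calling `has_module("_bz2")` with the same interpreter that runs faceswap.py should return False while the ModuleNotFoundError above persists.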

torzdf commented 6 years ago

Also, you should only need to go back to this commit: https://github.com/deepfakes/faceswap/tree/f90cd92ec36a0860c58a75631f762069419149de

You could just check out this commit for extract and still run training on the latest release (assuming you get it working at all).

jslurm commented 6 years ago

Just tried that commit. Seems like it fixed the matplotlib issue, but now on both extract and train I get the same 'cl::sycl::detail::exception_implementation...' error.

torzdf commented 6 years ago

Unfortunately without an AMD GPU, that's about as far as my support can go.

I will say that nothing has changed significantly in Train for quite some time though, so your issue may be elsewhere (if it used to work).

jslurm commented 6 years ago

From what I could find through some googling, it seems like it might be a compatibility issue between ComputeCpp and certain AMD GPUs. I'm sure ComputeCpp has been updated since my previous install as well, but I'm not knowledgeable enough to really troubleshoot it.

Might have to just wait and see if Nvidia GPUs keep getting cheaper...

kilroythethird commented 6 years ago

I just started toying around with this (awesome) project (and neural networks in general), but https://github.com/plaidml/plaidml looks extremely interesting for AMD users (and probably others too).

The ComputeCpp TensorFlow version was a pain in the ass to set up on my system: it requires SPIR, which the current AMDGPU-PRO OpenCL ICD no longer supports; it's proprietary; the TensorFlow compile took the whole day; the first batch (mainly setup work) took about 10 minutes; and so on. Also, with PlaidML I get about 2-3 times the performance of the patched TensorFlow version.

Not sure if this project uses TF anywhere directly, but at least training with the Original model works without a problem with Keras using PlaidML. I am going to raise an issue about dropping the TF requirement in the coming days, I think.

kilroythethird commented 6 years ago

Just a short heads-up after some testing: extracting, training and converting work flawlessly with the CPU version of TensorFlow combined with plaidml-keras. Set the KERAS_BACKEND environment variable to plaidml.keras.backend so that Keras automatically picks up the PlaidML backend. Extracting is a bit slow, though (about 1-2 seconds per frame on my system). I get ~45 EG/s for training on my RX580 with the AMDGPU-PRO legacy ICD.
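The KERAS_BACKEND setting described above can also be applied from Python rather than the shell. Below is a minimal sketch, assuming the variable name and value given in the comment; `select_keras_backend` is a hypothetical helper, and it only has an effect if it runs before the first `import keras` in the process.

```python
import os

# Backend module path as given in the comment above
PLAIDML_BACKEND = "plaidml.keras.backend"


def select_keras_backend(backend=PLAIDML_BACKEND, environ=os.environ):
    """Set KERAS_BACKEND in `environ` and return the value that was stored.

    `select_keras_backend` is a hypothetical helper; Keras itself is not
    imported here, so this must run before any `import keras`.
    """
    environ["KERAS_BACKEND"] = backend
    return environ["KERAS_BACKEND"]
```

Passing a plain dict as `environ` makes it possible to try the helper without touching the real process environment; with the default argument it is equivalent to exporting the variable in the shell before launching faceswap.py.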

torzdf commented 5 years ago

Faceswap now has AMD support so closing this off.