deepfakes / faceswap-playground

User dedicated repo for the faceswap project

deepfakes/faceswap state-of-the-art #191

Open agilebean opened 6 years ago

agilebean commented 6 years ago

Dear deepfakes community,

maybe you agree that this is one of the best GitHub repositories: the contributors are active, respond to bugs fast, and care about the community's problems and questions, and many users here help each other out. I noticed that a lot of helpful hints and tips get shared, but the information is of course scattered across individual comments. So I thought it would be great to bring all of this collective intelligence together, as not everyone can read through every post. Here's my request: I would like to ask all of you to share, in the most concrete form possible, your lessons learned. That means: how would you put your experience into the most straightforward advice?

Straightforward means write concretely

As the evaluation is very subjective, it would be helpful to provide concrete criteria - e.g. for evaluating the training parameters, I use processing time as the criterion. This is very relevant to end users because it directly translates into waiting time; for cloud users, processing time also directly translates into money.

Here are some examples from my experience (graded with school grades, A+ to D):

1. Training

criterion: processing time

1.a) option -t (trainer)

A+: Original
A: OriginalHighRes, IAE
D: GAN, GAN128

1.b) option -it (number of iterations)

Summary: When standardized to the same processing time, the Original model is unbeatable. OriginalHighRes of course shows finer granularity, but takes 4 times longer than the Original model. The IAE model shows no noticeable difference from the Original model. The GAN models show disappointing results.
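For reference, a training invocation for such a comparison could look roughly like this; the -A/-B/-m arguments, the paths, and the batch size are placeholders from my own runs, so please check python faceswap.py train -h for the exact option names in your version:

python faceswap.py train -A faces_A/ -B faces_B/ -m model/ -t Original -bs 64 -it 50000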

2. Conversion

criterion: large faces (faces cover >66% of video height)

The following options make a distinguishable difference (credits to @andenixa and @HelpSeeker), improving the result from B to A:

2.a) option -mh (--match-histogram) aligns histograms between the input image and the swapped face
2.b) option -sm (--smooth-mask) smooths over irregularities
2.c) option -e (--erosion-kernel-size) decreases the outer edge of the mask by the specified number of pixels; use values between 15 and 30
2.d) option -S (--seamless) improves the transition from mask to face

Summary: All the above options result in a better transition between the swapped face and the input image's body (erosion and seamless options), and a more natural look of the swapped face (histogram matching and smoothing).
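Putting those flags together, a full conversion call could look something like this; the -i/-o/-m arguments and the paths are placeholders from my own setup, so please check python faceswap.py convert -h for the exact option names in your version:

python faceswap.py convert -i frames/ -o swapped/ -m model/ -mh -sm -e 20 -S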

Question: What are the recommendations for the options -D (--detector) {dlib-hog,dlib-cnn,dlib-all,mtcnn}, -M (--mask-type) {rect,facehull,facehullandrect}, and -aca (--avg-color-adjust)?

This will be useful for anybody active here, every experienced user and especially the newbies. So please share concretely!

torzdf commented 6 years ago

I'm thinking of opening up the wiki page for faceswap, but I'm not really going to have time to maintain it. Would you be interested in compiling any information received there?

agilebean commented 6 years ago

Hi, although this would be an honor, I'm unfortunately not qualified enough. I'm not a programmer, just learning Python because of this GitHub repo. But I could try to write a more extensive documentation page if you wish. That I feel comfortable with, as I'm learning about each parameter.

torzdf commented 6 years ago

Sure, that would be useful. When you have enough information to submit, you can raise a PR (something like FAQ.md or USER_GUIDE.md). I think it would be helpful to have this in the main repo rather than tucked away in issues.

agilebean commented 6 years ago

Yes ok, I will do that. Before I submit, I would like to ask you and some other contributors to review it. My idea is to cover the options with the highest impact first. That's why I invited everybody to share, so that I can then summarize the info for the documentation.

torzdf commented 6 years ago

If you raise it as a PR (against the staging branch) it will be open for review and amendments prior to committing.

Kirin-kun commented 6 years ago

Question: What are the recommendations for options -D (--detector) {dlib-hog,dlib-cnn,dlib-all,mtcnn} -M (--mask-type) {rect,facehull,facehullandrect} -aca (--avg-color-adjust) ?

In my experience, until mtcnn was ported, dlib-cnn (or dlib-all) was the best. Now, I think mtcnn finds more faces, but at the same time produces a lot more false positives.

I don't mind that much, because I work on photos (a few hundred for each set) but with videos, it becomes a headache to sort through all the garbage.

I think you have to tinker a lot more with the size/thresholds with mtcnn. The default is a bit too lenient about "what is a face". And also, the faces seem to have more stable landmarks with mtcnn.

Now, dlib-cnn has recently had a memory issue with large images (I opened an issue about it), which mtcnn doesn't seem to have. Maybe dlib can find faces that mtcnn does not, but I couldn't test properly.

All in all, I wish we had the option to use all extractors, possibly in multiple passes (with the extracted faces prefixed by the extractor name), and be able to choose which one we want to keep for a particular frame/image. This raises the question of embedding the landmarks in the extracted face, because as of now we can't mix multiple extractors other than by manually editing the alignments.json.

For the mask-type, I didn't play much with it, but I never found a case where it would be useful to change the default.

As for aca, I never tried it.

For the model to use, Original seems to be unbeatable at speed of learning and gives good results very quickly. OriginalHighRes is promising, but it takes a lot longer. For my current set, I'm at 180K iterations (with high-res photos as source) and the open mouths are starting to match, whereas with Original it was a lot faster. But I like the detail of HighRes, so I will continue training it and see how it goes. It's not a model for the impatient, though.

kellurian commented 6 years ago

This could be a great resource, because there are lots of options that I think the devs assume we know about (or it could be just me), because many have been in this process for so long, or they can just read the code and figure it out, or it's buried in the PR where it was originally merged. Some options have great descriptions, but some don't. For example:

1. The mask type: what is the difference between rect, facehull, and facehullandrect?
2. Is it better to always stay with the same batch size, or should you increase/decrease it as you train and the loss starts to level off?
3. Should you go with the highest batch size your card can handle, or is there a trade-off between epoch speed and batch size?
4. Is it better to find every face in a video and train on all of them, or can you get great and faster results by training on e.g. every 2nd frame of the video you are replacing the face in? Currently, when I take a long video it can have 45,000 or more faces, but I have read about some people using ffmpeg to drop the frame rate to half when decompiling the video, taking only half the images for faster training (a sketch of that ffmpeg call is below).

I think I could easily drop 30 questions on this subject.
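For the ffmpeg frame-rate trick in the last question, a rough sketch of the kind of call people seem to mean (the 12.5 fps value assumes a 25 fps source, and the output pattern is just a placeholder):

ffmpeg -i input.mp4 -vf "fps=12.5" frames/frame_%05d.png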

agilebean commented 6 years ago

Thanks @Kirin-kun and @kellurian for your valuable comments. Unfortunately, nobody else has shared their experience, which is a bit disappointing. I can give recommendations based on my own experiments, but that was not the idea. Everybody uses a different GPU, so it would have been interesting to see the best lessons learned for each training and conversion option per GPU, standardized on processing time.

What I would find most useful to add to the documentation is the default value of each option when it is not specified. Do you agree? E.g. I don't know which mask type and detector are used by default.

kellurian commented 6 years ago

Yeah, default values would help. For my part, I don't often see much difference in the types of masks when I have tried different ones, so I usually do just rect or facehullandrect. However, I would love to hear a more technical description of what the masks represent and so on.

Kirin-kun commented 6 years ago

After trying them, I can tell that the adjust converter and its related options, as well as mask-type, are totally useless.

The defaults (masked converter and facehullandrect) are enough.

The others just give a mess on the converted faces. There must be really specific cases where those work, but I never actually met them.

Seamless, erosion and blur are really useful, but they depend on the dataset, so there are no "good" values for them. Just rules of thumb you refine according to the size, shape and zoom of the faces. Histogram matching is meh... it messes up the colors more often than not.

As for the time/iterations, it's so dependent on GPU, batch size and dataset quality that I don't have any reference to give.

andenixa commented 6 years ago

@Kirin-kun @agilebean I want to note that what follows is not an official FaceSwap view on the subject, but my subjective advice based on personal experience with FS and its forks.

1) So far, for the converter you need face-hull masked, without the seamless option. Smooth must be about 80% of the erosion modifier, which should be about 10-15% of the face size. No sharpening option, or your video is going to have a totally artificial look. Also, it's advisable to do the detection run separately from the conversion. Always use the slowest/most accurate version of the face detector for the final composition; there is no compromise on that.

2) 35k samples is insane. Unless you are using 8 Tesla GPUs, it's going to take weeks before your Model has a chance to train on every sample enough to show in the result.

3) Data matters: blurry samples must be thoroughly removed, and so must dark ones and every sample where you can't clearly recognize a face. That includes smudged blinks as well.

4) Tensor/Model resolution matters - the more the better. It's a no-brainer. If you have 12+GB of video RAM, try setting the model to 256x256.

5) Use your own encoding parameters for the video. Use constant quality with ffmpeg, ~Q18-22 for the h264 codec (see the sketch after this list). Not sure about HEVC. If the resulting file is too huge, re-encode it with a third-party video converter.

6) A mask type w/o face-hull isn't too bad if you're doing a FaceSwap against a green screen. It changes a bigger portion of the face and has much bigger potential for the variety of faces you can use. But it is probably a bad idea for swapping a face in a regular video.

7) Histogram matching is a common problem. Try finding target samples with similar exposure settings and lighting environment as your source images. Don't mix too many shots with diverse lighting conditions. If your video consists of more than 2 lighting sets, you might want to train two different nets with separate sets and do the face swapping separately. I know FS is all about automation, but if it's quality you need, you will have to work for it. Probably even learn a bit of Python (luckily it's the easiest language to learn).

Thanks for your help and good luck! Please do share your thoughts, ideas and findings.
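As a concrete example of point 5, reassembling converted frames at constant quality could look roughly like this; the frame pattern, frame rate and CRF value are assumptions to adapt to your own source:

ffmpeg -framerate 25 -i swapped/frame_%05d.png -c:v libx264 -crf 20 -pix_fmt yuv420p output.mp4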

kellurian commented 6 years ago

@andenixa Thank you so much for your advice, I think this is golden. Of course, your answers generate more questions:

  1. I have a 1080 Ti and almost always train on the highres model, using a change I found in one of the Issue discussions. What is the maximum number of samples you would train with at a time? I get fairly decent results after 1-2 days with a loss of 0.014-0.016, but have certainly noted better results with fewer samples. If it doesn't take too long, I try to get the model down to 0.012 or less.
  2. How do I set the model to 256x256? I know this will involve changing the code, but I am too much of a beginner to find the location in the model.py file myself, so can you share where I would change those parameters? Are there any other parameters I could change fairly easily? Are these all in the model.py files? Looking through the code doesn't always produce "aha!" moments for me.
  3. Many times the preview looks amazing, but when I do convert, the actual picture produced looks too much like the original subject. Not really blurry, it just doesn't look enough like the face I want to change to. I have found that subject selection and decreasing the blur and erosion can help with this, but additional training (or at least more time than I am willing to put into it) doesn't seem to help. Is this due to the model not being trained enough, or is it that the subject just doesn't resemble the face I need closely enough?
andenixa commented 6 years ago

@kellurian

  1. I think whatever works for you is good. I wouldn't use too many pics for set B, and I probably wouldn't favor loss values over the preview, because the loss is currently not perceptual (meaning it gives the numeric proximity of your predictions to the actual data, disregarding perceptual difference).
  2. For the model it's pretty self-explanatory; if you open model.py it has this code:
class Model():

    ENCODER_DIM = 1024 # dense layer size        
    IMAGE_SHAPE = 128, 128 # image shape

    assert [n for n in IMAGE_SHAPE if n>=16]

IMAGE_SHAPE = 256, 256 is what you need here (a consolidated sketch follows below). That would work for the model itself; as for the Converter, it probably needs to be re-adjusted for the new output size (perhaps I am going to make this adjustment automatic in the next PR).

  3. Perhaps your video frames are drastically different from the training data, and/or you overtrained your Model to the point where it only does a good job on the training dataset. One approach would be to use all the frames from the target video as set A.
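To make the IMAGE_SHAPE change in point 2 concrete, the edit is just the one line in the class shown above; whether it actually fits depends on your card, per the 12+GB note earlier:

class Model():

    ENCODER_DIM = 1024      # dense layer size (unchanged)
    IMAGE_SHAPE = 256, 256  # was 128, 128; roughly needs a 12GB+ card

    assert [n for n in IMAGE_SHAPE if n>=16]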
kellurian commented 6 years ago

So I had originally changed the OriginalHighRes Model.py file with these parameters (somebody suggested this change for lower cards in the Issues discussion):

ENCODER_DIM = 512

x = Dense(dense_shape * dense_shape * 1024)(x)
x = Reshape((dense_shape, dense_shape, 1024))(x)

x = self.upscale(512, kernel_initializer=RandomNormal(0, 0.02))(inpt)
x = self.upscale(256, kernel_initializer=RandomNormal(0, 0.02))(x)

I found that I really couldn't use the highres model with my previous cards (980 Ti x2) with a batch size greater than two. I found this modification, and not only did it work, it worked better and faster than the original model. My time to a passable model is significantly cut down (approx. 8-24 hrs depending on the number of pictures), and the "red screen of death" I kept getting from training completely went away (where, after training for a few hours, the images all went red and the loss shot to 0.44). So I have stayed with this modification because the results have been good and I can currently run a batch size of 64, though I am not really sure what it did (changed the number of nodes, maybe?); it clearly decreases the memory requirements for training. I now have a 1080 Ti, and I am wondering: would it make a difference to go back to the default encoder settings? What kind of difference would it make if I increased the model to 256x256 with the encoder dim at 512?

andenixa commented 6 years ago

@kellurian Never heard of a "red screen of death", and I am pretty sure I trained the model for days straight with a 980 Ti. Basically, your ENCODER_DIM is quite low, though I didn't try multi_gpu, which requires more RAM to run. Nevertheless, these settings have nothing to do with output resolution, i.e. 256x256.

kellurian commented 6 years ago

What does ENCODER_DIM actually do? How much does it affect the output? I've had better results than with the original trainer. Also, my question in regard to 256x256 was: is it worth the extra VRAM cost during training if my ENCODER_DIM is 512? I mean, will it be worthless if the DIM is lower than 1024?

andenixa commented 6 years ago

@kellurian ENCODER_DIM is the size of the encoder's deep interconnected ANN. The bigger the number, the more clarity you get. You must find a balance between the output size and ENCODER_DIM to make sure it fits in your VRAM. I don't have a 12GB GPU so I can't really test it. But no, higher resolution is never worthless.

@everyone I probably would never use multi_gpu unless you have a couple (or more) of high-end video cards. The speedup is not worth the increased memory limitations. It doesn't matter if it takes a day or two to properly train your model, unless you must deploy these by the dozen every week.

kellurian commented 6 years ago

I tried to increase the image shape, but got this error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "c:\users\sean\anaconda3\Lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "c:\users\sean\anaconda3\Lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "H:\faceswap\scripts\train.py", line 97, in process_thread
    raise err
  File "H:\faceswap\scripts\train.py", line 89, in process_thread
    self.run_training_cycle(model, trainer)
  File "H:\faceswap\scripts\train.py", line 124, in run_training_cycle
    trainer.train_one_step(iteration, viewer)
  File "H:\faceswap\plugins\Model_OriginalHighRes\Trainer.py", line 38, in train_one_step
    loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
  File "C:\Users\Sean\Envs\fakes\lib\site-packages\keras\engine\training.py", line 1214, in train_on_batch
    class_weight=class_weight)
  File "C:\Users\Sean\Envs\fakes\lib\site-packages\keras\engine\training.py", line 754, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\Sean\Envs\fakes\lib\site-packages\keras\engine\training_utils.py", line 136, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_4 to have shape (256, 256, 3) but got array with shape (128, 128, 3)

andenixa commented 6 years ago

@kellurian Noted, thanks. Shall fix in the following PR.

gessyoo commented 6 years ago

Someone pointed me to this post: https://www.deepfakes.club/forums/topic/test-series-comparing-128x128-trainers/. The writer compared the current 128x128 models using the same data sets. He recommended the DF model from DeepFakeLabs and the OriginalHighRes. The issue with the DF model was that the conversion process often leaves white patches on the output faces. Is this a flaw in the mask or a histogram-related issue? The writer's conclusion was that the current 128x128 models had their strengths, but that more work was needed to improve the conversion process.

kellurian commented 6 years ago

Ok, so I have a different question, now about the sort tool (currently using the GUI version). If you want to use it on the images from the video whose face you are changing, you have to use the folders option and not rename, correct? Because if you rename them, then when converting you will run into the issue that the "aligned directory" won't have the same file names the faces had when they were extracted. My issue is that when I use the folders option, lately it seems to take only about 5% of the images and put them into folders. If I keep running it on the directory, I get a few more images each time, but then I get the win2 error that it can't find certain .png files. If I use rename, it moves about 20 files and copies them until they are the same number as in the original directory. Weird. Is there a way to use rename on the images and have the converter still know which images they were before, or is there a way to rename them back to their original names after you have sorted them and removed unwanted faces, trash, etc.?
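This is not a faceswap feature, just a workaround sketch in plain Python: snapshot the original filenames (keyed by file content) before letting the sort tool rename anything, then copy the kept faces back under the names that alignments.json expects. Folder and file names below are placeholders:

import hashlib
import json
import os
import shutil

def file_hash(path):
    # hash the file contents so a renamed copy can be matched back to its original name
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def snapshot(folder, map_file="name_map.json"):
    # run this on the aligned-faces folder BEFORE sorting/renaming
    mapping = {file_hash(os.path.join(folder, n)): n
               for n in os.listdir(folder)
               if os.path.isfile(os.path.join(folder, n))}
    with open(map_file, "w") as f:
        json.dump(mapping, f)

def restore(sorted_folder, out_folder, map_file="name_map.json"):
    # run this AFTER sorting and deleting the junk faces
    with open(map_file) as f:
        mapping = json.load(f)
    os.makedirs(out_folder, exist_ok=True)
    for name in os.listdir(sorted_folder):
        src = os.path.join(sorted_folder, name)
        if not os.path.isfile(src):
            continue
        original = mapping.get(file_hash(src))
        if original:  # skip anything we cannot match back
            shutil.copy2(src, os.path.join(out_folder, original))

That way you could use the rename-based sort for manual cleanup and still end up with a folder whose filenames match what was extracted.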

agilebean commented 6 years ago

@kellurian @Kirin-kun @andenixa Thanks again for your very valuable comments. As per my initial motivation for this issue, I wanted to summarize your combined experience. So here's my first short draft covering the options with multiple opinions. It is a work in progress, so your feedback is welcome. At the end, I have some questions for you where I was not 100% sure. Please answer, thanks!

Training

-t,--trainer

Summary:

Original and OriginalHighRes deliver best results.

Sources:

@Kirin-kun:

Original seem to be unbeatable at speed of learning and gives good results very quickly. OriginalHighRes is promising, but it takes a lot longer

@agilebean:

OriginalHighRes takes ~25% more time than Original, and only shows good results after at least 1.5x more iterations than Original. So Original is the most efficient considering processing time.

image_shape

Summary:

Increase IMAGE_SHAPE in model.py if GPU RAM > 12 GB. Available in next PR. OriginalHighRes: from (128, 128) to (256, 256)

Sources:

@andenixa:

Tensor/Model resolution matters - the more the better. If you have 12+GB of video RAM try setting the model to 256x256. ENCODER_DIM is the size of the encoder's Deep interconnected ANN. The bigger the number the more clarity you get.

@kellurian:

I tried to increase the image shape, but got this error: [...] ValueError: Error when checking input: expected input_4 to have shape (256, 256, 3) but got array with shape (128, 128, 3)

@andenixa:

Noted, thanks. Shall fix in the following PR.

Conversion

-M,--mask-type

Summary:

facehullandrect (default) or facehull are favored; rect would cover a bigger face area.

Sources:

@kellurian:

don't often see much difference in the types of masks...so I usually do just rect or facehullandrect.

@Kirin-kun:

The defaults (masked converter and facehullandrect) are enough.

@andenixa:

for converter you need face-hull masked, without seamless option. Smooth must be about 80% of erosion modifier which should be about 10-15% of the face size. Mask type w/o face-hull isn't too bad if you doing a FaceSwap against a green screen. It changes a bigger portion of the face and has much bigger potential for the variety of faces you can use.

-mh,--match-histogram

@andenixa:

Histogram matching is a common problem. Try finding target samples with similar exposure settings and lighting environment as your source images. Don't mix too much shots with diverse lighting conditions.

@Kirin-kun:

Histogram matching is meh... it messes up the colors more often than not.

@agilebean:

I might be the only one who had a clear improvement with histogram matching even with different exposure settings; I don't understand why.

-sh,--sharpen

@andenixa:

No sharpening option or your video is going to have totally artificial look.

Questions:

@andenixa: When you say (on converter options)

Smooth must be about 80% of erosion modifier which should be about 10-15% of the face size

how do you specify smooth as 80% of erosion? convert only offers a -sm, --smooth-mask option without any argument.

@andenixa How would you increase the image shape in Original? Original defines the image shape as IMAGE_SHAPE = (64, 64, 3). So, analogous to changing IMAGE_SHAPE in OriginalHighRes from (128, 128) to (256, 256), would you change IMAGE_SHAPE in Original from (64, 64, 3) to (128, 128, 3)?

@Kirin-kun @andenixa @torzdf: How do you determine the optimal batch size? Processing time increases exponentially with batch size, so that would favor small batch sizes of 1-8. However, the loss values hardly decrease with these small batch sizes. But which are the best values?

andenixa commented 6 years ago

@agilebean It's a rather long summary, perhaps you should compose some kind of WIKI page.

Sharp

Just leave the option undefined

How would you increase the image shape in Original?

You can't without some major code tweaking.

How do you determine the optimal batch size?

The bigger the better. With a batch size smaller than 8 it takes about 3x more epochs to train the same features, and with a batch size smaller than 16 some features will never be learned by your model.

Processing time increases exponentially with batch size

Not precisely; it's rather that each epoch is trained more slowly. But you train on many more samples in one go, i.e. 1 epoch at -bs 8 is roughly equivalent to 2 epochs at -bs 4, with the exception that it's done ~1.5-2x faster. So bigger batch sizes actually increase both the speed of the training and its quality.
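As back-of-the-envelope arithmetic (the per-step timings below are invented purely to illustrate the point, not measurements):

step_time = {4: 0.30, 8: 0.35}   # hypothetical seconds per training step at -bs 4 and -bs 8
for bs, t in step_time.items():
    print("-bs", bs, "->", round(bs / t, 1), "images/sec")
# -bs 8 pushes ~1.7x more images through per second even though each step is slower,
# which matches the ~1.5-2x speedup for the same number of samples described above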

agilebean commented 6 years ago

@andenixa

Its a rather long summary, perhaps you should compose some kind of WIKI page.

Yes, this is exactly the goal. I'm preparing some new documentation, which @torzdf encouraged me to do at the beginning of this thread. That's why I need feedback.

Your answer on batch sizes in relation to processing time is super insightful. Great!

The bigger the better. With batch smaller than 8 it takes about 3x more epochs to train the same features and with batch smaller than 16 some features would never be learned by your model

So the next step would not be 32 or 64, but, as you say, the bigger the better: 128 or 256 if the GPU and its RAM support it?

Can you also clarify what you meant with:

Smooth must be about 80% of erosion modifier which should be about 10-15% of the face size

how do you specify smooth as 80% of erosion? convert only offers a -sm, --smooth-mask option without any argument.

gessyoo commented 6 years ago

I defer to @andenixa since I'm a Python novice, but I've made a dozen or so deepfakes testing the various models. I think the optimum batch is the largest batch size your video card can handle without getting an OOM (out of memory) error when training. The default batch size for DeepFakeLabs is 16, I think for compatibility with older hardware and 2-3 GB cards. But those models seem to have difficulty with open and closed eyes, so I try to increase the batch as much as possible in the hope of better results. I recently used the OriginalHighRes model on a GTX 1070 (8 GB) and was able to increase the batch size to 40, but no further, without getting OOM errors.

andenixa commented 6 years ago

@gessyoo that's pretty much what I said about batch sizes. Well, almost. If you know just a little Python, I'd tweak the model a bit to better utilize your memory. Anyway, -bs 40 is pretty humongous and should be enough for any sensible dataset. I never tried DeepFakeLabs so I wouldn't know. Commonly, if a net can't learn eyes/eyeballs, it has too few parameters and thus the model itself needs adjustments. I had some success with such models by defining bigger batch sizes and prolonged training, but it doesn't eradicate the problem entirely. Note that I actually had to reduce the complexity of OriginalHighRes because of complaints from people who have older hardware (and people who expected it to run with enormous batch sizes).

@agilebean I shall get back to you regarding the erosion as I see it causes some confusion.

gessyoo commented 6 years ago

andenixa, I'm aiming for the best possible results regardless of the training time required. You're suggesting, as discussed above, that the model could be adjusted with IMAGE_SHAPE = 256, 256, perhaps with a smaller batch size to account for 8 vs. 11 Gb. of video memory? I'm looking forward to your next PR. If you want to see your code and DeepFakesLab code in action, take a look at https://www.reddit.com/r/GifFakes/.

kellurian commented 6 years ago

I tried it, couldn't even get 1 batch to run. 1080 Ti with 11 GB of RAM. Out of memory.

gessyoo commented 6 years ago

Windows 10 reserves about 2 GB of video memory, so even 8 GB cards only have about 6.5 GB available to CUDA. Ubuntu doesn't have this problem. There was a rumor that the soon-to-be-released next-gen Nvidia Turing cards would have 8 and 16 GB variants, but the latest rumor is that 11 GB is the max, like the current Pascal series. Limited video memory is what's holding back the resolution of these models.

andenixa commented 6 years ago

@kellurian @gessyoo To avoid Windows hogging all the RAM, you could run the display off your integrated video card and do FS on your dedicated one. If that's not an option, close all applications and switch to the basic UI theme. That should free up most of the resources. Unfortunately, I don't own a 12GB GPU and any alterations from my side would be just guessing. I am pretty sure it's doable even with 8GB (yet with much reduced complexity). I might need to test it.

DeepFakesLab is very unimpressive. The mask is somewhat good, yet I got much better results with DF256 on a 1070 Ti (8 GB).

gessyoo commented 6 years ago

Would you mind sharing the code for your DF256 model? I thought 256x256 resolution could not be done with 8 Gb. VRAM. Unfortunately, Windows 10 reserves the memory even for secondary video cards. It's a well-known issue for CUDA users, but Micro$oft hasn't gotten around to fixing it despite the complaints.

kellurian commented 6 years ago

Yeah I can’t get any of the training on deepfakeslab to run. Not even batch 2 of the hf model. It’s not out of memory, just sits on a black screen saying “starting” and then it doesn’t ever work. Not sure what the problem is since I am using his dependency distribution.

gessyoo commented 6 years ago

kellurian, it takes a long time (10-15 minutes) for training to start initially, but the training does start for me on Win 10 x64. Maybe your virus scanner is blocking file execution or creation? The other culprit may be that the version of tensorflow (1.8) included in the distribution requires a CPU with AVX instructions (2011 Intel Sandy Bridge, e.g. 2600K, or newer, or AMD Ryzen); tensorflow 1.8 won't start without an AVX-capable CPU.

kellurian commented 6 years ago

It might be that I haven't waited long enough. I have AVX and AVX2 on my CPU (Core i7 5820K). Is it worth the trouble? Does DeepFaceLab work well with his new model types?

andenixa commented 6 years ago

Would you mind sharing the code for your DF256 model? I thought 256x256 resolution could not be done with 8 Gb. VRAM.

It can with a bit of luck. Yet it won't be easy to train and it might not work well with just any data because of a limited number of parameters. I am working on bringing the code to FS. I have a DeepFaceLab model if you want it.

I have avx and avx2 on my cpu(core I7 5820k).is it worth the trouble?

It's worth the trouble if you can manage to get it to work.

HelpSeeker commented 6 years ago

@kellurian

It’s not out of memory, just sits on a black screen saying “starting” and then it doesn’t ever work.

iperov's Windows pre-built automatically suppresses Tensorflow messages (including thrown errors like OOM). To re-activate the messages, find setenv.bat in the internal_ directory, then change SET TF_SUPPRESS_STD=1 to SET TF_SUPPRESS_STD=0

Perhaps you'll be able to determine the cause of your problems this way.

gessyoo commented 6 years ago

andenixa, I'd like to try your DeepfakeLabs version, even if it's not easy to train or suited to all data types. Maybe I'll learn some more Python along the way. I saw a DF160HD model referred to on the DeepfakeLabs forum that supposedly worked with 8 GB cards, but the link was dead.

gessyoo commented 6 years ago

kellurian, iperov's MIAEF128 model is interesting because it attempts to match the destination histogram during training. DFL works with older video cards and has a cool feature that allows you to manually select the facial landmarks in the destination images, if they're not detected automatically. The downside is that there's not much support, because it's just one developer.

andenixa commented 6 years ago

@gessyoo judging by the naming (DF160HD), I was the author of that one as well. I still have it, though I really am not sure if it's any good. DeepFakeLabs and DeepFaceLab are two different packages, or so I think. Anyhow, there you go: DFL DF models. There are 3 models, of which DF160HD and DF256 are confirmed to be "production ready". As for DF176, I never managed to finish the training. Apart from DF160HD and LIAEF144HD, all the models require 8GB of RAM. (LIAEF144HD has a limited capability to match the destination histogram during training.) These should be considered experimental. I never meant to release any of them to the general public.

kellurian commented 6 years ago

Yeah, I have messages turned on in the setenv file. I still don't see OOM messages when I start. I haven't tried waiting the ten minutes yet.

gessyoo commented 6 years ago

Thanks, andenixa. I've been creating SFW deepfakes and so I'm experimenting all the time anyway, looking for better results.

andenixa commented 6 years ago

I would try to release Original256, but it would most definitely require ~5.78GB of RAM minimum. There is no way I can squeeze it into 4GB. I understand it's a Win10 issue with RAM hogging, but you've got to figure out a way around that. Try installing Linux Mint as a second OS. It looks very much like Windows and it even runs within a Windows partition. It can also run from a USB stick, as far as I remember.

gessyoo commented 6 years ago

Yes, thanks, there are some interesting deep learning projects I want to study that need Linux to run.

agilebean commented 6 years ago

Hi @andenixa, sorry to bother you again, but I am really curious about the answer you announced a few days ago:

@agilebean I shall get back to you regarding the erosion as I see it causes some confusion.

it was about your comment:

"Smooth must be about 80% of erosion modifier which should be about 10-15% of the face size" and I didn't understand how to specify the smooth, because convert only offers a -sm, --smooth-mask option without any argument.

And: I would be absolutely thrilled and grateful for an Original256 version. This would probably beat any other existing framework - can't wait to try it. I use a Tesla P100 with 16GB, so lots of room in the RAM...

andenixa commented 6 years ago

@agilebean Regarding the erosion, I might need to study the way it's done in the current FS. I am a little behind in my research and am using an older version. With a P100 you could probably squeeze in a proper 1024 Model. Original256 would have to be crippled to fit 6-8GB cards. But sure, I shall add a 256 mode to OriginalHighRes.

agilebean commented 6 years ago

@andenixa

Sure regarding the erosion I might need to study the way its done in current FS. I am a little behind in my research and am using an older one.

Independent of the current implementation, how did *you* do it? With another tool?

With P100 you could probably squeeze a proper 1024 Model.

This would be a killer. Can you implement that? Or where in the code would I have to make changes? For the community, it would of course be more helpful if you could make it available. This would be a really disruptive step forward. *dream*

andenixa commented 6 years ago

@agilebean Not by me, unfortunately. It would take another type of model and approach to implement, and you would need a 1024 Extractor to proceed. Probably not doable with the Keras framework either.

agilebean commented 6 years ago

@andenixa Ahhhh I see. Thanks for the clarification anyway. When you said you would release an Original256 model, did you mean you would enable an IMAGE_SHAPE = 256, 256 ?

On a different note, what do you think of a tip I read somewhere else here to increase random transformation by a factor of 1.1 - 1.5? What would that bring? Does this refer to what @kellurian found in

x = self.upscale(512, kernel_initializer=RandomNormal(0, 0.02))(inpt)

?

andenixa commented 6 years ago

@agilebean Well, IMAGE_SHAPE = 256, 256 already works, but it clearly doesn't fit in memory. I shall create a separate autoencoder for such input/output sizes that would fit in 6GB. I might ask you to train it and post some reports after the release (just text), if you have time of course, because I am too preoccupied to fully train models at this point.

ruah1984 commented 6 years ago

@andenixa [http://static.neviril.com/models_DFL_June_2018.zip]

Is the file you are sharing for DeepFaceLab or Faceswap?

I found this issue when I copied and pasted it into the faceswap plugin folder:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ruah1\Downloads\faceswap-master\faceswap-master\scripts\train.py", line 97, in process_thread
    raise err
  File "C:\Users\ruah1\Downloads\faceswap-master\faceswap-master\scripts\train.py", line 86, in process_thread
    model = self.load_model()
  File "C:\Users\ruah1\Downloads\faceswap-master\faceswap-master\scripts\train.py", line 102, in load_model
    model = PluginLoader.get_model(self.trainer_name)(model_dir, self.args.gpus)
  File "C:\Users\ruah1\Downloads\faceswap-master\faceswap-master\plugins\PluginLoader.py", line 14, in get_model
    return PluginLoader._import("Model", "Model_{0}".format(name))
  File "C:\Users\ruah1\Downloads\faceswap-master\faceswap-master\plugins\PluginLoader.py", line 23, in _import
    module = __import__(name, globals(), locals(), [], 1)
  File "C:\Users\ruah1\Downloads\faceswap-master\faceswap-master\plugins\Model_LIAEF144HD\__init__.py", line 1, in <module>
    from .Model import Model
  File "C:\Users\ruah1\Downloads\faceswap-master\faceswap-master\plugins\Model_LIAEF144HD\Model.py", line 8, in <module>
    from facelib import FaceType
ModuleNotFoundError: No module named 'facelib'