deepfakes / faceswap-playground

User dedicated repo for the faceswap project

deepfakes/faceswap state-of-the-art #191

Open agilebean opened 6 years ago

agilebean commented 6 years ago

Dear deepfakes community,

maybe you agree that this is one of the best GitHub repositories: all contributors are active, respond to bugs fast, and care about the problems and questions of the community. Many here are also very active and help each other. I realized that many of you give helpful hints and tips, but the information is, of course, scattered across individual comments. So I thought it would be great to bring all of this collective intelligence together, as not everyone can read through every post. Here's my request: I would like to ask all of you to share your lessons learned, in the most concrete form possible. That means: how would you put your experience into the most straightforward advice?

Straightforward means: write concretely

As the evaluation is very subjective, it would be helpful to provide concrete criteria. For example, for evaluating the training parameters I use processing time as the criterion. This is very relevant to end users because it directly translates into waiting time; for cloud users, processing time also directly translates into money.

Here are some examples from my experience, graded with school grades (A+ to D):

1. Training

criterion: processing time

1.a) option -t (trainer)
A+: Original
A: OriginalHighRes, IAE
D: GAN, GAN128

1.b) option -it (number of iterations)

Summary: When standardized to the same processing time, the Original model is unbeatable. The OriginalHighRes model of course shows finer granularity, but takes 4 times longer than the Original model. The IAE model shows no noticeable difference from the Original model. The GAN models show disappointing results.

2. Conversion

criterion: large faces (faces cover >66% of video height)

The following options make a distinguishable difference (credits to @andenixa and @HelpSeeker), from B to A:
2.a) option -mh (--match-histogram): aligns histograms between the input image and the swapped face
2.b) option -sm (--smooth-mask): smooths over irregularities
2.c) option -e (--erosion-kernel-size): shrinks the outer edge of the mask by the specified number of pixels; values between 15 and 30 work well
2.d) option -S (--seamless): improves the transition from mask to face

Summary: All of the above options result in a better transition between the swapped face and the input image's body (erosion and seamless options), and a more natural look of the swapped face (histogram matching and smoothing).
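
To illustrate what erosion and histogram matching roughly do, here is a small OpenCV/NumPy sketch. It is not the faceswap implementation; the function names are mine and it assumes uint8 BGR images.

```python
import cv2
import numpy as np

def erode_mask(mask, kernel_size=20):
    """Shrink the outer edge of a binary face mask (cf. --erosion-kernel-size)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.erode(mask, kernel, iterations=1)

def match_histogram(swapped_face, target_region):
    """Per-channel histogram matching of the swapped face to the target frame
    region (cf. --match-histogram)."""
    matched = np.empty_like(swapped_face)
    for c in range(3):
        src = swapped_face[..., c].ravel()
        ref = target_region[..., c].ravel()
        src_vals, src_counts = np.unique(src, return_counts=True)
        ref_vals, ref_counts = np.unique(ref, return_counts=True)
        src_cdf = np.cumsum(src_counts) / src.size
        ref_cdf = np.cumsum(ref_counts) / ref.size
        # map each source intensity to the reference intensity with the same CDF value
        lookup = np.interp(src_cdf, ref_cdf, ref_vals)
        matched[..., c] = lookup[np.searchsorted(src_vals, swapped_face[..., c])]
    return matched
```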

Question: What are the recommendations for the options
-D (--detector) {dlib-hog,dlib-cnn,dlib-all,mtcnn}
-M (--mask-type) {rect,facehull,facehullandrect}
-aca (--avg-color-adjust) ?

This will be useful for anybody active here: every experienced user, and especially the newbies. So please share concretely!

agilebean commented 6 years ago

@andenixa

Well, IMAGE_SHAPE = 256, 256 already works

Really? I tried it two weeks ago, but got a very similar error to the one @kellurian observed:

ValueError: Error when checking input: expected input_4 to have shape (256, 256, 3) but got array with shape (128, 128, 3)

This was with a P100 and 16 GB of VRAM. You said you wanted to fix it in the next PR... or maybe I'm confusing it with something else?

gessyoo commented 6 years ago

ruah1984, andenixa can correct me, but I think the experimental models are variations on the DF model in DeepFaceLab, so you need to use them with DFL. Faceswap doesn't have the DF model implemented.

andenixa commented 6 years ago

@ruah1984 yes, the models are for DFL. @agilebean it's hard to say, because the PRs haven't been merged. I am pretty sure I've corrected the shape-mismatch problem in the latest PR.

agilebean commented 6 years ago

@andenixa Just had a look into the code. I found IMAGE_SHAPE = 256, 256 in the Model.py file of OriginalHighRes, not Original. So when you said it's working, you meant OriginalHighRes, right?

I was just confused because you named it Original256, so I thought it would be in the Original Model.py.


andenixa commented 6 years ago

@agilebean yes, I meant OriginalHighRes. I was going to create a separate model called Original256, but I suppose that would be confusing. Thus I shall just integrate a proper 256x mode into OriginalHighRes.

agilebean commented 6 years ago

@andenixa Thanks for the clarification! FYI, I just tested OriginalHighRes with IMAGE_SHAPE = 256, 256, but I got the error I expected:

ValueError: Error when checking input: expected input_4 to have shape (256, 256, 3) but got array with shape (128, 128, 3)

So it will definitely be valuable if you release the 256x256 mode. Another big advantage of OriginalHighRes is that it remembers the last iteration number in the json file, which is extremely helpful when you work on many experiments like I do! Would it be a lot of effort to integrate that into the Original model? I would buy you two coffees on buymeacoffee!!!

andenixa commented 6 years ago

@agilebean

I just tested OriginalHighRes with IMAGE_SHAPE = 256, 256 but I got the error I expected:

I am just getting the good old OUT OF MEMORY (because this model won't fit into 8 GB at the current stage). I guess the problem here is that my PRs aren't merged yet.

Would it be a lot of effort to integrate that into Original mode?

Not really; then again, I can't tell when @torzdf is going to be available to merge the changes. I think he is having some difficulties at the moment.

agilebean commented 6 years ago

@torzdf @andenixa @Kirin-kun I have now finished another series of experiments comparing Original with OriginalHighRes. My result is that, when compared at the same processing time, Original produces results at least as good as OriginalHighRes, and most of the time even better. With batch size = 100, Original beats OriginalHighRes when compared on the same total processing time. That means 10,000-15,000 iterations for Original correspond to 3,000-5,000 iterations for OriginalHighRes (which takes about 3.3-3.5 times longer per iteration than Original on a Tesla P100).
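
Here is a rough sketch of the normalization I mean; the 3.3-3.5x slowdown factor is from my P100 runs, so adjust it for your own hardware.

```python
def equivalent_iterations(original_iterations, slowdown_factor=3.4):
    """OriginalHighRes iterations that fit in the same time budget as the
    given number of Original iterations."""
    return int(original_iterations / slowdown_factor)

for its in (10000, 15000):
    print(its, "Original iterations ~", equivalent_iterations(its), "OriginalHighRes iterations")
```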

Have you done such a comparison and come to the same finding? If yes, my question to all of you with far superior technical knowledge: why is this the case? Original seems to have a more efficient way of processing images in training or conversion than OriginalHighRes.

Anyway, my suggestion is: could the Original model be tweaked for higher resolutions, instead of OriginalHighRes? @andenixa, forgive me if this suggestion doesn't make sense technically.

I am just always surprised that, after benchmarking all available models, the most basic one (Original) still remains quite impressive.

andenixa commented 6 years ago

@agilebean Thank you for providing your test overview. With OriginalHighRes you get a tensor four times bigger (doubling the input resolution quadruples the pixel count), so unsurprisingly training takes much longer. Technically, OriginalHighRes is Original tweaked for higher resolution. The two could be made a single model that works for any input/output size up to 256x, yet the team is against changing core Models, thus I had to add a separate one instead. Also, OriginalHighRes is a slightly lighter version of Original, because people couldn't fit it in memory otherwise. Having -bs 100 is overkill in my opinion; I found that -bs 48 is perhaps the maximum value that has any effect on the outcome.

kellurian commented 6 years ago

Try reducing the encoder dim to 512 instead of 1024 with the OriginalHighRes model. This seems to significantly shorten the processing time for the high-res model, and I don't notice a drop in quality. It seems to get better results than the Original model and is at least as fast, maybe faster, at least in my experiments with different models, especially when I use a small number of "A" images.


agilebean commented 6 years ago

@kellurian Thanks for the tip! I tried it, with the following result:
1440s / 1000 iterations with encoder dim = 1024
1410s / 1000 iterations with encoder dim = 512
So encoder dim = 512 saves about 2.1% processing time, not that much. Your much better result may be due to a slower GPU or less VRAM: I'm using the Tesla P100 with 16 GB, which is about 4x faster than the Tesla K80. So I guess your tweak optimizes usage for smaller GPUs.

agilebean commented 6 years ago

@andenixa Your comment is very insightful. To prevent misunderstandings, I don't want to discredit the work on OriginalHighRes. But isn't now a good time to lean back and ask, from the end user's point of view, which model gives us the most bang for the buck? In other words, standardized to total processing time, which model is the most effective? If you say that Original and OriginalHighRes could be merged, that would be a major step forward. The biggest advantage I see is that any future improvement can be done in one model, whereas two model versions tend to diverge. What you suggest with

The two can be made a single model that works for any input/output sizes up to 256x

suggests offering the image shape as an option, with the default set to 128x and the possibility of setting it to 256x, which would be wonderful indeed. I say this as I envision a future in which we can do 512x and even 1024x. All the more reason to merge Original and OriginalHighRes, isn't it?

@andenixa @torzdf @Kirin-kun @kellurian: Do these thoughts make sense? You all have much better technical knowledge, so I'm curious about your opinion!

kellurian commented 6 years ago

From my point of view, I was talking more about image quality than loss numbers. You do have a completely different rig than me, so that might account for the difference, but I am using a 1080 Ti, which isn't a small card, though I don't know how it compares to a K80 or P100.


agilebean commented 6 years ago

@kellurian Yes, I agree with you that image quality should count more than loss numbers; that's also my criterion. As for the Tesla P100: according to this GPU comparison, it should provide about 13 times more nominal performance, measured in teraflops, than the 1080 Ti. However, you have to discount these ratios to find the actual performance. The P100 is nominally about 7.8 times faster than the K80, but in my practical tests it was only 4.0 times faster. So I would say the P100 should be only about 8 times faster than the 1080 Ti.

andenixa commented 6 years ago

@kellurian By reducing the encoder dim you reduce the net's potential to learn. Don't do it if your GPU can afford the current values. In fact, you might want to raise your encoder dim by a factor of two or more.

@agilebean If a Model gives unsatisfactory results, it's a good thing to report it, though an image comparison and the number of training epochs would help. I have a 1070Ti, which doesn't give me much room to test models in terms of the time needed to do so. In your case, if you set encoder_dim to a much higher value for 128x models, the results could be fascinating, although you might want a different Conv block layout. And yes, you can have a model with adjustable resolution: not just 64, 128, 256, but anything in between.

andenixa commented 6 years ago

@agilebean

To prevent misunderstandings, I don't want to discredit the work for OriginalHighRes.

Perhaps it deserves to be, if it turns out to have suboptimal performance. You have the most apt hardware to test it, but make sure to use a good data-set and train it for a sufficient period of time.

gessyoo commented 6 years ago

andenixa, I'm training the DF256 model under Win 10 x64 with an 8 GB GTX 1070, with about 1,000 images in each set and a batch size of 4. Batch size 5 causes memory errors. At 137,000 epochs now; the preview looks good, with B loss between 0.017 and 0.015. I will try running under Linux to see if a higher batch size is possible.

andenixa commented 6 years ago

@gessyoo thanks for trying it out. Actually, a batch size of 8-9 should work pretty well; I have the same GPU and it works nicely. You probably need more than 1k quality images, because the Model is very limited, so it takes a lot of quality input to get a meaningful result.

kellurian commented 6 years ago

@andenixa Regarding encoder_dim, would you mind explaining it further? Is it similar to, or the same as, the nodes setting in the previous FakeApp? Like, is it the "number of neurons" kind of thing? Do we have any tests of this in practice? I do wonder whether it sometimes gets blamed when the converted image doesn't really look all that different, or doesn't look enough like the destination face, even though raising it makes training take a lot longer.

andenixa commented 6 years ago

@kellurian I assume encoder_dim is similar to a "number of neurons", but FakeApp is a closed-source project, so I don't know for sure. The more the better, up to the point where it's no longer feasible for that particular data-set. It increases the number of neurons at the Dense layer and therefore the number of synapses, so training time is expected to go up proportionally. Please note that by altering the encoder dim you make all your previously saved training incompatible with your Model.
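
Roughly, it sits like this. This is a made-up Keras-style sketch, not the actual faceswap encoder; the conv stack and layer sizes are illustrative only.

```python
from keras.layers import Input, Conv2D, Flatten, Dense, Reshape
from keras.models import Model

def build_encoder(image_shape=(128, 128, 3), encoder_dim=1024):
    inp = Input(shape=image_shape)
    x = inp
    for filters in (128, 256, 512, 1024):   # downsample 128 -> 8
        x = Conv2D(filters, 5, strides=2, padding="same", activation="relu")(x)
    x = Flatten()(x)
    x = Dense(encoder_dim)(x)    # <- the "number of neurons" bottleneck
    x = Dense(8 * 8 * 512)(x)    # expand again before handing off to the decoder
    x = Reshape((8, 8, 512))(x)
    return Model(inp, x)

# The two Dense layers dominate the parameter count, so doubling encoder_dim
# roughly doubles their weights (and the per-step training time grows with it):
for dim in (512, 1024, 2048):
    print(dim, build_encoder(encoder_dim=dim).count_params())
```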

kellurian commented 6 years ago

I think it is the same as the "nodes" setting in FakeApp then, not that I use that software anymore; it was built similarly to faceswap, and I actually think it was derived from it. I will try to experiment with this and discuss the results. Is the setting encoder_dim=1024 the only thing that needs to change to alter the number of nodes? Because in my example I was changing some other parameters, mostly because that was what I found someone else had listed.

andenixa commented 6 years ago

@kellurian the bigger the setting, the more it gets confused by the face background, so, agreeably, a proper mask should be implemented to raise the quality further. I plan on doing it when I have time.

kellurian commented 6 years ago

Thanks for your comments. I consider myself computer savvy, but nowhere near you guys who are writing code for this stuff, and I appreciate you taking the time to answer our questions.

andenixa commented 6 years ago

@agilebean I think you wanted current epoch persistence for the Original model, if I understood you right. Please check out this PR:

https://github.com/deepfakes/faceswap/pull/468

agilebean commented 6 years ago

@andenixa wow, you are fast! Give me a few days, as I deleted my cloud VM a few days ago because I am rewriting the installation scripts; I will try to get back to you ASAP. Do you mean the epochs, not the iterations, are saved? In that case, does this also happen in the OriginalHighRes model? Epochs would actually be much more helpful than iterations, as that would allow several runs with different batch sizes etc. while the output would be normalized to epochs.


andenixa commented 6 years ago

@agilebean yes, it only saves the last epoch number; I am not sure about iterations. I plan on saving other things in the future, such as the encoder dimension and some metadata, which would help avoid difficulties with the same-model / different-settings scenario.
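
Roughly, what I have in mind is something like this. Illustrative only; the file name and keys here are mine and not necessarily what the PR uses.

```python
import json
import os

STATE_FILE = "model_state.json"

def load_state(model_dir):
    """Return the last saved epoch (and any metadata), or a fresh state."""
    path = os.path.join(model_dir, STATE_FILE)
    if os.path.isfile(path):
        with open(path) as handle:
            return json.load(handle)
    return {"epoch": 0}

def save_state(model_dir, epoch, **metadata):
    """Write the epoch counter plus optional settings next to the weights."""
    state = dict(epoch=epoch, **metadata)
    with open(os.path.join(model_dir, STATE_FILE), "w") as handle:
        json.dump(state, handle, indent=2)

# e.g. save_state("models/original", epoch=137000, encoder_dim=1024)
```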

Jack29913 commented 6 years ago

@andenixa is it possible for you to release Original256 (HF256) for DFL?

andenixa commented 6 years ago

@Apollo122 I have it, but it has not been proven to give better results than, say, H128. Honestly, I haven't had a chance to train it properly. I can share it, but I certainly can't guarantee it won't be disappointing.

andenixa commented 6 years ago

@Apollo122 there you go. Don't tell me I didn't warn you. Last but not least Original and H64 models are different in nature and they give different results.

Jack29913 commented 6 years ago

@andenixa thx. I'll give it a try and see what happens :)

gessyoo commented 6 years ago

Apollo122, I tried DF256 on Win 10 x64 with a 1070 and could only manage a batch size of 4, due to Windows 10 VRAM reservation. I had only about 1,000 src and dest images, so it was less-than-ideal testing on two counts. If you're curious, I can post a screen cap of the results.

Jack29913 commented 6 years ago

@gessyoo yeah, I tried with a 1080 and only batch size 4 worked for the H256 model as well. Sure, please post.

gessyoo commented 6 years ago

Here you go: https://ibb.co/e6c36K https://ibb.co/c1OMtz https://ibb.co/dUaqmK https://ibb.co/jYYzfe. Supposed to be Trump --> Leslie Nielsen, 300,000 epochs. I haven't figured out how to avoid or get rid of the dark or white facial artifacts in DFL models, but most of the movie clips I use have fast head movement, which throws off the facial landmarks.

andenixa commented 6 years ago

@Apollo122 @gessyoo guys, I would suggest you move away from Windows 10. I have a 1070Ti 8GB GPU and I am able to use a batch size of about 9-10. I appreciate you sharing the results. Anyway, I warned you about the possibility of suboptimal quality. I shall try to deploy OriginalHighRes for FS with 256x support, which should presumably address some of the issues and alleviate VRAM requirements.

agilebean commented 6 years ago

@andenixa

deepfakes/faceswap#468

Thanks for your work! I tested it. Technically it works, but I'm almost sure it is the iterations, not the epochs, that are saved. Remember the discussion about why the option --epochs was renamed to --iterations? Maybe in the code the iterations are still referenced as epochs. I tested with various batch sizes, going down to batch sizes of 1 and 2, and the number of epochs saved was still the same as the number of iterations specified by --iterations. Would you agree, and if yes, rename the variable to iterations?

andenixa commented 6 years ago

@agilebean The Keras framework clearly reports each training pass as an epoch, and I am keeping track of these. One epoch encompasses one learning step over the given number of samples, i.e. regardless of -bs 8 or 80, one training step is called an epoch, although the number of samples processed per epoch differs in these cases. I don't know what they call iterations, though; perhaps @torzdf could clarify that.

agilebean commented 6 years ago

@andenixa Thanks for the clarification. Interesting: so the Keras framework's definition of an epoch is actually not consistent with the standard definition, i.e. one forward and backward pass through all the training samples, which is what most machine learning sources suggest (source1, source2, source3). Then, for practical reasons: is the number you save the same as the number specified by the -it, --iterations option? If yes, I would suggest naming it iterations as well. If it's different from the number of iterations, then epochs is probably correct.

andenixa commented 6 years ago

@agilebean no, it's actually what you said: one pass through all the samples in a batch. I save whatever the Original model outputs as its epoch counter; it increases by one every learning step. The way I understand it, -it is basically how many steps to do before shutting down the application, which is probably helpful if you rent a GPU farm and pay hourly.

andenixa commented 6 years ago

@agilebean I've checked your links and now I understand your confusion. Basically, the terms iterations and epochs are used interchangeably in FS and probably all of its forks. An epoch here indeed means a single step through a batch. All the samples during training are picked at random, in a way that would make it difficult to calculate the actual number of passes through the data. Just imagine: some people claim to have more than 30k training samples just in set A.
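
Simplified, the training loop looks something like this. It is not the real FS trainer code, just a sketch of what gets counted: one call to train_on_batch on one random batch bumps the counter by one, and that counter is what gets reported as "epoch"/"iteration".

```python
import numpy as np

def train(model, samples, batch_size=48, iterations=10000, start_epoch=0):
    epoch = start_epoch
    loss = None
    for _ in range(iterations):
        idx = np.random.randint(0, len(samples), size=batch_size)  # random batch
        batch = samples[idx]
        loss = model.train_on_batch(batch, batch)  # autoencoder: target == input
        epoch += 1                                 # one step == one "epoch" in FS terms
    return epoch, loss
```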

agilebean commented 6 years ago

@andenixa yes, now we are on the same page. Your definition is clear now:

epoch indeed means a single step through a single batch.

except that I would call it something other than epoch to avoid general confusion, e.g. step or the aforementioned iteration.


gessyoo commented 6 years ago

andenixa, I ran the DF256 scripts under Ubuntu 18.04 but still couldn't increase the batch size. I'm running the monitors on the GTX 1070; could that be the issue? Are you using your 1070 as a secondary card and running the monitor from CPU-integrated graphics or another discrete video card? Maybe the solution is to run the monitors from a cheap secondary Nvidia card?

The new Nvidia cards are out, with the 2080 Ti at an eye-popping $1,299. The same money could buy a pair of 1080 Ti cards. It looks like 2018 deep learning for non-professionals and hobbyists is still going to be limited by 8 GB VRAM cards.

andenixa commented 6 years ago

@gessyoo yes, I leave the video cards solely for training and use a GT 710 or integrated graphics for the monitor. (I use the GT 710 on a Xeon-powered workstation because those don't have CPU-integrated graphics.) It's inconvenient, but I can't see another way to train huge models.

gessyoo commented 6 years ago

andenixa, I saw that you contributed to the DeepFaceLab scripts recently, correcting the nose landmark code. I used the H128 model recently and the results were excellent, except in one scene where the reddish background sky color ended up in the destination face. Here are two sample frames: https://ibb.co/hPGyyU https://ibb.co/emkK59. Is this happening because the face landmarks are slightly off, or is it that, during training, the model is pulled more strongly by the bright background sky?

andenixa commented 6 years ago

@gessyoo My correction addressed the way a silhouette around the landmarks is drawn on the debug / preview output. It doesn't alter the data in any way, so it could not have been the problem.

For your particular case I would generally try increasing the erosion kernel and smoothing slightly, to be safe, as face detection isn't exactly flawless.

andenixa commented 6 years ago

@all Is OriginalHighRes256 still a thing? I can provide 256x compatibility, even for relatively low-memory configurations, without sacrificing much of the complexity.

kellurian commented 6 years ago

I haven't been able to get 256 to work; I keep getting OOM errors even with a batch size of 1, and I have a 1080 Ti with 11 GB of VRAM. However, I got 192 to work, but the batch size is limited to 16 and it is much slower than Original. I'm still using an encoder dim of 512; I compared the results and don't get any appreciable difference with 1024, and 512 is appreciably faster on my setup.


gessyoo commented 6 years ago

I was able to get DF256 to work on an 8 GB GTX 1070 with a batch size of 4, and posted some sample images. I'm willing to test any experimental model, including OriginalHighRes256, and will post images of the results. My Python and deep-learning knowledge is limited. I was really hoping the new Nvidia RTX cards would have 16 GB of VRAM, but alas, no joy. For the immediate future, I think the talented programmers here should think about models that work within the 8-11 GB VRAM limit that the majority of users face. Is there a better way to avoid OOM errors and still use a 256-resolution model? Can the models dynamically adjust to the amount of available VRAM during training, or somehow swap into RAM or to disk?

andenixa commented 6 years ago

@kellurian I am talking about the kind of Model that doesn't OOM. @gessyoo I would appreciate your testing a lot once I am done with it. There are some mechanics for swapping to RAM, but they aren't there yet, at least not for FS, which is built around the Keras framework.

andenixa commented 6 years ago

@kellurian DF256 was highly experimental, and the DFL DF isn't really a DFaker DF. Since DF256 I've made progress in my understanding of autoencoders, which enables me to build much bigger and faster models without necessarily sacrificing complexity.

There is a difference between 512 and 1024 or even 2048 dense dimensions, but it would be hard to make out without a proper data-set, that is:

kellurian commented 6 years ago

Artem, I don't use DF; you probably meant gessyoo. But yeah, I would be interested in trying out your new models; I was just speaking about my experience with the current model after upping the size in the plugin. I am always interested in improved models...
