Original Model 128 worth it?

andenixa commented 6 years ago

I have created a rough version of Original Model with dimensions 128, 128, 3.

Rationale:

There seems to be increasing demand for HD face-swapping while none had any luck with GAN128 as far as I can tell from issue and the playground. In addition it could also cover more face area.

Is releasing Original128 worth it? Still assessing the efficiency. I had to sacrifice some color data to keep up with memory / speed limitations but overally its not very visible (as opposed to GAN128 that discards original color data). Speed seems to be up to snuff. Trying one-to-many scenarios as well. No LowMem version probably will ever be created for that.

Could also do Orig256 and Orig512 but definably won't fit in consumer GPU RAM. --cheers

Kirin-kun commented 6 years ago

Post results eventually.

Something like a short clip before and after. So we can judge if it's worth it?

I think the main problem isn't the resolution, but the averaging.

andenixa commented 6 years ago

Sure I will. I don't fully understand the theory though a highres data-set may contain more details for reconstruction (and more face coverage as well). I also work on idea if a concatenated Conv2d tensors with different kernels could preserve facial features such as wrinkles, freckles, and moles.

ruah1984 commented 6 years ago

You can share out, and we run again to try.

torzdf commented 6 years ago

Yeah, I think it's worth it. If you can add a new model, then choice is good.

If it's in a state you can share, then please raise a Pull Request so others can test.

Thanks 👍

gessyoo commented 6 years ago

I'm willing to test it. Is the Dfaker plugin/model still planned for integration at some point? I can run the Original model with no issues, but can't get the proposed Dfaker code here to work.

ruah1984 commented 6 years ago

@andenixa we can test it, i believe 1080TI can support your request.

andenixa commented 6 years ago

@torzdf yes its in in a state which I can share it. The major issue is to see if it has any meaningful results and since 128 models train longer I am still looking if it can perform at the level of Original quality-wise. At this stage it learns rather well but decoder part significantly lags behind the encoder and I can't predict its limitations. Funny part I can run it with ENCODER_DIM of ~3k and batch size ~42 (there is no -bs size limitation such as even numbers only or has to be power of 2) and it still fits 9080ti memory. PS: I shamelessly picked how GAN128 is implemented, but my model doesn't share its architecture. I only use some GAN128 tricks to conserve GPU RAM.

torzdf commented 6 years ago

Excellent. Well, whenever your ready, please raise a PR pointing at the Staging branch. Thanks!

andenixa commented 6 years ago

@torzdf I might be making a PR to Staging as you've suggested. Perhaps someone could get better result either by getting better data-set and giving more it training or tweaking the model itself while I work at my version. Though I am yet to see consistent results that would at least outstretch GAN128. I am pretty happy with its learning ability, but the result it generates is a little "low-fi". On a good side it doesn't create aberrations such as twisted lines out of nowhere (the major reason I started making this one over GAN128 trainer).

andenixa commented 6 years ago

Still tuning the Net. Memory consumption is modest even for middle-level cards. Speed is quite good, but I can't get a crisp picture even for decoder.

tjdwo13579 commented 6 years ago

I'm not trying to nitpick but is conversion not possible with this model?

I've tried adding the "-t OriginalHighRes" on the conversion code but it's not working.

It says: Reason: Error when checking : expected input_4 to have shape (None, 128, 128, 3) but got array with shape (1, 64, 64, 3)

Was this commit only meant for training as of now? Mind my ignorance.. I'm not an expert in this field

andenixa commented 6 years ago

@tjdwo13579 you might be right I have forgotten to add conversion code. I did a PR yet I weren't able to test it with the latest git version. Somehow new releases became less Windows path friendly specially if you are using SMB paths like in my case. @iperov I shall be sure to check it. Thanks.

tjdwo13579 commented 6 years ago

@andenixa Thanks for adding the conversion code! I'll try it out now.

andenixa commented 6 years ago

@iperov I shall try your interleaved Upscale/ResBlock approach on decoder if you don't mind. I also like the face extractor you are using. I want to create something akin to H128 yet maskless. I noticed you reduce memory consumption by using smaller batch sizes. Does it play well for diverse (different lighting condition) data-sets? I noticed bugger batches contribute to more accurate / generalized models. I wasn't able to refine anything with bs < ~45-48

andenixa commented 6 years ago

@iperov looks excellent. Do you think its possible to preserve TARGET faces details, freckles, perhaps through another special layer? Do you feel that additional conv layers (for Encoder) contribute to better detail preservation? I also want to try deep-deep approach with additional Dense+Dropout layer in the middle of Encoder.

iperov commented 6 years ago

@andenixa model experiments with result comparison are much welcome.

andenixa commented 6 years ago

Thanks to @iperov I am currently testing another revision of HighRes model adopting their re-scaling idea. Memory consumption is somewhat high though but you guys with 6 to 8Gb should be fine. Training speed is slower as there is much more deep layers in Encoder. When I get a model with somewhat good clarity I shall adjust it for more face coverage.

torzdf commented 6 years ago

I'll leave it open as a pr for now. Let me know when you think it's ready for merging.

andenixa commented 6 years ago

@torzdf sure, I am just trying to see if its not worse than the previous one considering decreased learning speed and raised memory demands. I am also trying a sliced bread design with dropout layer in the middle because previous 64x model (which is the basis for HighResv2) overtrained because of increased number of Conv2D layers. I shall credit the ideas I might have borrowed from other contributes of course. Generally I just want a working 128x tensor with HD quality;)

andenixa commented 6 years ago

Still working on the model. The clarity is fascinating now, but the target vectors sometimes match wrongly aligned faces. I am trying to reduce number of deep layers to see if it helps that but I shall leave the high clarity (very-deep) Encoder in the code as well for those who want to experiment.

andenixa commented 6 years ago

@torzdf I've updated my PR for the new model. It seems to be rather sane and stable. It takes some time to train and resource consumption is around 5gb per 24 batch_size. The clarity is rather good with a nice data-set. It seems to work for multi-gpu model as well.

iperov commented 6 years ago

@andenixa is SeparableConv useful ? what benefits it provides? have you comparison against regular idea with residual layers ?

andenixa commented 6 years ago

@iperov I think its slightly faster and less accurate with colors as it processes color layers separately (presumably sequentially). It consumes less memory though it probably has worse convergence in general. I try to squeeze more layers while having reasonable training speed and Ram requirements. Also ideally the first conv layers it has to be 2x the retina side size yet I think it's unfeasible with Conv2D memory wise. If you can fix a proper 128x HalfFace using the regular Conv2D I'd appreciate it.

PS: The reason I can't use OpenFaceSwap it's not compatible with current training sets and I have a lot of manually crafted sets.

torzdf commented 6 years ago

Ok, I haven't got time to test this at the moment, but I will merge it into staging.

If anyone wants to checkout the staging branch, give it a go and report back their findings that would be appreciated.

iperov commented 6 years ago

@andenixa I made best H128, without suxx residual blocks. I removed res blocks from all models. New super update for all models upcoming...

andenixa commented 6 years ago

@iperov sounds fascinating if you can make it happen. In fact perhaps we should aim for H256 next. I very excited to give your H128 a try just need a time to time to make a training set.

Are H128 considerably different from full-face? For regular faceswap its just a matter of adjusting the margin matrix and of course training it to catch more "space". I actually changed new HighRes model to cover most of the face which is going to be in the next revision.

iperov commented 6 years ago

H128 has more details vs full face 128, but doesnt cover one cheek and beard. Half face good for women fakes whose cheeks occluded by hair.

andenixa commented 6 years ago

@iperov I am not exactly aiming to create fakes but rather to have one-to-many model where I merge multiple faces in the target data-set to catch unique features of each face. I have been successful with the basic Model by adding extra Conv layer(s) and increasing neuron count at dense layers. The problem of poor generalization and over-fitting still persists. It needs some learning rate decay and a lot of training epochs and still sucks quality-wise. The major problem is also that the approach faceswap uses puts too much emphasis at matching the color rather than shape which makes it difficult to "melt" multiple sets.

iperov commented 6 years ago

then what you doing in face swap repo ?

andenixa commented 6 years ago

@iperov faceswap serves my purpose to some extent. It also doesn't have any working 128 model thus though I could provide one. Still not sure if my "concoction" works good enough (though its gotten much better now). Perhaps you could donate some of your code to create a basic H128 with decent quality and speed for faceswap repo.

iperov commented 6 years ago

@andenixa I am making avatar model too. Any thoughts about layers config ?

iperov commented 6 years ago

h128_cage_0

andenixa commented 6 years ago

@iperov if you mean one-to-many I have several thoughts but it would need testing.

It has to be a full-face for sure. For the net I would add one or more Conv layers to cover more face specific features and increase number of Neurons at the Dense layer proportionally. Perhaps by 25%-50% per additional Convolution layer depending on how diverse the training set. If you know the approximate number of features you expect adding a Pool with Dropout helps at least according to some articles.

Target faces might benefit from additional pre-processing.

Slightly de-saturate the samples to make them less colorful. That reduces net preference by color instead of shape. Perhaps you might know a better solution for that.
Normalize the luminance / brightness to equalize lighting conditions.
Increase random transformation by a factor of 1.1 - 1.5

PS: Not sure how Keras training optimizers are exactly managing its learning rate but I found that reducing it slightly at higher epochs helps the convergence. Adam optimizer has learning rate decay but I think its done by subtracting a specified value each epoch which isn't very helpful. Learning rate should decrease non linearly. I would set several thresholds for that for the lack of a better solution.

iperov commented 6 years ago

I pushed big update of OpenDeepFaceSwap. Quality significantly increased.

about avatar model, I mean 64 half face matching 256 avatar. 00000

andenixa commented 6 years ago

@iperov thanks for the heads-up. I shall check it out.

Guys we should seriously stop using presidents for our testing purposes before they pass legislations banning the use or development of any face swapping technologies.

andenixa commented 6 years ago

@iperov absolutely fantastic results and the approach to remove res blocks as well! Could you aim for H256 which I think it always was about? You could try SepConv2D to reduce memory consumption.

iperov commented 6 years ago

@andenixa not enough ram even for 128. It is like 'low mem' model. So may be after 2 years with new videocards we will try 256. or as an option to train on CPU 40-80 days :D

andenixa commented 6 years ago

@iperov but your approach works brilliantly and it is scalable. I am not sure to what extent mask is really necessary but without a mask H256 could fit to GPU.

iperov commented 6 years ago

I dont think so. 256 = 128 x 4 times bigger.

also mask very important. Model predicts mask smoothly instead of using source frame mask which much jittering.

andenixa commented 6 years ago

@iperov Do you want to contribute your idea to faceswap repo? I can adapt OriginalHighRes model to use the mask, but I can't release it without your permission as its going to be pretty derivative on your work.

iperov commented 6 years ago

@andenixa

Do you want to contribute your idea to faceswap repo?

no :) Before forking I was contributing to faceswap. It is time wasting.

andenixa commented 6 years ago

@iperov I see, but I still can use it for my personal use, right?

The thing with faceswap is that it has wider audience specially with the emergence of the GUI which is very important for "layman".

iperov commented 6 years ago

of course you can use it. Why opensourcing then? :)

My windows binary has convenient .bat files and manual in Russian, so I dont need GUI absolutely.

andenixa commented 6 years ago

@iperov but I can't release the model that uses your masked approach right? Just to be clear.

(actually I think I am going to try 256x256x3 vectors, I have a feeling it could be squeezed somehow at least to 8GB GPUs)

iperov commented 6 years ago

masked trainer idea I got from dfaker repo.

iperov commented 6 years ago

I trying to make avatar model last 8 hours, but no success. python_2018-05-16_17-37-48

andenixa commented 6 years ago

@iperov what is an avatar model. How is it different from the regular one?

iperov commented 6 years ago

half face controls an avatar. Goal is B half face will control A avatar

andenixa commented 6 years ago

@iperov do you think if less bits are used for colors, perhaps 15 bits it could be manageable to fit 256 vectors? Or utilize 128->256 dfaker approach (it uses 64->128 I think).

torzdf commented 6 years ago

@andenixa https://github.com/deepfakes/faceswap/blob/master/LICENSE

It's all opensource, just credit where relevant.

deepfakes / faceswap

Original Model 128 worth it? #385