davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

Example of DCGAN #1776

Closed arrufat closed 4 years ago

arrufat commented 5 years ago

Hi, I would like to contribute a DCGAN example to dlib.

I have implemented a C++ version of the PyTorch DCGAN example.

However, I need some guidance with a few things I don't know how to do, and I'm wondering how I should proceed. Should I attach my current code here (around 150 lines), or make a pull request, even if the code isn't able to learn anything yet? Maybe @edubois can help out, since he stated that he managed to make it work in https://github.com/davisking/dlib/issues/1261

Thanks for your hard work on dlib.


arrufat commented 5 years ago

I'm still working on this... Let me describe the network architecture.

Here's what the networks look like:

// convolution and transposed convolution with custom padding
template<long num_filters, long kernel_size, int stride, int padding, typename SUBNET>
using conp = add_layer<con_<num_filters, kernel_size, kernel_size, stride, stride, padding, padding>, SUBNET>;
template<long num_filters, long kernel_size, int stride, int padding, typename SUBNET>
using contp = add_layer<cont_<num_filters, kernel_size, kernel_size, stride, stride, padding, padding>, SUBNET>;
// the generator
template<typename SUBNET>
using generator_type =
    htan<contp<1, 4, 2, 1,
    relu<bn_con<contp<64, 4, 2, 1,
    relu<bn_con<contp<128, 3, 2, 1,
    relu<bn_con<contp<256, 4, 1, 0,
    SUBNET>>>>>>>>>>>;
// the discriminator
template<typename SUBNET>
using discriminator_type =
    loss_binary_log<
    affine<conp<1, 3, 1, 0,
    prelu<bn_con<conp<256, 4, 2, 1,
    prelu<bn_con<conp<128, 4, 2, 1,
    prelu<conp<64, 4, 2, 1,
    SUBNET>>>>>>>>>>>;
// and the whole network
using net_type =
    discriminator_type<
    // tag5 takes the mask from tag1, multiplies it by the generated image (tag4) and adds the real image (tag2)
    tag5<add_prev2<mult_prev4<extract<input_size * input_size * 2, 1, input_size, input_size, skip1<
    // tag4 generates an image from the noise
    tag4<generator_type<
    // tag3 gets the noise from tag1
    tag3<extract<input_size * input_size, 100, 1, 1, skip1<
    // tag2 gets the image from tag1
    tag2<extract<0, 1, input_size, input_size,
    // tag1 contains the image, the noise and a mask
    tag1<input<std::array<matrix<float>, 3>>>
    >>>>>>>>>>>>>;

I know the architecture looks a bit weird, but the main idea is that the input has 3 channels:

  1. a real image
  2. random noise
  3. a mask (zeros or ones)

Then, when I train with real images or random noise, I set the mask to ignore the input I don't care about. The main problem I face is that when I want to back-propagate the error for real images, where the generator is not involved, I don't know how to stop the back-propagation at tag5. Maybe using visit_layers_until_tag, but I haven't managed to make it work. By the way, the network trains a pretty good discriminator, which after a few iterations has a loss of around 10e-9, but the generator sucks...

Any help or guidance is appreciated, but I will continue digging :)

davisking commented 5 years ago

You might be better off with two separate net objects and to alternate between training them.
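
A rough sketch of what that two-network scheme could look like, assuming standalone generator_type and discriminator_type networks with their own input layers (like the variants that appear later in this thread), and hypothetical make_noise_batch() / generate_fakes() helpers; the gradient hand-off in step 2 is the hard part, which the rest of this thread works out:

// Sketch only: alternate between updating the discriminator and the generator.
// adam(0, 0.5, 0.999) means no weight decay plus the DCGAN momentum settings;
// the learning rate is set on the trainer.
generator_type generator;
discriminator_type discriminator;
dnn_trainer<discriminator_type, adam> dis_trainer(discriminator, adam(0, 0.5, 0.999));
dis_trainer.set_learning_rate(2e-4);
while (true)  // until the generated samples look good enough
{
    // 1. update the discriminator: a real batch labeled +1, then a batch of
    //    generated images labeled -1
    dis_trainer.train_one_step(real_batch, real_labels);
    const auto fake_batch = generate_fakes(generator, make_noise_batch());
    dis_trainer.train_one_step(fake_batch, fake_labels);
    // 2. update the generator: forward the fake batch through the discriminator
    //    with "real" labels, take the gradient with respect to the
    //    discriminator's input, and back-propagate that gradient through the
    //    generator; this hand-off is what the rest of the thread works out
}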

arrufat commented 5 years ago

Thanks for the suggestion, that was my first approach, but I couldn't make it work. I'll give it another try :)

Cydral commented 5 years ago

Hi @arrufat, have you please finally published a GAN network using DLIB on your GitHub home page? Btw, thank you for the definition of the Resxxx networks you give, it is very useful and I am currently computing some models that could be useful to the community (e.g. gender and age). I will submit them to @davisking if the results obtained are interesting, in order to enhance the current framework.

arrufat commented 5 years ago

@Cydral, yes, that's the whole point of this: I want to make a working DCGAN example and share it with everyone, but first, and most importantly, I need to get it working, which doesn't seem trivial. As soon as I get something, I'll update this issue. Also, thanks for your kind words on my ResNet implementations. I want to simplify the code a little, mostly by defining the models as templates that depend only on bn_con or affine, so that I don't have to duplicate models for training and inference. Here's what I have in mind, but I'm still thinking about it:

namespace resnet50
{
    // resnet backbone definition goes here
    template<template<typename> class BN>
    using model = loss_multiclass_log<fc<1000, backbone<BN, input_rgb_image>>>;

    using train = model<bn_con>;
    using infer = model<affine>;
}

Then you could use it in your own code like:

resnet50::train net;

or

resnet50::infer net;

Cydral commented 5 years ago

Hi @arrufat, it does indeed sound good, and it may actually prevent overwriting a model with the wrong variant, as has happened to me in the past! For the models I would like to propose, I'm working more on layer reduction, the idea being to have minimalist but nevertheless effective models. I've already done lots of tests using ResNet-18, and I think I will also try a ResNet-12 variant, while minimizing the size of the input image as much as possible.

For the DCGAN, I considered what you reported and found the approach interesting, but re-injecting the loss value into the network seems difficult this way.

On my side, I'm trying to see whether the loss function (e.g. loss_multiclass_log_per_pixel) could be modified to embed a network (the discriminator) on the one hand, and whether the output value of that network could simply be returned by the loss function for the weight adjustment, on the other. Based on the dlib implementation, because the loss function sees both the current/target image and the input image, training the discriminator at this level should be possible, which would preserve the current loss back-propagation mechanism. Does that seem appropriate?
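
A skeleton of what such a loss layer could look like, following the EXAMPLE_LOSS_LAYER_ interface documented in dlib/dnn/loss_abstract.h; the class name and the embedded-discriminator idea are just this proposal sketched out, not working code:

// Sketch only: a loss layer that would embed a discriminator network.
class loss_adversarial_
{
public:
    typedef matrix<float> training_label_type;
    typedef matrix<float> output_label_type;

    template <typename SUB_TYPE, typename label_iterator>
    void to_label(const tensor& input_tensor, const SUB_TYPE& sub, label_iterator iter) const
    {
        // convert sub.get_output() into output images, as a normal loss would
    }

    template <typename const_label_iterator, typename SUBNET>
    double compute_loss_value_and_gradient(const tensor& input_tensor, const_label_iterator truth, SUBNET& sub) const
    {
        // The proposal: run the generated images in sub.get_output() through a
        // discriminator held as a member of this class, train the discriminator
        // here (this function sees both the generated and the target images),
        // and write the discriminator's input gradient into
        // sub.get_gradient_input() so the usual back-propagation mechanism
        // updates the generator.
        return 0;  // the adversarial loss value would go here
    }

    // the discriminator member and serialize()/deserialize() are omitted
};

template <typename SUBNET>
using loss_adversarial = add_loss_layer<loss_adversarial_, SUBNET>;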

arrufat commented 5 years ago

That seems an odd choice, but maybe no weirder than my joint network architecture. Let's see how it turns out.

Cydral commented 5 years ago

I saw Davis's comment, and building two separate networks, alternating training and inference (for the discriminator) and forwarding the output value to the generator, is certainly easier. I'm still looking for a way to do that by reusing dlib's existing primitives, but it's not so easy for me. At least we have a U-Net model that can probably be reused for the generator (https://github.com/davisking/dlib/blob/master/examples/dnn_semantic_segmentation_ex.h)...

arrufat commented 5 years ago

Here's my work in progress for the DCGAN implementation with two separate networks, but it doesn't work (yet). It's what I tried first, before merging them both. https://gist.github.com/arrufat/062b8847b7f87465efd96d627dadf1ad

Cydral commented 5 years ago

Thanks for sharing, mate! I left a comment attached to this code.

Cydral commented 5 years ago

Hi @arrufat , I updated the code here: https://gist.github.com/Cydral/92be4e848551429ec1a6919d6d813c08.

I used another approach to formalize the G and D networks, but in the end it's very close to your own code. It seems to work overall... except for the back-propagation of the loss tensor values. Maybe @davisking could advise us on that?

By the way, it only works on a single plane for the moment; I initially made a version to infer an RGB image, but I have a problem getting a 3D matrix representing the reconstructed image from the generator outputs. Maybe it is necessary to extract each of the k planes and rebuild an RGB image from them (I haven't tried yet)?

davisking commented 5 years ago

If you are trying to backprop from one network into another you want to get the gradients via get_final_data_gradient(), not get_gradient_input(), since you want the gradients with respect to the inputs to the network, which is not what get_gradient_input() is giving you. get_gradient_input() is the input to the backprop procedure for a network. It's not an output of the network.
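
A minimal sketch of that hand-off, assuming the discriminator's loss has just been computed on a tensor of fake samples and the generator has already forwarded the corresponding noises_tensor (the names follow the code posted in this thread; real_labels is a batch of +1 labels):

// computing the loss against "real" labels fills the subnet's
// get_gradient_input() with the gradient of the loss...
discriminator.compute_loss(fake_samples_tensor, real_labels.begin());
// ...which back_propagate_error() then consumes, propagating it all the way
// down through the discriminator
discriminator.subnet().back_propagate_error(fake_samples_tensor);
// get_final_data_gradient() is the *result* of that backprop: the gradient
// with respect to the discriminator's input, i.e. the generated images
const tensor& d_grad = discriminator.subnet().get_final_data_gradient();
// the generator can now consume it as if it came from a loss layer
generator.subnet().back_propagate_error(noises_tensor, d_grad);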


Cydral commented 5 years ago

Thank you for your advice. We will continue our work on this basis.


arrufat commented 5 years ago

Sorry to bother, but I've been reading the documentation and trying several things, and when I try to backprop from one net into the other:

dis_trainer.train_one_step(mini_batch_fake_samples, mini_batch_fake_labels);
resizable_tensor loss_fake = discriminator.subnet().get_final_data_gradient();
generator.subnet().back_propagate_error(loss_fake);

I get this:

Error detected at line 662.
Error detected in file _deps/dlib-src/dlib/cuda/cudnn_dlibapi.cpp.
Error detected in function void dlib::cuda::batch_normalize_conv_gradient(double, const dlib::tensor&, const dlib::tensor&, const dlib::tensor&, const dlib::tensor&, const dlib::tensor&, dlib::tensor&, dlib::tensor&, dlib::tensor&).

Failing expression was src.k() == (long)means.size().

I know it's my fault and I am using the API incorrectly, but I can't figure out how to do it properly...


arrufat commented 5 years ago

After reading the documentation more thoroughly, I think I've found a way to properly backpropagate the loss from one network to the other.

// get the gradient from the discriminator after feeding it images generated from noise samples
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();
// convert the input noises into a tensor using the generator's input layer (13, in my case)
resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
// make a forward call
generator(noises);
// back-propagate the discriminator's input gradient through the generator
generator.subnet().subnet().back_propagate_error(noises_tensor, out_fake);

All this works until the network is serialized:

Error detected at line 1522.
Error detected in file _deps/dlib-src/dlib/cuda/cudnn_dlibapi.cpp.
Error detected in function void dlib::cuda::tanh_gradient(dlib::tensor&, const dlib::tensor&, const dlib::tensor&).

Failing expression was have_same_dimensions(dest,gradient_input) == true && have_same_dimensions(dest,grad) == true.

However, I don't know if there might be a more straightforward way to backpropagate the loss from one network to the other. Suggestions are welcome :)

davisking commented 5 years ago

back_propagate_error() requires that have_same_dimensions(gradient_input, get_output())==true. Are you sure that out_fake has the same dimensions as generator.subnet().subnet().get_output()?

arrufat commented 5 years ago

Thanks for the reply. I've just verified and they have the same size:

const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();
std::cout << "out_fake: " <<
              out_fake.num_samples() << "x" <<
              out_fake.k() << "x" <<
              out_fake.nr() << "x" <<
              out_fake.nc() << std::endl;

const resizable_tensor& out_gen = generator.subnet().subnet().get_output();
std::cout << "gen_fake:  " <<
              out_gen.num_samples() << "x" <<
              out_gen.k() << "x" <<
              out_gen.nr() << "x" <<
              out_gen.nc() << std::endl;

the output is:

out_fake: 8x1x28x28
out_gen:  8x1x28x28

I will update the code on my GitHub fork soon.

arrufat commented 5 years ago

You can find a compilable example here.

reunanen commented 5 years ago

You can find a compilable example here.

Thanks for sharing the code – I too am very excited about this new development!

Looks to me like synchronizing the network being trained to disk may empty out_fake = discriminator.subnet().subnet().get_final_data_gradient().

With this small sanity check, I was at least able to train much longer: https://github.com/reunanen/dlib/commit/ac6d8e7fad329d5a75c8619b710aa1832b870d92

reunanen commented 5 years ago

I guess this line is also emptying the gradient: https://github.com/arrufat/dlib/blob/592361ed818d04fd1f52522757e593a6bb10b550/dlib/dnn/trainer.h#L1009-L1010
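
If that's the cause, a simple defensive workaround (untested, just a thought): take a deep copy of the gradient as soon as it's available, instead of holding a reference into the network:

// resizable_tensor copy-constructs with its own storage, so this survives
// whatever the trainer later does to the network's internal buffers
const resizable_tensor out_fake = discriminator.subnet().subnet().get_final_data_gradient();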

arrufat commented 5 years ago

So, I've realized that I might have been back-propagating the error without updating the network parameters. I've updated the code to perform these actions:

// convert the noises array to a tensor
resizable_tensor noises_tensor;
generator.to_tensor(noises.begin(), noises.end(), noises_tensor);
// forward it to the network
generator.subnet().forward(noises_tensor);
// back-propagate the error with the loss from the discriminator
generator.subnet().subnet().back_propagate_error(noises_tensor, out_fake);
// update the network parameters using the generator trainer's solvers
auto solvers = gen_trainer.get_solvers();
generator.subnet().subnet().update_parameters(make_sstack<adam>(solvers), gen_trainer.get_learning_rate());

It's not working yet, and I'm not 100% sure I'm doing it right. Anyway, I feel we're getting closer :)
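
One more thing that might matter here (an assumption on my part, not a confirmed diagnosis): adam keeps per-layer momentum state, so the solvers passed to update_parameters() should be created once, one per computational layer, and persist across training steps rather than being re-fetched every iteration. Something like:

// one adam solver per computational layer, created once outside the training
// loop so that adam's momentum state accumulates across steps; the stack must
// be sized for the same sub-network that update_parameters() is called on
std::vector<adam> g_solvers(generator.subnet().num_computational_layers, adam(0, 0.5, 0.999));
// then, inside the loop:
generator.subnet().back_propagate_error(noises_tensor, out_fake);
generator.subnet().update_parameters(make_sstack(g_solvers), learning_rate);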

davisking commented 5 years ago

Oh yeah, that part is important :)

arrufat commented 5 years ago

With the latest update, I've managed to speed up the training and generate images like this one:

[attached image: generated samples]

But I'm running out of ideas on how to make it work well... right now I can't see what I'm doing wrong...

arrufat commented 4 years ago

Unfortunately, I haven't been able to make this work... I've read the docs several times and I can't see what I am doing wrong. If somebody has some spare time to look at the code, I would really appreciate it :)

davisking commented 4 years ago

It's very possible it's not a software issue. Take it from someone who has attempted to reproduce many many papers, it's often hard and often things don't work the way paper authors suggest. There are often important tricks they leave out that are needed to make things work right, or the method is very narrow and only works on the data they used, rather than on more general stuff, despite implicit suggestions to the contrary in the paper. You should start out by trying to exactly reproduce some published setup, so same data and settings and everything, if you can.


arrufat commented 4 years ago

Thank you for your suggestion. I also have some experience implementing papers, and many times there are tricks that the authors left out (on purpose?), so making things work is not usually straightforward. For this reason, I wanted to implement the DCGAN code from the PyTorch C++ examples. I am using the same dataset, optimizer, parameters, and batch size. I will debug the PyTorch example more thoroughly to make sure I didn't miss any detail.

However, I would say that I'm not sure I am updating the network appropriately. I did it the way I understood from the documentation and how it's used in dlib itself, but I've never seen it being used to update a network with gradients coming from another one.

I will let you know if I make some progress, so that other people who want to create a GAN using dlib don't have to struggle too much :)

intellizd commented 4 years ago

First of all, thank you for your contribution. I made some fixes: added a leaky ReLU layer, added a binary cross-entropy loss function, and modified the DCGAN training sequence.

    using generator_type =
        loss_binary_cross_entropy < fc_no_bias <1,
        htan<contp<1, 4, 2, 1,
        relu<bn_con<contp<64, 4, 2, 1,
        relu<bn_con<contp<128, 3, 2, 1,
        relu<bn_con<contp<256, 4, 1, 0,
        input<noise_t>
        >>>>>>>>>>>>>;
    using discriminator_type =
        loss_binary_cross_entropy <fc_no_bias<1,
        conp<1, 3, 1, 0,
        leakyrelu<bn_con<conp<256, 4, 2, 1,
        leakyrelu<bn_con<conp<128, 4, 2, 1,
        leakyrelu<conp<64, 4, 2, 1,
        input<matrix<unsigned char>>
        >>>>>>>>>>>;
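
For readers wondering about the leakyrelu layer, which is not a stock dlib layer at this point: a rough CPU-only sketch of such a layer, using dlib's in-place custom layer interface, might look like the following (the class itself is illustrative; a real implementation, like the one in the linked repository, would use dlib's CUDA-aware tensor tools instead of raw host loops):

// Sketch: a CPU-only leaky ReLU written against dlib's in-place layer interface.
class leaky_relu_
{
public:
    explicit leaky_relu_(float alpha_ = 0.2) : alpha(alpha_) {}

    template <typename SUBNET>
    void setup(const SUBNET& /*sub*/) {}

    void forward_inplace(const tensor& input, tensor& output)
    {
        const float* in = input.host();
        float* out = output.host();
        for (size_t i = 0; i < input.size(); ++i)
            out[i] = in[i] > 0 ? in[i] : alpha * in[i];
    }

    void backward_inplace(const tensor& computed_output, const tensor& gradient_input,
                          tensor& data_grad, tensor& /*params_grad*/)
    {
        // for alpha > 0 the output keeps the sign of the input, so the computed
        // output alone tells us which slope to use; this assumes gradient_input
        // and data_grad alias, the usual case for in-place layers
        const float* out = computed_output.host();
        const float* gi = gradient_input.host();
        float* dg = data_grad.host();
        for (size_t i = 0; i < computed_output.size(); ++i)
            dg[i] = out[i] > 0 ? gi[i] : alpha * gi[i];
    }

    const tensor& get_layer_params() const { return params; }
    tensor& get_layer_params() { return params; }

    friend void serialize(const leaky_relu_& item, std::ostream& out)
    {
        serialize("leaky_relu_", out);
        serialize(item.alpha, out);
    }
    friend void deserialize(leaky_relu_& item, std::istream& in)
    {
        std::string version;
        deserialize(version, in);
        deserialize(item.alpha, in);
    }

private:
    resizable_tensor params;  // this layer has no learnable parameters
    float alpha;
};

template <typename SUBNET>
using leakyrelu = add_layer<leaky_relu_, SUBNET>;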

And if you look at the PyTorch C++ code, the generator is ultimately trained with the final gradient from the discriminator. Rather than taking that gradient from the discriminator's training step, it's important to take it from the test-loss step, so the discriminator's own weights aren't updated in the process. This is the code:

dis_trainer.test_one_step(mini_batch_fake_samples, mini_batch_fake_labels);
resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
generator(noises);
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();

The part above is very important. I'll clean up the source ASAP and put it in my repository.
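
Presumably the generator update then follows, reusing the pieces posted earlier in this thread (a sketch, not tested code):

// back-propagate the discriminator's input gradient through the generator
// and update only the generator's parameters
generator.subnet().subnet().back_propagate_error(noises_tensor, out_fake);
auto solvers = gen_trainer.get_solvers();
generator.subnet().subnet().update_parameters(make_sstack<adam>(solvers), gen_trainer.get_learning_rate());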

Thank you for your contribution. Let's continue to develop Dlib together.

This is the result of the MNIST DCGAN: [attached image: DCGAN]

intellizd commented 4 years ago

I just updated the DCGAN example code in my repository.

Please contact me if there is a problem!

The example code is https://github.com/intellizd/dlib-dcgan_example/blob/master/examples/dnn_dcgan_train_ex.cpp

The whole dlib-dcgan source is https://github.com/intellizd/dlib-dcgan_example

thanks

davisking commented 4 years ago

Awesome. Super cool to see this working :)

arrufat commented 4 years ago

Agreed, thank you so much for looking into this, @intellizd. Will you submit a PR? I think you should :)

intellizd commented 4 years ago

I use the dlib platform frequently, and this time I'm glad to contribute because of this DCGAN issue. It's an MNIST DCGAN sample this time, but I'd like to try various new things through dlib. I want to contribute to various features of dlib.

First, I want to extend the MNIST GAN to a CIFAR GAN. Davis, I would appreciate it if you would consider me a contributor.

davisking commented 4 years ago

@intellizd that would be awesome :)

You should make the mnist DCGAN thing into a little example so other people can learn from it. That would be super cool and educational for many users.

intellizd commented 4 years ago

@davisking, thank you for the compliment.

I learned a lot from this DCGAN issue; thank you to @arrufat for opening it. I recommend that dlib draw up a development roadmap like other deep learning frameworks. Why don't we develop that roadmap together with the contributors who are willing to participate?

davisking commented 4 years ago

We can make a bunch of issues that are "help wanted". There are already a few. I generally encourage others to work on whatever interests them. But some obvious things to do are to add new layer types like dilated convolution. There are a few things like that that have been added to cuDNN but not yet made part of dlib's layer catalogue. Burning down that list is a good place to start.

intellizd commented 4 years ago

@davisking thanks for your comment about dlib's direction. I'll follow it.

Cydral commented 4 years ago


Are you sure about this part of the code?

resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
generator(noises);
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();

The tensors have different dimensions: out_fake is 128x1x96x96 while noises_tensor is 128x100x1x1. In such a situation, an assert is normally raised:

Failing expression was have_same_dimensions(dest,gradient_input) == true && have_same_dimensions(dest,grad) == true

intellizd commented 4 years ago

@Cydral That's right. Perhaps the generator and discriminator layers should be paired and matched; for example, the generator's htan should correspond to the discriminator's conp.

Have you changed the layer structure? If so, show me the structure.

Cydral commented 4 years ago

@intellizd, not really. I used the definition to align my own code with the fixes you previously reported, as below:

    using generator_type =
        loss_binary_log<fc_no_bias<1,
        htan<contp<1, 4, 2, 1,
        relu<bn_con<contp<64, 4, 2, 1,
        relu<bn_con<contp<128, 3, 2, 1,
        relu<bn_con<contp<256, 4, 1, 0,
        input<noise_t>
        >>>>>>>>>>>>>;
    using discriminator_type =
        loss_binary_log<fc_no_bias<1,
        conp<1, 3, 1, 0,
        relu<bn_con<conp<256, 4, 2, 1,
        relu<bn_con<conp<128, 4, 2, 1,
        relu<conp<64, 4, 2, 1,
        input<matrix<unsigned char>>
        >>>>>>>>>>>;

I only set the image size to 96 pixels.

In a previous comment, you reported:

dis_trainer.test_one_step(mini_batch_fake_samples, mini_batch_fake_labels);
resizable_tensor noises_tensor;
layer<13>(generator).to_tensor(noises.begin(), noises.end(), noises_tensor);
generator(noises);
const resizable_tensor& out_fake = discriminator.subnet().subnet().get_final_data_gradient();

According to the network definition, layer<13> is the input level in your code. Hence my remark that the generator's input-level definition doesn't match the structure of out_fake, and hence my question. Could you please check that the code you posted is the right version, the one actually used to generate the MNIST DCGAN results? Thanks in advance.

intellizd commented 4 years ago

That layer architecture is for the MNIST data; it's sized only for 28x28-pixel images. You can change the image size and try again.

Cydral commented 4 years ago

Sorry, I didn't get the point. Why would this work with a 28x28-pixel image? Unless I'm mistaken, we are trying to take the final data gradient, shaped num_samples x 1 x image_size x image_size, and back-propagate it into a network whose input is num_samples x 100 x 1 x 1 (the latent z layer). BTW, I also tested your code directly and I get an interruption during the second loop, coming from the discriminator training... I'm trying to figure out why.

Cydral commented 4 years ago

Sorry, you can forget my question. It's clear now: I hadn't realized that the latent vector effectively gives an input size of 1x1, so the output size is independent of the size of the input vector in this case.
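
For reference, the output sizes can be checked with the transposed convolution formula, out = (in - 1) * stride - 2 * padding + kernel, applied to the generator layers defined earlier in this thread:

// transposed convolution: out = (in - 1) * stride - 2 * padding + kernel
// contp<256, 4, 1, 0>: (1 - 1) * 1 - 0 + 4 = 4     (1x1   -> 4x4)
// contp<128, 3, 2, 1>: (4 - 1) * 2 - 2 + 3 = 7     (4x4   -> 7x7)
// contp<64,  4, 2, 1>: (7 - 1) * 2 - 2 + 4 = 14    (7x7   -> 14x14)
// contp<1,   4, 2, 1>: (14 - 1) * 2 - 2 + 4 = 28   (14x14 -> 28x28)

So regardless of how many values are in the latent vector (its 100 values live in the k dimension, with nr = nc = 1), the spatial output is fixed at 28x28, which is why this architecture only fits MNIST-sized images.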


arrufat commented 4 years ago

@intellizd are you planning on submitting a PR with an example on how to train these kinds of networks? If not, and if you don't mind, I might give it a go :)

intellizd commented 4 years ago

@arrufat The MNIST and CIFAR-10 DCGANs are complete, with a CUDA version. This would be my first PR to dlib. A binary cross-entropy loss was added to dlib's loss functions, with a CUDA version. I'll try to submit the PR as soon as I can; I've been busy these days, so I haven't gotten to it. Thanks for your advice.

isgursoy commented 4 years ago

following

intellizd commented 4 years ago

@arrufat The dlib-DCGAN repository is https://github.com/intellizd/dlib-dcgan_example. I just updated the DCGAN examples (MNIST, CIFAR 64x64): the CUDA-version binary cross-entropy loss and a 64x64-image DCGAN sample were added. Because I'm a novice at PRs, I hope you will take the lead: please optimize this DCGAN example code and submit the PR.

arrufat commented 4 years ago

@intellizd Oh great, thanks for contributing to this! I will definitely have a look :) However, it seems that instead of forking dlib's repository, you copied the dlib tree into a new repository. I'll try to fix that :)