alex-sage / logo-gen

Accompanying code for the paper "Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks"
MIT License
86 stars 25 forks

Using the interpolate function in Vector.py #14

Open peterkentish opened 5 years ago

peterkentish commented 5 years ago

Hi,

I am trying to get my head round how the interpolate function works exactly.

z_start and z_stop I understand, these are the two vectors we are interpolating between.

The y_start and y_stop parameters, however, less so. If I just want to interpolate in latent space and ignore the class labels, I would have thought the command is:

z = vec.gen_z()
vec.interpolate(z_start=z[0], z_stop=z[1])

However, I end up raising this error in the sample_z function:

 No constant label set, please set first or call this function with parameter y

I've experimented about a bit but I cannot seem to get the function to work without a y parameter, and even then I am not sure what format this should be in.

Do you mind explaining the correct use of this?

alex-sage commented 5 years ago

Ok, so what exactly do you mean by "ignoring the class labels"?

A generator that has been trained using class labels necessarily needs them to generate an image - you cannot simply ignore them.

One thing you can do if you don't want to interpolate in-between different classes but only in latent space, is to keep the label constant. For this, you can use vec.set_y(label) where label is the label number you want to set as a constant label. The vector object then saves this class label internally, so that you don't need to specify any labels from now on.

If you didn't set this constant label and just try to interpolate without specifying a class label, the generator doesn't know what to do and raises the error you mentioned.

Another way to use it would be to specify the same class label twice, both as start and stop label. This would have exactly the same effect, the possibility of setting a constant label is purely for convenience.
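To illustrate why the two approaches are equivalent, here is a minimal NumPy sketch of a linear interpolation with a fixed label. It does not call Vector.py; the helper lerp and the sizes z_dim, n_classes and steps are made up for the example. When the start and stop labels are identical, every step of the label path is that same label, so only z changes.

```python
import numpy as np

def lerp(start, stop, steps):
    """Return `steps` evenly spaced points from start to stop (inclusive)."""
    t = np.linspace(0.0, 1.0, steps)[:, None]   # shape (steps, 1)
    return (1.0 - t) * start + t * stop          # shape (steps, dim)

rng = np.random.default_rng(0)
z_dim, n_classes, steps = 100, 10, 5   # illustrative sizes, not the repo's

# Two random latent vectors to interpolate between.
z_start, z_stop = rng.standard_normal((2, z_dim))

# One-hot vector for a hypothetical class number 3.
y = np.zeros(n_classes)
y[3] = 1.0

z_path = lerp(z_start, z_stop, steps)   # latent code changes along the path
y_path = lerp(y, y, steps)              # same start and stop label: constant

# Every row of y_path equals y, as if a constant label had been set.
print(np.allclose(y_path, y))
```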

Hope this helps.

alex-sage commented 5 years ago

Ah, one more important detail: While set_y(label) takes the label as an integer, show_z(), sample_z() and interpolate() all take the label as a one-hot encoded vector. To get this vector, you can use y = gen_y(label), where label is again your integer class number as used for set_y(label) and y is the one-hot encoded vector you can use e.g. as the y_start and y_stop parameters in interpolate.

This is because the generator takes one-hot encoded labels, and it allows you to interpolate in-between different classes as well (and directly sample the generator using these intermediate labels, which is not possible when using integers).
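As a rough illustration of the difference between the integer and the vector form, the sketch below builds one-hot vectors with plain NumPy and blends two of them. The helper one_hot and the class count are assumptions for the example and are not taken from Vector.py. The blended vector is a valid intermediate label, which an integer class number could not express.

```python
import numpy as np

def one_hot(number, n_classes):
    """Encode an integer class number as a one-hot vector (hypothetical helper)."""
    y = np.zeros(n_classes)
    y[number] = 1.0
    return y

n_classes = 4                      # illustrative, not the repo's cluster count
y_start = one_hot(0, n_classes)
y_stop = one_hot(2, n_classes)

# Halfway between class 0 and class 2: an equal mixture of both labels.
y_half = 0.5 * y_start + 0.5 * y_stop
print(y_half)   # [0.5 0.  0.5 0. ]

# A full label path from y_start to y_stop in five steps.
for t in np.linspace(0.0, 1.0, 5):
    print(round(float(t), 2), (1.0 - t) * y_start + t * y_stop)
```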

Maybe there is a better way to design this interface so that using it is a bit more intuitive. I tried to design it in a way that gives you as much control as possible over what you're generating, allowing you to use your own interpolation functions, or arbitrary y-vectors if you like.

In short, when dealing with labels: In my code, a parameter called number always refers to an integer while y refers to a vector encoding this number.

peterkentish commented 5 years ago

Hi Alex,

Thanks for the speedy response. I guess this has opened up another question: How is it that the GAN uses the cluster labels? I thought the logo produced was pretty much defined by the z vector. I see the paper says we can view the class labels as an additional cluster space.

I guess the root of this is that I am trying to recreate your vector arithmetic in the Appendices, and I am confused about where some information is encoded. It says you take 30 logos that are red and average the z vector to get a directional vector for moving towards red, however what if the red feature was encoded in the class label and not the z vector? Then our directional vector may not actually point towards red.

Thanks again

alex-sage commented 5 years ago

For normal GANs you're right that the output is defined only by the z vector. However, this is a conditional GAN: it has been conditioned on the class labels so that, given a certain class label, it produces outputs that fit into the respective class for all z vectors.

In Appendix A of the paper, I discuss exactly the issue that you encountered: It is indeed possible to create such a directional vector either purely in latent (z) space, only in the space created by the class labels or in the space of both combined. The information might indeed be encoded in both latent space and labels.

That being said, if you calculate the directional vector while keeping y constant (e.g. by using set_y()), you can be sure that it only operates in latent space with the information encoded in there.
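A minimal sketch of such a latent-only direction, assuming it is computed as a difference of two mean z vectors, could look like this. The arrays are random stand-ins for hand-picked samples, and the names z_red, z_other and strength are invented for the example. The label is never touched, so whatever the direction changes must come from z.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim = 100   # illustrative latent size

# Stand-ins for latent vectors of 30 samples that look red and 30 that do not,
# all generated with the same constant label.
z_red = rng.standard_normal((30, z_dim))
z_other = rng.standard_normal((30, z_dim))

# Direction in latent space: mean of the red group minus mean of the other group.
direction = z_red.mean(axis=0) - z_other.mean(axis=0)

# Move an arbitrary latent vector along that direction; y stays as it was.
strength = 1.0                     # hypothetical step size
z = rng.standard_normal(z_dim)
z_shifted = z + strength * direction
```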

peterkentish commented 5 years ago

Hi Alex,

Thanks again for your time and help. I really appreciate it.

When determining the directional vector, it says we can apply the same operation across the label space as we do across latent space. So for calculating the directional vector from blue to red, we would take the mean cluster label of the blue logos and subtract it from the mean cluster label of the red logos. But as these are one-hot vectors, I can't see how this will yield a useful directional vector. Would it not simply end up setting the label to one that generates red logos?

alex-sage commented 5 years ago

Hi Peter, sorry, I seem to have overlooked this question of yours and only read it now. You might have figured this out in the meantime, but the gist of it is: The labels are one-hot vectors, but as my experiments show, the network responds very well to vectors that aren't strictly one-hot. So you can give it a mixture of multiple labels and it actually produces a semantically meaningful result (which surprised me too!). This is also demonstrated when I interpolate in-between different labels in the paper. The fact that the labels are encoded as one-hot vectors is what makes it possible to have mixtures of multiple labels, as obviously this would not be possible if the label was encoded e.g. as a single integer value.

So to sum up: It won't simply end up setting the label to one that generates red logos, because probably there isn't one that only consists of red logos (there might be a predominantly red one, and the calculated directional vector will in that case probably point in its direction, but this doesn't have to be the case). Instead it will try to create a mixture of labels that is closer to the properties you're looking for.
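To make this concrete, here is a small NumPy sketch with made-up cluster numbers for a handful of red and blue logos. The mean of several one-hot vectors is a mixture over clusters rather than a single cluster, and the difference of two such means shifts weight between clusters. The numbers, the helper mean_one_hot and the step size 0.5 are all assumptions for the example.

```python
import numpy as np

n_classes = 5   # illustrative cluster count

# Hypothetical cluster numbers of some red and some blue logos.
red_labels = np.array([1, 1, 3, 1, 4, 3])
blue_labels = np.array([0, 2, 2, 0, 2, 4])

def mean_one_hot(numbers, n_classes):
    """Average the one-hot encodings of the given class numbers."""
    return np.eye(n_classes)[numbers].mean(axis=0)

y_red = mean_one_hot(red_labels, n_classes)     # mixture over clusters 1, 3, 4
y_blue = mean_one_hot(blue_labels, n_classes)   # mixture over clusters 0, 2, 4

# Direction from blue to red in label space. Its entries sum to zero,
# so it redistributes weight between clusters.
direction = y_red - y_blue

# Shift a one-hot label along the direction. The result is a mixture and
# may contain negative entries; whether to clip them is left open here.
y = np.eye(n_classes)[2]
y_shifted = y + 0.5 * direction
print(y_red, y_blue, y_shifted)
```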