alex-sage / logo-gen

Accompanying code for the paper "Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks"
MIT License

Prepare my own dataset #12

Closed kj-lai closed 5 years ago

kj-lai commented 5 years ago

Hi @alex-sage, I want to use your code to train on my own dataset of icons collected from the Noun Project. Is there any documentation or code showing how to process the icons into the HDF5 file format?

alex-sage commented 5 years ago

Interesting, we were thinking of using the icons from the Noun Project at some point as well, so I'd be very interested to see the outcome of your project!

However, I'm not quite sure what you're asking for, since how you need to process the icons of the Noun Project depends entirely on their data format, which I'm not familiar with.

My code expects the HDF5 file to have the same structure as described on the LLD Logo Dataset page. Most importantly, the HDF5 file needs to contain a dataset called "data" holding the image data as an array of shape (N,C,H,W), where N is the number of images, C the image channels, and H and W the height and width, respectively. The labels for the images should be contained in the same file under a path of your choice, which can be set as a parameter in the Python script. These labels simply consist of one integer per image in "data", representing the class that image belongs to. Maybe you can just have a look at one of the downloadable datasets to see how they are structured.
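
For reference, a minimal sketch of writing such a file with h5py (the file name, shapes and label path below are just placeholders):

```python
# Minimal sketch of the expected file layout, written with h5py
# (file name, shapes and the label path are placeholders).
import h5py
import numpy as np

N, C, H, W = 1000, 3, 64, 64                      # images, channels, height, width
images = np.zeros((N, C, H, W), dtype=np.uint8)   # fill with your icon pixels
labels = np.zeros(N, dtype=np.int64)              # one integer class id per image

with h5py.File("icons.hdf5", "w") as f:
    f.create_dataset("data", data=images)             # required dataset name
    f.create_dataset("labels/cluster", data=labels)   # path of your choice
```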

You can of course use some completely different data structure if you want, and just adapt logo-gen/wgan/tflib/hdf5_images.py accordingly.
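
In that case the loader essentially just needs to yield batches of images and labels from the file. A generic sketch of the idea (this is not the actual interface of hdf5_images.py, and "labels/cluster" is a placeholder path):

```python
# Generic sketch of an HDF5 batch loader; not the actual hdf5_images.py interface.
import h5py
import numpy as np

def batch_generator(path, batch_size, label_path="labels/cluster"):
    with h5py.File(path, "r") as f:
        data, labels = f["data"], f[label_path]
        n = data.shape[0]
        while True:
            # h5py fancy indexing requires indices in increasing order
            idx = np.sort(np.random.choice(n, batch_size, replace=False))
            yield data[idx], labels[idx]
```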

kj-lai commented 5 years ago

I was trying to follow the steps you took in processing the dataset. I was wondering how you performed the PCA dimensionality reduction after the ResNet50 feature extraction. My current approach skips the dimensionality reduction and performs KMeans clustering directly on the 2048-dimensional vectors.
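
For context, this is roughly what I'm doing; a sketch assuming Keras's ResNet50 with average pooling, which yields the 2048-dimensional vectors (the file names and cluster count are placeholders):

```python
# Sketch of my feature extraction + direct clustering (assumes tf.keras;
# file names and the cluster count are placeholders).
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.cluster import KMeans

model = ResNet50(weights="imagenet", include_top=False, pooling="avg")  # -> (2048,)

def features_for(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]          # 2048-dimensional feature vector

feats = np.stack([features_for(p) for p in ["icon_001.png", "icon_002.png"]])
clusters = KMeans(n_clusters=16).fit_predict(feats)  # directly on 2048 dims
```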

alex-sage commented 5 years ago

Oh, I didn't realize that you're trying to cluster the icons. Doesn't the Noun Project already provide labels which might be more relevant?

I used sklearn for the PCA dimensionality reduction.
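
Something along these lines; this is a sketch, not my exact script, and the component and cluster counts are just illustrative:

```python
# Sketch of PCA before clustering with scikit-learn
# (feats is a stand-in for the ResNet50 feature matrix).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

feats = np.random.rand(1000, 2048)                   # stand-in for real features
reduced = PCA(n_components=64).fit_transform(feats)  # component count illustrative
labels = KMeans(n_clusters=16).fit_predict(reduced)
```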

kj-lai commented 5 years ago

Yes, the Noun Project does have labels, which come from the search terms, but this can lead to multiple labels for a single icon. My current project, however, only trains on a single class of icons (shopping cart icons) to see how well the trained model can generate new, ideally unique, icons. So the clustering should divide the icons by shape/style (maybe), rather than by the name of the object.

Last question: why do you use PCA to reduce the vectors when you could cluster them directly? It seems like an extra step to me. Does involving PCA improve the GAN's performance?

alex-sage commented 5 years ago

You're right, the PCA step is not strictly necessary. If I remember correctly, I mainly did it because clustering in 2048 dimensions took a long time and/or used a lot of memory. As far as I know this is the standard approach for very high-dimensional data, though I can't say for certain whether it changes the quality of the result; to me the results seemed qualitatively equivalent either way.

kj-lai commented 5 years ago

Noted, thank you for your advice. I'll post some samples here once I obtain some desirable results.

kj-lai commented 5 years ago

Hi Alex, here are some random samples of the generated icons. From what I can observe, the generated icons are mostly identical to the original icons, and the rest are blurry. I followed the LLD-logo-rc_64 settings, and the cost at the final checkpoint is about -64.3. Roughly what cost values did you see when training with this setting?

(image: random_02, random samples of generated icons)

I think hyperparameter tuning is needed to get better performance. It would be great if you could give some advice on this.

I have also checked TensorBoard, and the dev_disc_cost curve looks like the figure below. Is this normal?

(image: dev_disc_cost plot)