NVlabs / imaginaire

NVIDIA's Deep Imagination Team's PyTorch Library

initial weight type is None?? #40

Closed Johnson-yue closed 2 years ago

Johnson-yue commented 3 years ago

https://github.com/NVlabs/imaginaire/blob/bac04f520b55cad8917ed17badb27d98f9242029/configs/projects/coco_funit/animal_faces/base64_bs8_class149.yaml#L20

Hi, when I trained with coco_funit, the results were very bad: the outputs are all zeros. When I checked the network, I saw that the initial weight type is None. Is that intended?

mingyuliutw commented 3 years ago

Yes, for COCO-FUNIT, we were using None as the init_weight_type. It just means that we use the default PyTorch weight init. We will add a comment there for clarification.
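For illustration only, here is a minimal sketch of what a None init type typically amounts to in a PyTorch code base. The `init_weights` helper and the `'orthogonal'`/`'xavier'` branches below are hypothetical, not imaginaire's actual API; the point is simply that when no type is given, the layers keep PyTorch's built-in defaults (e.g. Kaiming-uniform for Conv/Linear).

```python
import torch.nn as nn

def init_weights(net: nn.Module, init_type=None):
    """Hypothetical helper: apply a custom initializer only if a type is given."""
    if init_type is None:
        # Nothing to do: modules keep PyTorch's default initialization
        # (Kaiming-uniform for Conv2d/Linear layers).
        return net

    def _init(m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            if init_type == 'orthogonal':
                nn.init.orthogonal_(m.weight)
            elif init_type == 'xavier':
                nn.init.xavier_normal_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

    net.apply(_init)
    return net
```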

When you said the results were very bad, were you applying the algorithm to a new dataset or an existing dataset?

Johnson-yue commented 3 years ago

> Yes, for COCO-FUNIT, we were using None as the init_weight_type. It just means that we use the default PyTorch weight init. We will add a comment there for clarification.
>
> When you said the results were very bad, were you applying the algorithm to a new dataset or an existing dataset?

I applied it to a new dataset, and the log is here: tb_log. The D_loss goes to zero very quickly, and the output is all zeros.

ksaito-ut commented 3 years ago

Hi, I am Kuniaki, a co-author of COCO-FUNIT. Thanks for your interest in our work!

Let me guess three possible causes.

  1. Dataset. We collected many classes and images, and I am not sure whether your new dataset has a similar scale in classes and images. If it does not work well on the animal-faces or animal datasets we provide, there may be an issue in the implementation. From your training curve, training looks OK until around 40k iterations. In my experience, issues appeared at much earlier iterations when there was a fundamental implementation problem. The trend is similar to what I saw when testing on a noisy dataset (e.g., a bird dataset with many outlier images). You may need to clean the dataset more carefully.

  2. Need to tune some hyper-parameters. You may need to change some hyper-parameters for the new dataset. The hyper-parameters we set were for our datasets, and we used 8 GPUs for training each model. If you use a different configuration, you may need to adjust them.

  3. You missed some important options such as spectral norm. The use of spectral norm is important (see the sketch after this list). I guess you are already using it, so it may not be the cause.

I think 1 or 2 are more probable than 3. Thanks!
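As a hedged illustration of point 3, here is a minimal sketch of how spectral normalization is typically applied to discriminator layers in PyTorch via `torch.nn.utils.spectral_norm`. The `DiscriminatorBlock` name and the layer sizes are made up for the example and are not imaginaire's actual modules.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class DiscriminatorBlock(nn.Module):
    """Toy discriminator block; spectral_norm constrains the Lipschitz
    constant of each conv, which usually stabilizes GAN training."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Wrapping a layer with spectral_norm rescales its weight by the
        # largest singular value at every forward pass.
        self.conv = spectral_norm(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1))
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        return self.act(self.conv(x))

block = DiscriminatorBlock(3, 64)
out = block(torch.randn(1, 3, 64, 64))  # -> (1, 64, 32, 32)
```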

Johnson-yue commented 3 years ago

@ksaito-ut Hi, thank you for your reply.

  1. Dataset. My dataset is a font dataset containing 200 styles with 5000 characters per style, so it is very balanced, but the images have only 1 channel.

  2. Tune some hyper-parameters. I trained this dataset on 1 GPU, so I removed apex and trained with full-precision float tensors only. Because my dataset is simpler than animal faces (at least I think so), I reduced the number of parameters in the net. I tested many hyper-parameter settings, but none of them worked.

  3. Looking at this curve, I think D finds it easy to discriminate real from fake, so G stops updating. Do you have any suggestions?

  4. Would you share your training curve for the animal faces dataset as a reference?

thank you

ksaito-ut commented 3 years ago

@Johnson-yue Thanks for the information.

  1. If I understand correctly, your dataset has 5000 characters rendered in different fonts; you define the fonts as styles and aim to manipulate the font.

I am not sure whether COCO-FUNIT's architecture design is suitable for this task, because the model would need to change the shape of the input character. In our task, we try to maintain the pose of the animal while attaching style information such as texture and color. That is why we used AdaIN layers, which manipulate the style information (see the sketch at the end of this comment). The "pose of animals" corresponds to the font in your task. So I think the task you are trying to solve is a little different from ours.

I am not familiar with font-manipulation tasks, but my impression is that you may need to rethink the design of the generator, especially the style-transfer part.

  2. Batch size is important, so single-GPU training may not be enough to reach the reported performance.

  3. The cause may be the design of the generator's architecture, as I mentioned above.

  4. I do not have access to the training curve right now. Do you have it, @mingyuliutw ?
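For reference, a minimal sketch of the adaptive instance normalization (AdaIN) idea mentioned above: content features are instance-normalized, then rescaled and shifted with parameters predicted from a style code. The module name, layer choices, and dimensions are illustrative, not COCO-FUNIT's actual generator code.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Minimal AdaIN: normalize content features per channel, then apply
    a scale/shift predicted from a style code."""
    def __init__(self, num_features, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        # One linear layer predicts per-channel gamma and beta from the style code.
        self.fc = nn.Linear(style_dim, num_features * 2)

    def forward(self, content, style_code):
        gamma, beta = self.fc(style_code).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(content) + beta

adain = AdaIN(num_features=64, style_dim=128)
out = adain(torch.randn(2, 64, 32, 32), torch.randn(2, 128))  # -> (2, 64, 32, 32)
```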