Seth-Park / comp-t2i-dataset

Dataset splits and evaluation code for the paper "Benchmark for Compositional Text-to-Image Synthesis" (NeurIPS 2021)

Implementation details #1

Open yrcong opened 2 years ago

yrcong commented 2 years ago

Dear authors,

Great work :) I would like to ask about some implementation details.

I see that DMGAN, ControlGAN and DFGAN were retrained on the new data splits. These models are trained with different settings in their original papers (e.g. learning rate, number of epochs). How did you select the training settings and hyperparameters when you retrained them?

Seth-Park commented 2 years ago

We use the same set of hyperparameters used in the original papers for each model.

yrcong commented 2 years ago

> We use the same set of hyperparameters used in the original papers for each model.

The dataset size has changed, though (for example, CUB-Color should have fewer training images than the original CUB). Did you also adapt the number of training epochs?
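To make the question concrete: one common way to compensate for a smaller training set is to scale the epoch count so that the total number of optimizer steps stays roughly the same. The sketch below only illustrates that arithmetic. The function name and all numbers are hypothetical, and nothing in this thread confirms that the authors adjusted epochs this way (or at all).

```python
import math


def epochs_for_same_updates(orig_epochs, orig_train_size, new_train_size, batch_size):
    """Epochs needed on the new split to match the original number of steps.

    Assumes the last partial batch is kept (ceil); a loader that drops it
    would need floor instead.
    """
    orig_steps = orig_epochs * math.ceil(orig_train_size / batch_size)
    new_steps_per_epoch = math.ceil(new_train_size / batch_size)
    return math.ceil(orig_steps / new_steps_per_epoch)


# Hypothetical sizes and schedule, for illustration only.
new_epochs = epochs_for_same_updates(
    orig_epochs=600, orig_train_size=8000, new_train_size=6000, batch_size=20
)
print(new_epochs)
```

Keeping the original epoch count instead means fewer gradient updates on the smaller split, which is worth stating explicitly when comparing retrained models.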

yrcong commented 2 years ago

I also ran into a problem while trying to reproduce the DAMSM R-precision (for C-CUB-Color).

The original AttnGAN vocabulary has 5450 tokens. I downloaded the pretrained text encoder you provide, and it also has 5450 token embeddings.

However, I cannot tokenize the C-CUB-Color captions. For example, in the caption "this bird is black in color, with a black beak.", the token "." is not in the word dictionary of the DAMSM module. How did you handle this case?
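For reference, the public AttnGAN data loader appears to tokenize lowercased captions with a `\w+` regular expression, which drops punctuation such as "." and ",", and to skip any word that is not in the dictionary. The sketch below mimics that behaviour using only the standard library. The function names and the toy `wordtoix` vocabulary are hypothetical, and whether the authors preprocessed the C-CUB captions this way is an assumption, not something confirmed in this thread.

```python
import re


def tokenize_caption(caption):
    """Lowercase the caption and keep only runs of word characters."""
    return re.findall(r"\w+", caption.lower())


def encode_caption(caption, wordtoix):
    """Map tokens to ids, silently skipping out-of-vocabulary tokens."""
    return [wordtoix[t] for t in tokenize_caption(caption) if t in wordtoix]


# Toy vocabulary standing in for the real 5450-word dictionary (hypothetical).
words = ["this", "bird", "is", "black", "in", "color", "with", "a", "beak"]
wordtoix = {w: i + 1 for i, w in enumerate(words)}  # id 0 left free for padding

caption = "this bird is black in color, with a black beak."
tokens = tokenize_caption(caption)
ids = encode_caption(caption, wordtoix)
print(tokens)
print(ids)
```

With this scheme the trailing "." never reaches the dictionary lookup, so it does not need an embedding of its own.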

yrcong commented 2 years ago

For the test_unseen split, I got an error when generating with image_id 191.Red_headed_Woodpecker/Red_Headed_Woodpecker_0018_183455 and caption_id 6: "bird has black and white striped wings and back, a medium length narrow dark-colored bill, a reddish orange crown, light orange shading in its face surrounding its beak and dark eyes, a mainly white breast with some dark speckles and light orange band on the bottom, and a tail with a black and white striped inner rectrices and solid black outer rectrices."

This caption is too long for CLIP's text encoder, which has a fixed context length. How should I handle this case?
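One possible workaround is to truncate the token sequence to CLIP's context length of 77 tokens while keeping the end-of-text token in the last position. The OpenAI `clip` package exposes this through `clip.tokenize(texts, truncate=True)`. The sketch below reproduces the same idea on plain lists of token ids so that it runs without the package. The function name `fit_to_context` is hypothetical, and whether the authors truncated, skipped, or otherwise handled such captions is not stated in this thread.

```python
SOT_ID = 49406        # start-of-text id in CLIP's BPE vocabulary
EOT_ID = 49407        # end-of-text id in CLIP's BPE vocabulary
CONTEXT_LENGTH = 77   # fixed text context length of the released CLIP models


def fit_to_context(token_ids, context_length=CONTEXT_LENGTH, eot_id=EOT_ID):
    """Truncate to context_length with EOT as the final token, else zero-pad."""
    ids = list(token_ids)
    if len(ids) > context_length:
        ids = ids[:context_length]
        ids[-1] = eot_id  # keep the end-of-text marker after truncation
    return ids + [0] * (context_length - len(ids))


# A made-up over-long sequence: SOT, 100 word-piece ids, EOT.
too_long = [SOT_ID] + list(range(1, 101)) + [EOT_ID]
fitted = fit_to_context(too_long)
print(len(fitted), fitted[0], fitted[-1])
```

Truncation silently drops the end of the caption (here, the description of the tail), so any CLIP-based score for this sample only reflects the first part of the text.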