geek-ai / Texygen

A text generation benchmarking platform
MIT License

Question on vocabulary size of the Chinese poem dataset #18

Open zl1zl opened 6 years ago

zl1zl commented 6 years ago

I'm trying to reproduce the Poem BLEU-2 result in the SeqGAN paper, but I couldn't find the vocabulary size used there. The RankGAN paper uses a different dataset of 13,123 poems and filters out words that occur fewer than 5 times. Do you know the vocabulary size used in the SeqGAN paper? Thanks a lot!
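
For anyone else trying to reproduce this, here is a minimal sketch of the kind of frequency cutoff RankGAN describes (the function name, special tokens, and file format are my own assumptions, not taken from either paper; whether SeqGAN used the same cutoff is exactly the open question):

```python
from collections import Counter

def build_vocab(corpus_file, min_count=5):
    """Build a word-to-id map, dropping tokens seen fewer than min_count times."""
    counts = Counter()
    with open(corpus_file, encoding='utf-8') as f:
        for line in f:
            counts.update(line.split())
    # Tokens below the cutoff fall back to <unk>; <pad> reserved for padding.
    vocab = ['<pad>', '<unk>'] + [w for w, c in counts.items() if c >= min_count]
    return {w: i for i, w in enumerate(vocab)}
```

The resulting vocabulary size is just `len(build_vocab(path))`, so knowing the cutoff (if any) would pin down the number used in the paper.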

YongfeiYan commented 4 years ago

Hi, have you reproduced the results? I tried using all words in the training data, but got BLEU-2 ~0.394 for MLE, which is lower than reported. Also, what configuration did you use for your SeqGAN model? lstm_hidden_size 32 and emb_dim 32?
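
For reference, my BLEU-2 number comes from scoring each generated sentence against the test set as references, roughly in the spirit of Texygen's Bleu metric. A minimal sketch (an approximation, not the exact evaluation code; file format and tokenization are assumptions):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu2(test_file, generated_file):
    """Average BLEU-2 of generated lines against the whole test set as references."""
    with open(test_file, encoding='utf-8') as f:
        references = [line.split() for line in f if line.strip()]
    with open(generated_file, encoding='utf-8') as f:
        hypotheses = [line.split() for line in f if line.strip()]

    smoothing = SmoothingFunction().method1
    weights = (0.5, 0.5)  # BLEU-2: equal weight on unigrams and bigrams
    scores = [
        sentence_bleu(references, hyp, weights, smoothing_function=smoothing)
        for hyp in hypotheses
    ]
    return sum(scores) / len(scores)
```

If your number was computed differently (e.g. different smoothing or references), that alone could explain part of the gap.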