geek-ai / Texygen

A text generation benchmarking platform
MIT License

LeakGAN trains on test data! #22

Open optimass opened 5 years ago

optimass commented 5 years ago

In the synthetic data experiment, the nll-test metric is computed with gen_data_loader (the training data), so nll-test is actually nll-train, which makes it useless as a test metric. This is highly misleading.

Also, even if you create a separate test set (test_file.txt), generator.get_nll, the function used to compute nll-test, performs training updates when it is called. So even in this case, you would still be training on test data!
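To make the two problems concrete, here is a minimal sketch of what a leak-free evaluation looks like. It is a toy, not Texygen code: `UnigramModel`, `train_step`, and `nll` are hypothetical names, and an add-one smoothed unigram model stands in for the generator. The point is only that the NLL function reads parameters without updating them, and that it is called on a held-out set rather than on the training loader.

```python
import math
from collections import Counter


class UnigramModel:
    """Toy stand-in for a generator (hypothetical, not the Texygen API)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def train_step(self, batch):
        # The only method allowed to change model parameters.
        for seq in batch:
            self.counts.update(seq)
            self.total += len(seq)

    def nll(self, batch):
        # Read-only: average negative log-likelihood per token,
        # with add-one smoothing so unseen tokens get nonzero mass.
        log_prob = 0.0
        n_tokens = 0
        for seq in batch:
            for tok in seq:
                p = (self.counts[tok] + 1) / (self.total + self.vocab_size)
                log_prob += math.log(p)
                n_tokens += 1
        return -log_prob / n_tokens


# Separate training and held-out data (made-up token ids).
train_data = [[0, 1, 2, 0], [0, 0, 1, 2]]
test_data = [[3, 3, 2, 1]]

model = UnigramModel(vocab_size=4)
model.train_step(train_data)

# Snapshot parameters, evaluate, and confirm nothing changed.
before = (dict(model.counts), model.total)
nll_train = model.nll(train_data)
nll_test = model.nll(test_data)
after = (dict(model.counts), model.total)

assert before == after, "evaluation must not update the model"
print(f"nll-train: {nll_train:.3f}  nll-test: {nll_test:.3f}")
```

The same snapshot-and-compare check can be applied to a real model by comparing its weights before and after the evaluation call.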

I suspect this error also occurs in other models ...

YongfeiYan commented 5 years ago

I found this problem too, in SeqGAN on real data (image_coco, emnlp_news) and in MLE training. RelGAN may calculate the NLL using training data instead of test data. See this.