ZhenYangIACAS / NMT_GAN

generative adversarial nets for neural machine translation
Apache License 2.0

Clarification on files in config_discriminator.yaml #11

Closed kellymarchisio closed 6 years ago

kellymarchisio commented 6 years ago

I'm hoping for clarification on the files passed into config_discriminator.yaml.

Is my understanding of these files correct? Can you please clarify how the files used for discriminator pretraining are created?

For context, I am trying to solve an issue where, when running the discriminator pretraining, my loss falls to 2-3 but the accuracy always oscillates around 0.5, even after 700K steps.

ZhenYangIACAS commented 6 years ago

@kellymarchisio First, thanks for your contributions to this code. Your understanding is correct. However, the vocabulary file differs slightly from the vocab.bpe.32000 released with the WMT En-De corpora in its artificial tokens, such as `<PAD>`, `<S>`, `</S>`, and `<UNK>`. These tokens are used when preparing the training data. You only need to add these four tokens manually at the beginning of vocab.bpe.32000. I checked the log file from our experiments again and found that the discriminator reaches an accuracy of 0.7 after two epochs of training.
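
A minimal sketch of that vocabulary edit; the exact token strings (`<PAD>`, `<S>`, `</S>`, `<UNK>`) and their order are assumptions to verify against the repo's data-preparation code:

```python
# Prepend the four artificial tokens to the BPE vocabulary.
# ASSUMPTION: the exact strings and their order should be checked
# against the repo's data-preparation code before use.
SPECIAL_TOKENS = ["<PAD>", "<S>", "</S>", "<UNK>"]

with open("vocab.bpe.32000", encoding="utf-8") as f:
    words = [line.rstrip("\n") for line in f]

with open("vocab.bpe.32000.gan", "w", encoding="utf-8") as f:
    for w in SPECIAL_TOKENS + words:
        f.write(w + "\n")
```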

jeicy07 commented 6 years ago

In my project, when running the discriminator pretraining, my loss also falls to 0-2, but the accuracy always oscillates around 0.5. What kinds of problems might cause this? Thanks.

kellymarchisio commented 6 years ago

@jeicy07 In my project, the cause turned out to be silly: my pickled dictionary was built incorrectly. I fixed it so that it was a str:int mapping of word:id. While it was broken, my entire src/trg/neg matrices were filled with 1s (the UNK id). The behaviour you observe sounds symptomatic of indistinguishable matrices. Try logging the final matrices you feed into the discriminator to see if anything looks unusual, then backtrack from there.
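
A rough version of that sanity check; the file name vocab.pkl and the UNK id of 1 are illustrative assumptions:

```python
import pickle
import numpy as np

# Load the pickled vocabulary and verify it maps str -> int.
# ASSUMPTION: "vocab.pkl" and unk_id=1 are illustrative; all-1
# matrices like those described above are what a broken mapping
# produced.
with open("vocab.pkl", "rb") as f:
    word2id = pickle.load(f)

assert all(isinstance(k, str) and isinstance(v, int)
           for k, v in word2id.items()), "vocab must map str -> int"

def to_ids(sentence, unk_id=1):
    """Map a whitespace-tokenized sentence to ids, falling back to UNK."""
    return [word2id.get(w, unk_id) for w in sentence.split()]

batch = np.array([to_ids("ein kleiner Test")])
print(batch)  # a matrix of all 1s means every token fell back to UNK
```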

jeicy07 commented 6 years ago

Thanks, I converted my pickled dictionary into a proper dict, and it works now!

kellymarchisio commented 6 years ago

@ZhenYangIACAS Thanks very much for your response. After fixing some errors, I also reach an accuracy of 0.70 after 2 epochs. After how many epochs do you reach 0.82/0.95? I am still training (~epoch 4), but performance is still ~0.70.

I notice, though, that the accuracy bounces around quite significantly from batch to batch.

Is this expected, or a bug?

I also notice that, at the beginning of training, the loss alternates between very high and much lower values.

Is this also expected, and what might cause this behavior? I would expect the loss to decrease roughly monotonically.

Thanks very much for releasing this code base - I've enjoyed working with it.

ZhenYangIACAS commented 6 years ago

@kellymarchisio I am sorry for the late response. Your loss curve is strange in that it swings between very high and very low values. In our experiments, the loss decreased smoothly. Have you shuffled your training data?
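
One simple way to shuffle the discriminator files in unison, so positive, negative, and source lines stay aligned; the file names here are placeholders for the paths in the config:

```python
import random

# Shuffle the three discriminator training files with one shared
# permutation so positive, negative, and source lines stay aligned.
# ASSUMPTION: file names are placeholders for the configured paths.
paths = ["dis_positive_data", "dis_negative_data", "dis_source_data"]
texts = [open(p, encoding="utf-8").readlines() for p in paths]

order = list(range(len(texts[0])))
random.shuffle(order)

for path, lines in zip(paths, texts):
    with open(path + ".shuf", "w", encoding="utf-8") as out:
        out.writelines(lines[i] for i in order)
```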

kellymarchisio commented 6 years ago

@ZhenYangIACAS Yes, the training data is being shuffled. Now, in epoch 5, the model has begun to overfit; the peak was ~0.71-0.72 in earlier epochs. The config I'm using is below. Does anything look amiss?

The training accuracy is now 0.75-0.90 per batch, but dev accuracy stays at 0.61-0.71, as in epoch 2, except that in epoch 2 the performance was more consistent.

ZhenYangIACAS commented 6 years ago

@kellymarchisio I see no obvious error in your configuration.

kellymarchisio commented 6 years ago

@ZhenYangIACAS Thanks for taking a look. To verify: should dis_positive_data, dis_negative_data, etc. contain plain tokenized sentences, or do I have to pad each line manually before passing the files to the config?

I believe I've tried both, but your confirmation would be helpful.

ZhenYangIACAS commented 6 years ago

@kellymarchisio You do not need to add padding to the files manually; the code does it automatically.
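
The usual mechanism here is padding at batch time: each line is mapped to ids and every sentence is right-padded to the longest one in its batch. A minimal sketch of the idea; the pad id of 0 is an assumption, and the repo's actual pipeline should be checked in gan_train.py:

```python
import numpy as np

# Right-pad every id sequence in a batch to the length of the
# longest one. ASSUMPTION: pad_id=0; check the repo's pipeline in
# gan_train.py for the actual pad id and batching logic.
def pad_batch(sentences, pad_id=0):
    max_len = max(len(s) for s in sentences)
    batch = np.full((len(sentences), max_len), pad_id, dtype=np.int32)
    for i, s in enumerate(sentences):
        batch[i, :len(s)] = s
    return batch

print(pad_batch([[5, 8, 2], [7, 2]]))
# [[5 8 2]
#  [7 2 0]]
```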

kellymarchisio commented 6 years ago

@ZhenYangIACAS Thank you for the clarification. According to your paper, I won't be able to reproduce the GAN training unless the discriminator reaches 82% accuracy. A few quick related questions: are all of the hyperparameters that matter for that result described in the paper? Did you use pre-trained word embeddings, and if not, how are the embeddings initialized? And how high did the accuracy get when you used the Transformer as the generator?

ZhenYangIACAS commented 6 years ago

I am sure that all of the parameters that have a significant effect on translation performance are described in detail in our paper. We did not use pre-trained word embeddings; you can find the initialization method in our code. I remember that when we used the Transformer as the generator, it was hard to push the accuracy above 90%. As for your problem, it seems that a bug exists somewhere, but I am not sure.

ashwanitanwar commented 6 years ago

@kellymarchisio How did you compute dev accuracy for the discriminator? I am using the Transformer as the generator. The code for computing dev accuracy is commented out in cnn_discriminator.py; I used that code, but it prints several validation-accuracy values and they vary a lot.

luckper commented 6 years ago

@kellymarchisio Hi, can you show me a sample of the data files referenced in config_discriminator_pretrain.yaml, for example dis_positive_data, dis_negative_data, and so on? Thanks.

luckper commented 6 years ago

@ZhenYangIACAS Hi, I see many data files referenced in config_discriminator_pretrain.yaml, for example dis_positive_data, dis_negative_data, dis_dev_positive_data, and so on. Can you tell me what these files are? What data do I need to prepare to run the code successfully? Thanks!

ZhenYangIACAS commented 6 years ago

@luckper dis_positive_data is the positive data for training the discriminator, and dis_negative_data is the negative data. dis_dev_positive_data is the corresponding development data, and so on. To understand these files, I suggest reading through gan_train.py: some files you must prepare beforehand, and others are generated automatically. I realize that so many files can be confusing for users; we will restructure the code when we have free time.
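
A quick checklist of those files with their roles as described in this thread; this is a sketch, since the positive/negative descriptions follow the paper's setup (human translations vs. generator output), and which files gan_train.py generates automatically should be confirmed in the code:

```python
import os

# Checklist of the discriminator pretraining files named in this
# thread. ASSUMPTION: role descriptions follow the paper's setup;
# confirm in gan_train.py which files are generated automatically.
files = {
    "dis_positive_data":     "target-side human translations (positive)",
    "dis_negative_data":     "generator translations (negative)",
    "dis_source_data":       "source sentences aligned with the above",
    "dis_dev_positive_data": "dev split of the positive data",
    "dis_dev_negative_data": "dev split of the negative data",
    "dis_dev_source_data":   "dev split of the source data",
}

for path, role in files.items():
    status = "ok" if os.path.exists(path) else "missing"
    print(f"{path:23s} [{status}] {role}")
```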

luckper commented 6 years ago

@ZhenYangIACAS OK, following your suggestion, I read through gan_train.py. However, I still have some questions. First, where do dis_dev_positive_data, dis_dev_negative_data, and dis_dev_source_data come from? And how do they differ from dis_positive_data, dis_negative_data, and dis_source_data? Thanks!

ZhenYangIACAS commented 6 years ago

@luckper It is easy to build the development sets. We just randomly sampled 200 sentences from dis_positive_data to get dis_dev_positive_data, and we obtained the corresponding dis_dev_negative_data and dis_dev_source_data in the same way.
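
That construction might look like the following sketch: sample 200 line indices once and reuse them across all three training files so the dev triples stay aligned (file names are the config keys mentioned in this thread):

```python
import random

# Sample 200 aligned lines from the training files to build the dev
# sets, reusing the same indices across all three files so the
# positive/negative/source triples stay parallel.
names = ["dis_positive_data", "dis_negative_data", "dis_source_data"]
texts = [open(n, encoding="utf-8").readlines() for n in names]

idx = sorted(random.sample(range(len(texts[0])), 200))

for name, lines in zip(names, texts):
    dev_name = name.replace("dis_", "dis_dev_", 1)
    with open(dev_name, "w", encoding="utf-8") as out:
        out.writelines(lines[i] for i in idx)
```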

luckper commented 6 years ago

@ZhenYangIACAS Hi, I ran generate_sample.sh, but it fails with the following error:

```
Instructions for updating:
Use argmax instead
using rmsprop for g_loss
Traceback (most recent call last):
  File "generate_samples.py", line 60, in <module>
    generate_samples(config)
  File "generate_samples.py", line 32, in generate_samples
    optimizer=config.train.optimizer)
  File "/home/xxx/Downloads/ZKY-GAN_NMT/NMT_GAN-master/model.py", line 119, in build_generate
    optimizer=tf.train.RMSPropOptimizer(self.config.generator.learning_rate)
  File "/home/xxx/Downloads/ZKY-GAN_NMT/NMT_GAN-master/utils.py", line 19, in __getattr__
    if type(self[item]) is dict:
KeyError: 'generator'
```

What is the reason? The log file records the following:

```
Instructions for updating:
Use argmax instead
INFO:root:using rmsprop for g_loss
```
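
The KeyError appears to come from the config wrapper in utils.py: attribute access falls through to a dict lookup (the quoted line 19, `if type(self[item]) is dict:`), so `config.generator` raises KeyError: 'generator' when the YAML passed to generate_samples.py has no top-level `generator` section. A minimal sketch of the mechanism, not the repo's exact class:

```python
# Sketch of the attribute-style config wrapper implied by the
# traceback above: attribute access falls through to dict lookup,
# so a missing top-level section surfaces as a KeyError.
class AttrDict(dict):
    def __getattr__(self, item):
        if type(self[item]) is dict:  # raises KeyError if 'item' is absent
            return AttrDict(self[item])
        return self[item]

config = AttrDict({"train": {"optimizer": "rmsprop"}})  # no "generator" section
print(config.train.optimizer)  # rmsprop

try:
    print(config.generator.learning_rate)
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'generator'
```

If that is the cause, the likely fix is to run the script with a config file that actually defines a `generator` section (including at least its learning_rate).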

alwaysprep commented 5 years ago

@ZhenYangIACAS @luckper Have you solved the "KeyError: 'generator'" error?