Closed kellymarchisio closed 6 years ago
@kellymarchisio First, thanks for your contributions to this code. Your understanding is right. However, the vocabulary file differs slightly from the vocab.bpe.32000 released with the WMT en-de corpora in its artificial tokens, such as `<PAD>`, `<S>`, `</S>` and `<UNK>`. These tokens are used when preparing the training data. You only need to add these four tokens manually at the beginning of vocab.bpe.32000. I checked the log file of our experiments again and found that the discriminator reached an accuracy of 0.7 after 2 epochs of training.
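For reference, prepending the four artificial tokens to the vocabulary file can be done with a short script like the sketch below. The exact token spellings and their order are assumptions based on common conventions; check the repo's data utilities for the strings it actually expects.

```python
def add_special_tokens(src_path, dst_path,
                       tokens=("<PAD>", "<S>", "</S>", "<UNK>")):
    """Prepend the special tokens, one per line, to a vocab file.

    Token spellings and order are assumptions -- verify them against
    the repo's data-preparation code before use.
    """
    with open(src_path, encoding="utf-8") as f:
        original = f.read()
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens) + "\n" + original)

# Usage (paths are illustrative):
# add_special_tokens("vocab.bpe.32000", "vocab.bpe.32000.full")
```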
In my project, when running the discriminator pretraining, my loss also falls to 0-2, but the accuracy always oscillates around 0.5. What kinds of problems might cause that? Thanks.
@jeicy07 In my project, the silly cause of this behaviour was that my pickled dictionary was built incorrectly. I fixed it to make sure it was a string:int mapping of word:id. When it was broken, my entire src/trg/neg matrices were written as 1s (for UNK). It sounds like the behaviour you observe is symptomatic of indistinguishable matrices. Try logging the final matrices you feed into the discriminator to see if you observe anything unusual, then backtrack from there.
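A quick sanity check along these lines: verify that the pickled vocabulary really is a word-to-id dict, and measure what fraction of the encoded matrices is the UNK id. The path and the UNK id of 1 are assumptions taken from this thread, not the repo's guaranteed values.

```python
import pickle
from collections import Counter

def check_vocab(path):
    """Load a pickled vocabulary and assert it maps str -> int.
    The pickle path is hypothetical -- point it at your own file."""
    with open(path, "rb") as f:
        vocab = pickle.load(f)
    assert all(isinstance(k, str) and isinstance(v, int)
               for k, v in vocab.items()), "expected a word:id (str:int) dict"
    return vocab

def unk_ratio(rows, unk_id=1):
    """Return the fraction of tokens equal to unk_id across all rows.
    A ratio near 1.0 means the inputs are indistinguishable and the
    discriminator can only guess (accuracy ~0.5)."""
    counts = Counter(tok for row in rows for tok in row)
    total = sum(counts.values())
    ratio = counts[unk_id] / total if total else 0.0
    if ratio > 0.5:
        print("warning: %.0f%% of tokens are UNK" % (100 * ratio))
    return ratio
```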
Thanks, I converted my pickled dictionary into a proper dict, and finally it works!
@ZhenYangIACAS Thanks very much for your response. After fixing some errors, I also achieve accuracy 0.70 after 2 epochs. After how many epochs do you reach 0.82/0.95? I am still training (~epoch 4) but performance is still ~0.70.
I notice though that the accuracy bounces around quite significantly as seen here:
Is this expected, or a bug?
I also notice that loss alternates between very high values and lower values at the beginning of training:
Is this also expected, and what might cause this behavior? I would expect loss to monotonically decrease.
Thanks very much for releasing this code base - I've enjoyed working with it.
@kellymarchisio I am sorry for the late response. Your loss curve is strange: it swings between its upper and lower bounds. In our experiments, the loss decreased smoothly. Have you shuffled your training data?
@ZhenYangIACAS Yes, the training data is being shuffled. Now, on epoch 5, the model has begun to overfit. The peak was ~0.71-0.72 in earlier epochs. The config I'm using is below. Does anything look amiss here?
The training accuracy is now 0.75-0.90 per batch, but dev accuracy stays at 0.61-0.71, as it was in epoch 2, except that in epoch 2 the performance was more consistent.
@kellymarchisio There is no obvious error for your configuration.
@ZhenYangIACAS thanks for taking a look. To verify, should dis_positive_data, dis_negative_data, etc. look like regular sentences like:
This is a sentence .
Or do I have to pad the text file sent to the config so they look like:
<S> This is a sentence . </S> <PAD> <PAD> <PAD>...
I believe I've tried both, but your verification would be helpful.
@kellymarchisio You do not need to add the padding to the files manually. The code will do it automatically.
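For intuition, the automatic padding is roughly equivalent to the sketch below: wrap each sentence in start/end tokens and right-pad the batch to a common length. This is a simplified illustration under assumed token spellings, not the repo's actual implementation.

```python
def pad_batch(sentences, max_len=None, bos="<S>", eos="</S>", pad="<PAD>"):
    """Wrap each tokenized sentence in <S> ... </S> and right-pad every
    sentence in the batch to the same length with <PAD>.

    Token spellings are assumptions; the repo's preprocessing may differ.
    """
    wrapped = [[bos] + list(s) + [eos] for s in sentences]
    if max_len is None:
        max_len = max(len(s) for s in wrapped)
    return [s + [pad] * (max_len - len(s)) for s in wrapped]
```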
@ZhenYangIACAS Thank you for the clarification. According to your paper, I won't be able to reproduce the GAN training unless I get 82% accuracy. A few quick related questions:
I am sure that all of the parameters which have much effect on translation performance are described in detail in our paper. We did not use pre-trained word embeddings; you can find the initialization method in our code. I remember that when we used the Transformer as the generator, the accuracy was hard to push above 90%. As for your problem, it seems a bug exists, but I am not sure.
@kellymarchisio How did you compute dev accuracy for the discriminator? I am using the Transformer as the generator. The code for computing dev accuracy is commented out in the cnn_discriminator.py file. When I use this code, it prints several validation-accuracy values, and they vary a lot.
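One way to get a stable dev accuracy, instead of per-batch values that naturally vary a lot, is to average correctness over the whole dev set. The sketch below is independent of the repo's actual API; `predict_fn` is a hypothetical stand-in for whatever returns the discriminator's class probabilities.

```python
import numpy as np

def dev_accuracy(predict_fn, batches):
    """Average discriminator accuracy over all dev batches.

    predict_fn(inputs) is assumed to return per-class probabilities
    (or logits) of shape [batch, num_classes]; each element of
    `batches` is an (inputs, labels) pair with integer labels.
    """
    correct = total = 0
    for inputs, labels in batches:
        preds = np.argmax(predict_fn(inputs), axis=-1)
        correct += int(np.sum(preds == np.asarray(labels)))
        total += len(labels)
    return correct / total
```

Reporting this single averaged number per evaluation avoids the batch-to-batch noise described above.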
@kellymarchisio Hi, can you show me a data sample from your config_discriminator_pretrain.yaml? For example, dis_positive_data, dis_negative_data, and so on. Thanks.
@ZhenYangIACAS Hi, I see many data files in config_discriminator_pretrain.yaml, for example dis_positive_data, dis_negative_data, dis_dev_positive_data and so on. Can you tell me what this data means? What data do I need to prepare if I want to run the code successfully? Thanks!
@luckper dis_positive_data is the positive data for training the discriminator, and dis_negative_data is the negative data. dis_dev_positive_data is the development data for the discriminator, and so on. To understand these files, I suggest you scan gan_train.py. Some files you should prepare beforehand, and others are generated automatically. I realize that so many files are a little confusing for users; we will restructure the code when we have free time.
@ZhenYangIACAS OK, following your suggestion, I scanned gan_train.py. However, I still have some questions. First, where do dis_dev_positive_data, dis_dev_negative_data, and dis_dev_source_data come from? And how do they differ from dis_positive_data, dis_negative_data, and dis_source_data? Thanks!
@luckper It is easy to build the development sets. We just randomly sampled 200 sentences from dis_positive_data to get dis_dev_positive_data, and similarly we got the corresponding dis_dev_negative_data and dis_dev_source_data.
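Sampling the dev sets this way could look like the sketch below: draw the same random line indices from each line-aligned file so the positive, negative, and source dev sets stay parallel. File handling details are assumptions, not the authors' actual script.

```python
import random

def sample_dev_sets(paths, n=200, seed=0):
    """Sample the same n line indices from each of several line-aligned
    files (e.g. positive, negative and source data), so the resulting
    dev sets remain parallel. Returns a dict of path -> sampled lines."""
    files = {}
    for p in paths:
        with open(p, encoding="utf-8") as f:
            files[p] = f.read().splitlines()
    length = min(len(lines) for lines in files.values())
    idx = random.Random(seed).sample(range(length), min(n, length))
    return {p: [lines[i] for i in idx] for p, lines in files.items()}
```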
@ZhenYangIACAS Hi, I ran generate_sample.sh, but some errors occurred:

Instructions for updating:
Use argmax instead
INFO:root:using rmsprop for g_loss
Traceback (most recent call last):
  File "generate_samples.py", line 60, in ...
@ZhenYangIACAS @luckper Have you solved the "KeyError: 'generator'" error?
I'm hoping for clarification on the files passed into config_discriminator.yaml.
As I understand it:
Is my understanding correct? Can you please provide clarification on how the files in the discriminator pretraining are created?
For context, I am trying to solve an issue where, when running the discriminator pretraining, my loss falls to 2-3 but the accuracy always oscillates around 0.5, even after 700K steps.