It takes 5 days to complete 1 epoch on Wikipedia with 8 V100 GPUs. I believe the controllability of the models can be further increased by (1) increasing the latent dimension, and (2) training longer.
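Not from the repo itself, but as a rough sketch of what "latent dimension" refers to here: in an Optimus-style VAE the encoder's pooled output is projected to a Gaussian latent vector whose size is this hyperparameter (32 in the runs discussed below). The class and attribute names are illustrative, not the actual names used in the Optimus code.

```python
import torch
import torch.nn as nn

class LatentBottleneck(nn.Module):
    """Illustrative projection from an encoder's pooled output to a
    Gaussian latent (mean, log-variance). latent_size is the dimension
    being discussed; increasing it widens the bottleneck."""
    def __init__(self, encoder_hidden=768, latent_size=32):
        super().__init__()
        # one linear layer producing both the mean and the log-variance
        self.to_gaussian = nn.Linear(encoder_hidden, 2 * latent_size)

    def forward(self, pooled):  # pooled: (batch, encoder_hidden)
        mean, logvar = self.to_gaussian(pooled).chunk(2, dim=-1)
        # reparameterisation trick: z = mean + sigma * eps
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        return z, mean, logvar

# "Increase the latent dimension" means raising latent_size (e.g. 32 -> 64).
bottleneck = LatentBottleneck(encoder_hidden=768, latent_size=32)
z, mean, logvar = bottleneck(torch.randn(4, 768))
print(z.shape)  # torch.Size([4, 32])
```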
Thanks for replying. How many epochs did you pre-train Optimus for? Specifically, what were the batch size and the number of pre-training steps?
For the results reported in the paper, I used the model pre-trained for 1 epoch with latent size 32.
Here is an example pre-training script: https://github.com/ChunyuanLI/Optimus/blob/master/code/scripts/scripts_philly/train_vae_wikipedia_distributed.yaml
The effective batch size is 16 × 8 = 128 sentences (16 sentences per GPU across 8 GPUs). There are nearly 2M sentences in Wikipedia.
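For concreteness, here is a quick back-of-the-envelope count of the optimizer steps implied by those numbers (the ~2M sentence count is the approximation quoted above, not an exact dataset size):

```python
# Rough step count for one Wikipedia epoch, using the figures quoted above.
sentences = 2_000_000                         # approximate number of Wikipedia sentences
per_gpu_batch = 16
num_gpus = 8
effective_batch = per_gpu_batch * num_gpus    # 128 sentences per optimizer step

steps_per_epoch = sentences // effective_batch
print(effective_batch)    # 128
print(steps_per_epoch)    # ~15,625 optimizer steps for one epoch
```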
Thanks for your reply : )
The paper says the encoder and decoder are initialized with the weights of BERT and GPT-2, respectively. So, starting from that initialization, how long did it take to pre-train Optimus, in terms of days or weeks?