khld closed this issue 6 years ago
Can I stop the program from running even at epoch 2? Will my training be lost or not? Also, what is the ideal number of epochs to train for in order to get a good chatbot?
@khld In order to train this seq2seq model using the full Papaya Data Set, you'd better have a machine with good GPUs (it normally takes me about 12 hours to train for 60 epochs, using two pretty good GPUs). If you have a good PC with only CPU(s), it normally takes 20 to 30 times longer, which means at least 3 hours for one epoch. The implementation starts to save the trained model only when it reaches a certain perplexity, say 1.6. Only at that level does the chatbot become meaningful. Your training progress will be lost if you stop before that point.
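To illustrate the checkpoint behavior described above, here is a minimal sketch (not the project's actual code) of how a training loop might gate model saving on a perplexity threshold. The `train_step` and `save` methods, the directory name, and the exact threshold are stand-ins for whatever the implementation really uses:

```python
import math

# Assumed threshold below which checkpoints start being written;
# the real implementation may use a different value or schedule.
SAVE_PERPLEXITY_THRESHOLD = 1.6

def train(model, batches, num_epochs):
    for epoch in range(num_epochs):
        total_loss, total_tokens = 0.0, 0
        for batch in batches:
            # Stand-in API: returns the summed cross-entropy loss and token count.
            loss, num_tokens = model.train_step(batch)
            total_loss += loss
            total_tokens += num_tokens

        # Perplexity is the exponential of the average per-token cross-entropy loss.
        perplexity = math.exp(total_loss / total_tokens)
        print("epoch %d, perplexity %.3f" % (epoch, perplexity))

        # Only persist the model once it is good enough to give meaningful
        # replies; if training stops before this, nothing has been saved.
        if perplexity <= SAVE_PERPLEXITY_THRESHOLD:
            model.save("Result/")  # stand-in for the project's saver call
```

In other words, stopping at epoch 2 almost certainly means no checkpoint has been written yet, so all progress is lost.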
I have a GTX 1060 6GB version, and I changed the batch_size to fit the GPU's memory. The epochs seem to take around 1.5 hours each to complete. Do you think I should change any other parameters to decrease the time taken per epoch? Also, can we change the hparams without stopping the training? Thanks.
@MukundaK No, the hyper-parameters cannot be changed on the fly. With your existing hardware, the only suggestion I can make is to use a smaller data set. Try excluding the reddit data (or keeping only half of it), then re-generate the vocab.txt file and restart the training.
Actually, I discarded a very large portion of the reddit data I had, which was about 1M pairs; I kept only 160K pairs for this project. The biggest saving from doing that was the reduction in vocab size.
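For anyone trimming the reddit portion as suggested, a rough sketch of regenerating a vocabulary file from whatever corpus files remain could look like the following. The file paths, whitespace tokenization, and minimum-count cutoff are my own assumptions, not the project's exact pipeline:

```python
from collections import Counter
import glob

def build_vocab(corpus_glob, out_path="vocab.txt", min_count=2):
    """Count tokens in the remaining corpus files and write one token per line."""
    counts = Counter()
    for path in glob.glob(corpus_glob):
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Whitespace tokenization is an assumption; the project may use
                # its own tokenizer and special symbols (e.g. <unk>, <bos>, <eos>).
                counts.update(line.strip().split())

    with open(out_path, "w", encoding="utf-8") as out:
        for token, count in counts.most_common():
            if count >= min_count:
                out.write(token + "\n")

# Example: rebuild the vocab from everything left after dropping the reddit files.
# build_vocab("Data/Corpus/*.txt")
```

Dropping the rarest tokens (or the noisy reddit portion) shrinks the output vocabulary, which in turn shrinks the softmax layer and speeds up each training step.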
Does the training continue if I place your pre-trained model in the result directory?
@MukundaK Firstly, there is no need to do that. The model provided is a well-trained model. If you train it further on other conversation pairs, you are very likely to make the model unbalanced. Also, the vocab.txt file cannot be changed. Furthermore, unlike image-based CNN models, which are good at feature extraction and therefore often suitable for transfer learning, seq2seq models, to the best of my knowledge, are good at memorizing by capturing the relationships between the pairs (although with the attention mechanism and word2vec-style embeddings, they can learn more than that). Therefore, the overall distribution of the training data is very important, and these models are often very sensitive to data quality, much more so than CNN models.
Secondly, I am not aware of any way of doing that. If you figure out a solution, please share it with us.
I started running the algorithm 4 days ago, and it is still running now. My PC has 12 GB of RAM and the CPU is an i5. How many epochs will it take to finish?