How many epochs did you train the models for, and did the number of epochs differ between models? I read your paper, which says the learning rate is decreased by a factor of 2 every 10 epochs. Is this schedule used for every model, or does the training setup differ between models?
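To be concrete about how I am reading the schedule, here is a minimal sketch of my understanding. The function name `step_decay_lr`, the 0-indexed epochs, and the base learning rate of 0.1 are my own placeholders and are not taken from the paper; please correct me if the actual schedule differs.

```python
def step_decay_lr(base_lr: float, epoch: int, step: int = 10, factor: float = 2.0) -> float:
    """Learning rate at a 0-indexed epoch under step decay.

    The rate is divided by `factor` once for every `step` completed epochs.
    """
    return base_lr / (factor ** (epoch // step))


# base_lr = 0.1 is a placeholder value, not one reported in the paper.
for epoch in (0, 9, 10, 20):
    print(epoch, step_decay_lr(0.1, epoch))
```

Under this reading, the rate stays constant for epochs 0 to 9, halves at epoch 10, and halves again at epoch 20.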