Closed jwijffels closed 5 years ago
I noticed that the batch size also affects the loss. Smaller batches (5-16) converged faster, whereas larger batches (300) just oscillated within a range and didn't converge at all (dim = 20).
@tharangni, the question is not about the speed of convergence but about the magnitude of the loss
@jwijffels Ah, I should have rephrased my sentence better, apologies. What I meant was that batch size affects the magnitude of the loss, i.e. larger batch sizes = higher loss.
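One common reason a reported loss grows with batch size is that per-example losses are summed over the batch rather than averaged. This toy sketch uses random stand-in values (not StarSpace's actual loss computation) just to illustrate why the summed figure scales with batch size while the mean stays in the same range:

```python
import random

random.seed(0)

def per_example_losses(n):
    # Stand-in for per-example hinge losses; the values are made up.
    return [random.uniform(0.0, 1.0) for _ in range(n)]

for batch_size in (5, 300):
    losses = per_example_losses(batch_size)
    total = sum(losses)            # summed loss grows with batch size
    mean = total / batch_size      # mean loss stays in the same range
    print(batch_size, round(total, 2), round(mean, 2))
```

If the tool reports a summed batch loss, losses are only comparable across runs that use the same batch size.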
@jwijffels Hi, sorry for the delay in responding. I would recommend comparing hyper-parameters on an evaluation metric computed on a validation set rather than on the validation loss. For instance, in the fb15k example, you can compare different hyper-parameters by optimizing the hit@10 metric on the validation dataset.
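For reference, hit@10 is simply the fraction of validation queries whose correct answer appears among the model's top-10 ranked predictions. A minimal sketch (the ranks below are made-up illustration, not actual fb15k results):

```python
def hits_at_k(ranks, k=10):
    """Fraction of queries whose true answer ranks in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical rank of the true entity for eight validation queries.
ranks = [1, 3, 12, 7, 54, 2, 9, 30]
print(hits_at_k(ranks, k=10))  # 5 of 8 ranks are <= 10, so 0.625
```

Unlike the loss, this metric lives on a fixed 0-1 scale, so it stays comparable when you change loss/margin/similarity/negSearchLimit.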
Ok, thanks for the feedback.
Hi @ledw, I have a general question about the approach you recommend for tuning the hyperparameters of StarSpace models. It would be nice if you could give some advice on how you would tweak hyperparameters to get an optimal model, based mainly on looking at the loss. I generally need to tweak them to get a good model.
Some hyperparameters (e.g. loss, margin, similarity, negSearchLimit) influence the general range of the loss, while others do not affect it.
My general approach is to start from sensible settings that work, look at the evolution of the loss over the epochs to check that the model learns something (the loss on validation data steadily decreases; an example of such a graph is below), and then manually inspect some embedding similarities between labels and terms in the model to see whether the embeddings really make sense. But if I want to compare across the different hyperparameter settings enumerated in point 1, that is hard, as they change the range of the loss.
I would like to know your general approach for finding the best setting of loss/margin/similarity metric/negSearchLimit, given that changing these parameters also changes the range of the loss. Many thanks for any input.
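For the manual-inspection step mentioned above, one way to check label/term similarities is to load the saved embeddings and compute cosine similarities directly. This sketch assumes the model was exported as a TSV file with the entity name followed by tab-separated floats; the `model.tsv` path and the `__label__sports`/`football` entity names are purely illustrative:

```python
import math

def load_embeddings(path):
    # Assumes one entity per line: name, then tab-separated float components.
    emb = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            emb[parts[0]] = [float(x) for x in parts[1:]]
    return emb

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Illustrative usage (file and entity names are hypothetical):
# emb = load_embeddings("model.tsv")
# print(cosine(emb["__label__sports"], emb["football"]))
```

Cosine similarity is scale-invariant, so unlike the raw loss it gives comparable numbers across models trained with different loss/margin settings.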