facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.

slow (no?) convergence with `useWeight` option #218

Closed: amn41 closed this issue 5 years ago

amn41 commented 5 years ago

I don't suspect this has anything to do with the library per se (everything seems to work), but I wanted to post an issue in case anyone has seen something similar.

I'm working on a text classification task and want to compare vanilla StarSpace with StarSpace using embeddings from a different task as input features.

I'm running with

starspace train \
  -trainFile "${DATADIR}"/train.txt \
  -model "${MODELDIR}"/transfer \
  -initRandSd 0.01 \
  -useWeight true \
  -adagrad false \
  -lr 0.005 \
  -epoch 100 \
  -dim 20 \
  -thread 20 \
  -batchSize 5 \
  -negSearchLimit 15 \
  -maxNegSamples 15 \
  -similarity "dot" \
  -verbose true

where each example in my training data is already a 20-dimensional embedding:

d0:-0.00775984 d1:0.0233467 d2:-0.0162142 d3:-0.043273 d4:0.0375114 d5:-0.0354039 d6:0.108861 d7:0.191104 d8:0.0758502 d9:-0.0222879 d10:-0.139912 d11:0.13165 d12:-0.0817156 d13:-0.129767 d14:-0.137289 d15:0.0672633 d16:0.0806599 d17:-0.0522917 d18:0.0929382 d19:0.0176532 __label__class1
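If I understand the -useWeight format correctly, each d<i>:<value> token is treated as a dictionary entry (d0 .. d19) carrying a real-valued weight, so StarSpace still learns one embedding per feature token. A quick way to see what actually ends up in the dictionary is to inspect the tab-separated embedding dump StarSpace writes next to the binary model (a minimal sketch, assuming the default output naming):

# after training, each dictionary entry (d0..d19 plus the labels) should have
# one row in the .tsv embedding dump written alongside the binary model
head -n 25 "${MODELDIR}"/transfer.tsv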

Now, for some reason, SGD just doesn't make any progress (the loss oscillates around its initial value). I've already tried a few obvious things like tweaking the learning rate and batch size, but I'm curious whether anyone has seen something similar.

When I use the text as my LHS (as opposed to the real-valued features), everything works well.
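For reference, the vanilla run is essentially the same command pointed at the raw-text training file, without -useWeight (a sketch; train_text.txt is a placeholder name):

starspace train \
  -trainFile "${DATADIR}"/train_text.txt \
  -model "${MODELDIR}"/vanilla \
  -adagrad false \
  -lr 0.005 \
  -epoch 100 \
  -dim 20 \
  -thread 20 \
  -batchSize 5 \
  -negSearchLimit 15 \
  -maxNegSamples 15 \
  -similarity "dot" \
  -verbose true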

ledw commented 5 years ago

@amn41 Hi, thanks for reporting. What is the performance on the test set for starspace with/without using features? Does the performance on the classification task change?

amn41 commented 5 years ago

Without using the external features I get 0.55 hits@1. Using the features, I get 0.4 hits@1 after one epoch and it gets worse from there (settling at about 0.3-0.35).
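(For anyone trying to reproduce this: the hits@1 figures come from StarSpace's test mode, roughly along these lines; a sketch, assuming a held-out test.txt in the same weighted format as the training file.)

starspace test \
  -model "${MODELDIR}"/transfer \
  -testFile "${DATADIR}"/test.txt \
  -useWeight true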

ledw commented 5 years ago

@amn41 thanks. It could be the case that the features you used (i.e. embeddings from a different task) are not good for this task?

amn41 commented 5 years ago

Investigating now, will report back!

ledw commented 5 years ago

@amn41 any updates?

paulaWesselmann commented 5 years ago

Hi @ledw, @amn41 and I did some further experiments, this time with embeddings learned from different data, and the model does train. So we concluded what you already suggested: the original embeddings were just not good for this task. Thank you for getting back to us!

ledw commented 5 years ago

@paulaWesselmann thanks for the updates. You're welcome!