facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

Monolingual source word similarity score average: nan #82

Open gzhcv opened 5 years ago

gzhcv commented 5 years ago

I tried to align monolingual word embeddings using this command:

python unsupervised.py --src_lang en --tgt_lang zh --src_emb data/all.en.vec --tgt_emb data/all.zh.vec --n_refinement 5 --normalize_embeddings center

After adversarial training epoch 0 finished, the log shows the following:

INFO - 10/14/18 21:24:06 - 0:13:31 - 996000 - Discriminator loss: 0.3409 - 2506 samples/s
INFO - 10/14/18 21:24:09 - 0:13:34 - ====================================================================
INFO - 10/14/18 21:24:09 - 0:13:34 -                        Dataset      Found     Not found          Rho
INFO - 10/14/18 21:24:09 - 0:13:34 - ===================================================================
anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
anaconda3/lib/python3.6/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
INFO - 10/14/18 21:24:09 - 0:13:34 - Monolingual source word similarity score average: nan
INFO - 10/14/18 21:24:10 - 0:13:35 - Building the train dictionary ...
INFO - 10/14/18 21:24:10 - 0:13:35 - New train dictionary of 4255 pairs.
INFO - 10/14/18 21:24:10 - 0:13:35 - Mean cosine (nn method, S2T build, 10000 max size): 0.33474

Any help?

glample commented 5 years ago

This is normal. The NaN appears because no data was available in Chinese for the monolingual word similarity task. If you are not interested in this task, you can simply ignore it. The model should continue to run normally.
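
For readers wondering where the NaN and the two NumPy warnings in the log come from: taking the mean of an empty sequence with `np.mean` gives exactly this result. The snippet below is a minimal sketch that reproduces that behaviour outside MUSE. It is not MUSE's evaluation code, and `average_score` is a hypothetical helper added here only to show one way of guarding against the empty case.

```python
import warnings

import numpy as np


def average_score(scores):
    """Hypothetical helper: mean of the scores, or None if there are none."""
    if len(scores) == 0:
        # No similarity dataset was evaluated, so there is nothing to average.
        return None
    return float(np.mean(scores))


# Reproduce the log: the mean of an empty list is NaN, and NumPy warns about it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_mean = np.mean([])

print(np.isnan(empty_mean))                # the "nan" shown in the log
print([str(w.message) for w in caught])    # includes "Mean of empty slice."
print(average_score([]))                   # guarded version returns None instead
print(average_score([0.5, 0.7]))
```

The NaN here only means that no similarity scores were collected; it does not by itself indicate a problem with the alignment.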

gzhcv commented 5 years ago

Thanks a lot. But the program stopped after this, and I don't know why. The complete output is as follows:

INFO - 10/14/18 21:24:06 - 0:13:31 - 996000 - Discriminator loss: 0.3409 - 2506 samples/s
INFO - 10/14/18 21:24:09 - 0:13:34 - ====================================================================
INFO - 10/14/18 21:24:09 - 0:13:34 -                        Dataset      Found     Not found          Rho
INFO - 10/14/18 21:24:09 - 0:13:34 - ===================================================================
anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
anaconda3/lib/python3.6/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
INFO - 10/14/18 21:24:09 - 0:13:34 - Monolingual source word similarity score average: nan
INFO - 10/14/18 21:24:10 - 0:13:35 - Building the train dictionary ...
INFO - 10/14/18 21:24:10 - 0:13:35 - New train dictionary of 4255 pairs.
INFO - 10/14/18 21:24:10 - 0:13:35 - Mean cosine (nn method, S2T build, 10000 max size): 0.33474
Segmentation fault (core dumped)