facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

GAN from MUSE not working on small 2D toy dataset #132

Open franciscovargas opened 5 years ago

franciscovargas commented 5 years ago

I have generated a 2D toy dataset in the following way (to sanity-check the method on the alignment task it's designed for):

z ~ N(0, I),  x = Az

This generates a dataset of pairs (xi, zi) = (Azi, zi).

Here is the script that generates it and writes it to the vec files and dictionary file used as input to unsupervised.py: https://github.com/franciscovargas/MUSE/blob/patch-1/simulation.py
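For readers who don't want to open the link, here is a minimal sketch of what such a generator could look like. It is not the linked simulation.py. The function names, the word prefixes, and the file layout (a "count dim" header line followed by one "word v1 v2 ..." line per vector, as in fastText-style text embeddings) are assumptions for illustration.

```python
import numpy as np


def make_toy_pairs(n=1000, dim=2, seed=123):
    """Sample z ~ N(0, I) and return (A, z, x) with x_i = A z_i."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(dim, dim))   # arbitrary (not necessarily orthogonal) map
    z = rng.normal(size=(n, dim))     # one row per "source word"
    x = z @ A.T                       # row i is A @ z[i]
    return A, z, x


def write_vec(path, prefix, vectors):
    """Write vectors in a word2vec-style text layout (assumed format)."""
    n, dim = vectors.shape
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{n} {dim}\n")
        for i, v in enumerate(vectors):
            values = " ".join(f"{c:.6f}" for c in v)
            f.write(f"{prefix}{i} {values}\n")


def write_dictionary(path, src_prefix, tgt_prefix, n):
    """Write one 'source target' pair per line; word i maps to word i."""
    with open(path, "w", encoding="utf-8") as f:
        for i in range(n):
            f.write(f"{src_prefix}{i} {tgt_prefix}{i}\n")
```

With these helpers, calling write_vec("data/z_vecs.csv", "s", z) and write_vec("data/x_vecs.csv", "t", x) would produce files with the names used in the command below, and write_dictionary would produce the matching ground-truth dictionary. The prefixes "s" and "t" are hypothetical.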

After generating the dataset I ran unsupervised.py on it, with the self.orthogonalise() line in train.py commented out since A need not be orthogonal (I have also tried with an orthogonal A):

python -m ipdb unsupervised.py --src_lang f1 --tgt_lang f2 --src_emb data/z_vecs.csv  --tgt_emb data/x_vecs.csv --n_refinement 0 --cuda False --export '' --map_id_init False --n_epochs 5 --normalize_embeddings center  --emb_dim 2 --dis_most_frequent 0 --dis_hid_dim 12 --seed 123 

The method results in 0% precision at all values of k. As a safety check I ran supervised.py on the toy data with linear regression, and it achieves 89%. I have also checked that commenting out self.orthogonalise() still yields good (and competitive) results on the off-the-shelf tasks and embeddings from the MUSE paper. This simple toy sanity check should work, since it tests the assumptions and motivation behind the idea, so something seems wrong. It has nothing to do with the dimensionality of the data being small: I repeated the same experiment with 300D toy data.
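To make the supervised safety check concrete, here is a rough sketch of the idea in plain NumPy. It is not the code in supervised.py: it fits the map by ordinary least squares and scores precision at 1 with Euclidean nearest neighbours, both of which are assumptions chosen for simplicity, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_map(z, x):
    """Least-squares W such that x_i is approximately W @ z_i."""
    # lstsq solves z @ B = x for B, so B is the transpose of W.
    B, *_ = np.linalg.lstsq(z, x, rcond=None)
    return B.T


def precision_at_1(mapped, targets):
    """Fraction of rows whose nearest target (Euclidean) has the same index."""
    diffs = mapped[:, None, :] - targets[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)          # shape (n, n)
    nearest = dists.argmin(axis=1)
    return float((nearest == np.arange(len(mapped))).mean())


# Small demonstration on noiseless toy data: fit on one split, score on another.
rng = np.random.default_rng(123)
A = rng.normal(size=(2, 2))
z = rng.normal(size=(600, 2))
x = z @ A.T
W = fit_linear_map(z[:500], x[:500])
print("max |W - A|:", np.abs(W - A).max())
print("precision@1 on held-out pairs:", precision_at_1(z[500:] @ W.T, x[500:]))
```

Since the toy data is exactly linear, a supervised fit like this gives a reference point for what the unsupervised run would need to match.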

ZQSIAT commented 1 year ago

Hi, how is it going? Did you find the problem?