I have generated a 2D toy dataset in the following way (to sanity check the method on the alignment task it is designed for):
z ~ N(0, I),  x = Az

This yields a dataset of pairs (x_i, z_i) = (Az_i, z_i).
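A minimal sketch of this generation procedure in numpy (hypothetical illustration; the sizes `n`, `d` and the seed are my own choices, not taken from the linked simulation.py):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2

# Sample latent vectors z ~ N(0, I).
Z = rng.standard_normal((n, d))

# A random linear map; note A need not be orthogonal.
A = rng.standard_normal((d, d))

# x_i = A z_i for each row z_i of Z.
X = Z @ A.T

# The dataset of pairs (x_i, z_i) = (A z_i, z_i).
pairs = list(zip(X, Z))
```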
Here is the script for generating the data and parsing it into the vec files and dictionary file that are used as input to unsupervised.py:
https://github.com/franciscovargas/MUSE/blob/patch-1/simulation.py

After generating the dataset I ran unsupervised.py on the generated data (commenting out the self.orthogonalise() line in train.py, since A need not be orthogonal; I have also tried with an orthogonal A):
The method results in 0% precision at all values of k. As a safety check I ran supervised.py on the same toy data with linear regression, and it achieves 89% precision. I have also verified that commenting out self.orthogonalise() still yields good (and competitive) results on the off-the-shelf tasks and embeddings from the MUSE paper. This simple toy sanity check should work, since it tests the assumptions and motivations behind the method, so something seems wrong. It is not caused by the small dimensionality of the data: I have repeated the same experiment with 300D toy data.
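For reference, the supervised baseline is easy to reproduce on this noiseless toy data: with paired samples, ordinary least squares recovers A exactly, which is why linear regression performs well here. A self-contained sketch (my own construction, not supervised.py itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2

# Same generative process as the toy dataset: z ~ N(0, I), x = Az.
Z = rng.standard_normal((n, d))
A = rng.standard_normal((d, d))
X = Z @ A.T

# Supervised baseline: solve min_W ||Z W - X||_F by least squares,
# so W approximates A^T and W.T approximates A.
W, *_ = np.linalg.lstsq(Z, X, rcond=None)

print(np.allclose(W.T, A, atol=1e-8))  # prints: True
```

Since the unsupervised method is meant to find this same linear map without supervision, the gap between 89% (supervised) and 0% (unsupervised) on data that satisfies the model's assumptions is what makes the failure surprising.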