artetxem / vecmap

A framework to learn cross-lingual word embedding mappings
GNU General Public License v3.0

How to evaluate orthogonally mapped vectors? #6

Open Abishek1997 opened 6 years ago

Abishek1997 commented 6 years ago

Hello. I have already posted an issue asking for more details on the format of the test files, but I still don't understand it. Please look at the following screenshot and, if possible, suggest a solution. I used a Tamil-Sanskrit word-pair file as the test file, but with this file I get the following message (more than 2 values needed to unpack). Since it requires a 'value' field, I then added an arbitrary value (not sure what it is, btw! :D), and now it throws an 'axis dimension is out of bounds' error. What should I do? [screenshot]

artetxem commented 6 years ago

The second error occurs because you do need a third column in the dataset with the similarity scores. You are not showing the full error message and command for the first error, so I am not sure what is going on there, but it looks like some issue with the length normalization of the embeddings (which looks weird to me).
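To make the three-column requirement concrete, here is a minimal sketch of reading a similarity dataset and scoring embeddings against it. It is not the repository's evaluation code: the function names are hypothetical, the whitespace-separated layout is an assumption, and Spearman correlation between cosine similarities and gold scores is just one common way to summarize such a dataset.

```python
import math

def parse_similarity_line(line):
    """Split one dataset line into (source word, target word, gold score)."""
    fields = line.split()  # assumption: columns are whitespace-separated
    if len(fields) != 3:
        # A two-column word-pair file fails here: the score is missing.
        raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
    src, trg, score = fields
    return src, trg, float(score)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate_similarity(lines, src_emb, trg_emb):
    """Correlate cosine similarities with the gold scores in `lines`."""
    predicted, gold = [], []
    for line in lines:
        src, trg, score = parse_similarity_line(line)
        if src in src_emb and trg in trg_emb:  # skip out-of-vocabulary pairs
            predicted.append(cosine(src_emb[src], trg_emb[trg]))
            gold.append(score)
    return spearman(predicted, gold)
```

The third column is meant to be a human-judged similarity score for the word pair, which is why an arbitrary value there would not produce a meaningful evaluation even if the script ran.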

In any case, I insist that, if you are a complete beginner, creating a similarity dataset from scratch does not look like the best idea. I would recommend that you evaluate your embeddings on translation instead, which should be easier and more reliable.
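As a rough illustration of what evaluating in translation involves, the sketch below retrieves the nearest target-language neighbor of each source word by cosine similarity and checks it against a bilingual test dictionary. It is a simplified stand-in rather than the repository's implementation: `translation_accuracy`, the dictionary-of-lists embedding format, and the toy vectors are all assumptions made for the example.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def translation_accuracy(src_emb, trg_emb, dictionary):
    """Fraction of source words whose nearest target neighbor is a gold translation.

    src_emb, trg_emb: dict mapping word -> list of floats (already mapped
    into a shared space). dictionary: dict mapping source word -> set of
    acceptable target translations.
    """
    trg_words = list(trg_emb)
    trg_unit = [normalize(trg_emb[w]) for w in trg_words]
    correct = 0
    total = 0
    for src_word, gold in dictionary.items():
        if src_word not in src_emb:
            continue  # out-of-vocabulary entries are skipped, not counted
        query = normalize(src_emb[src_word])
        sims = [sum(a * b for a, b in zip(query, t)) for t in trg_unit]
        best = trg_words[max(range(len(sims)), key=sims.__getitem__)]
        correct += best in gold
        total += 1
    return correct / total if total else 0.0

# Toy data with placeholder words, invented purely for illustration.
src_emb = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
trg_emb = {"t1": [0.9, 0.1], "t2": [0.1, 0.9]}
test_dictionary = {"s1": {"t1"}, "s2": {"t1"}}
accuracy = translation_accuracy(src_emb, trg_emb, test_dictionary)
```

The practical advantage is that the test file only needs word pairs (a source word and its translation), so no similarity scores have to be collected.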

I would also recommend that you have a look at the provided datasets and run some experiments with them. That would help you understand how everything works, which should make it easier to switch to your own dataset.