artetxem / vecmap

A framework to learn cross-lingual word embedding mappings
GNU General Public License v3.0

Is there any way to do the transformation without changing the source embedding space at all? #3

Closed yuchenlin closed 7 years ago

yuchenlin commented 7 years ago

Hi Mikel,

Thanks for this wonderful code and readme. It seems that the code here must normalize the source embedding space first (I know this is a basic step and a major novelty of the paper). I was wondering if there is any way to do the transformation without changing the source embedding space, like the original Mikolov (2013) paper or CCA does -- the CCA GitHub repo has a shell script for this situation instead of creating a third space.

It is quite common that we have already trained a model using the original source embedding space, and if the transformation needs to change the source embedding space, then we have to retrain those models, which could cost a lot of time.

Thus, I am wondering if there is a way to "recover" the mapped target embeddings (the output of this repo) into the original, un-normalized source embedding space. It sounds like a de-normalization process.

My naive thought is to learn another transformation from the normalized space back to the un-normalized one. Maybe you have some better suggestions?

Thanks and regards, Bill

artetxem commented 7 years ago

I guess that the proper way of "de-normalizing" would be to apply the inverse of each normalization operation in question. For instance, mean centering subtracts the mean vector from every embedding, so you should add the mean vector back if you want to undo it.
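For concreteness, here is a minimal NumPy sketch of this idea. It assumes the normalization pipeline was length normalization followed by mean centering; the variable names are illustrative, not vecmap's.

```python
import numpy as np

# Hypothetical embedding matrix: one row per word.
emb = np.random.randn(5, 3)

# Normalization: record what is removed so it can be restored later.
norms = np.linalg.norm(emb, axis=1, keepdims=True)  # original vector lengths
emb_unit = emb / norms                              # length normalization
mean = emb_unit.mean(axis=0)                        # mean vector
emb_norm = emb_unit - mean                          # mean centering

# De-normalization: apply the inverse operations in reverse order.
emb_restored = (emb_norm + mean) * norms

assert np.allclose(emb, emb_restored)
```

The key point is that each normalization step must store its side information (the mean vector, the original norms) so the inverse can be applied in reverse order.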

That being said, I am not sure how much sense this makes. It might be that mapping the original embeddings without any normalization at all gives better results, or you could also try learning the mapping over the normalized embeddings and then applying it to the original ones.
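The second alternative could look something like the following sketch, assuming an orthogonal mapping obtained with the standard Procrustes solution. X_norm and Z_norm stand for dictionary-aligned normalized source/target embeddings, and X_orig for the original un-normalized source embeddings; all three are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X_norm = rng.standard_normal((100, 50))  # normalized source embeddings (aligned rows)
Z_norm = rng.standard_normal((100, 50))  # normalized target embeddings (aligned rows)
X_orig = rng.standard_normal((100, 50))  # original, un-normalized source embeddings

# Orthogonal Procrustes: W = argmin ||X W - Z||_F over orthogonal W,
# solved in closed form from the SVD of X^T Z.
U, _, Vt = np.linalg.svd(X_norm.T @ Z_norm)
W = U @ Vt

# Apply the mapping learned on normalized embeddings to the original ones.
X_mapped = X_orig @ W
```

Since W is orthogonal, it only rotates the space, so applying it to the un-normalized embeddings preserves their norms and relative distances.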

If I were you, I would try these different alternatives and see which of them works best for you.