facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

How to keep the upper-case words? #59

Closed bdqnghi closed 6 years ago

bdqnghi commented 6 years ago

After training, all of the words are lowercased. How can I keep them in their original casing?

glample commented 6 years ago

The words are lowercased so that they can be evaluated, but the export function handles both uppercased and lowercased words. As you can see here: https://github.com/facebookresearch/MUSE/blob/master/src/utils.py#L264, when the full vocabulary is loaded (at the end of the experiment, when exporting the aligned embeddings), the casing of the words is not changed.
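The behaviour described above can be illustrated with a minimal sketch. It assumes a fastText-style text format with one word followed by its vector values per line. `load_vocab` and its `full_vocab` flag are hypothetical names used only for illustration; this is not MUSE's actual loader, whose exact logic is in the linked `utils.py`.

```python
from typing import Dict, Iterable, List, Tuple


def load_vocab(lines: Iterable[str], full_vocab: bool) -> Tuple[List[str], Dict[str, int]]:
    """Hypothetical loader illustrating the casing behaviour (not MUSE's API).

    Each line is 'word v1 v2 ...'; only the words are kept, vectors are skipped.
    """
    words: List[str] = []
    word2id: Dict[str, int] = {}
    for line in lines:
        word, _vector = line.rstrip().split(" ", 1)
        if not full_vocab:
            # Evaluation mode: lowercase so words match lowercased dictionaries.
            word = word.lower()
        if word in word2id:
            # Duplicate after lowercasing (e.g. "Paris" vs "paris"): keep the first.
            continue
        word2id[word] = len(words)
        words.append(word)
    return words, word2id


lines = ["Paris 0.1 0.2", "paris 0.3 0.4", "the 0.5 0.6"]
print(load_vocab(lines, full_vocab=False)[0])  # ['paris', 'the']
print(load_vocab(lines, full_vocab=True)[0])   # ['Paris', 'paris', 'the']
```

With `full_vocab=False`, "Paris" and "paris" collapse into a single lowercased entry, which is what the evaluation dictionaries expect. With `full_vocab=True`, both casings survive as separate entries, so exported embeddings keep the original words.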