Saber is a deep learning-based tool for information extraction in the biomedical domain. Pull requests are welcome! Note: this is a work in progress. Many things are broken, and the codebase is not stable.
Currently, when you provide Saber a file of pre-trained embeddings, only embeddings for words that appear in the training dataset are loaded into memory. This is fine for evaluation, but hurts performance in two cases:
- Transfer learning: embeddings are only loaded for words in the source dataset, which leads to poorer coverage of the target dataset's vocabulary.
- Deployment: when deploying a trained model for inference, it would be better to load all pre-trained embeddings, minimizing the number of out-of-vocabulary tokens encountered at inference time.
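For context, the current behaviour amounts to filtering the pre-trained embeddings file against the training vocabulary. A minimal sketch of that filtering, using hypothetical names (`load_embeddings`, `word_to_idx`) that are assumptions rather than Saber's actual API:

```python
import numpy as np

def load_embeddings(filepath, word_to_idx):
    """Load pre-trained word vectors, keeping only words present in `word_to_idx`."""
    embeddings = {}
    with open(filepath, encoding='utf-8') as f:
        for line in f:
            word, *vector = line.rstrip().split(' ')
            # Words outside the training vocabulary are silently dropped,
            # which is what limits coverage at transfer/deployment time.
            if word in word_to_idx:
                embeddings[word] = np.asarray(vector, dtype='float32')
    return embeddings
```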
To fix this problem:
- [x] Add a `load_all_embeddings` argument to `config.py`. Make sure it is added to all other files it needs to appear in.
- [x] In `embeddings.py`, allow the user to pass a `load_all` flag in order to load all the embeddings (see the sketch after this list).
- [x] Update all relevant unit tests.
- [x] In `saber.py`, figure out how to update each dataset's `type_to_idx` mappings.
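A rough sketch of how the `load_all` flag could tie these pieces together, again using hypothetical names rather than Saber's actual internals: when the flag is set, every pre-trained vector is kept and the dataset's word-to-index mapping is extended so the newly loaded words receive indices.

```python
import numpy as np

def load_embeddings(filepath, word_to_idx, load_all=False):
    """Load pre-trained word vectors; keep every vector when `load_all` is True."""
    embeddings = {}
    with open(filepath, encoding='utf-8') as f:
        for line in f:
            word, *vector = line.rstrip().split(' ')
            if load_all or word in word_to_idx:
                embeddings[word] = np.asarray(vector, dtype='float32')

    if load_all:
        # Extend the dataset's type-to-index mapping so words seen only in the
        # embeddings file still get an index (appended after existing entries).
        for word in embeddings:
            word_to_idx.setdefault(word, len(word_to_idx))

    return embeddings
```

Appending new words after the existing indices keeps the mapping for the original training vocabulary unchanged, which matters when the flag is used with an already-trained model.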