PetrochukM / PyTorch-NLP

Basic Utilities for PyTorch Natural Language Processing (NLP)
https://pytorchnlp.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Get vectors from token list #23

Closed floscha closed 6 years ago

floscha commented 6 years ago

This PR extends the __getitem__() method of the _PretrainedWordVectors base class to accept list and tuple arguments, so that word vectors for multiple tokens can be retrieved at once.

It also adds unit tests asserting that retrieval for multiple tokens works correctly and that an exception is raised for invalid types.

For example, the following code will return a 4x300 torch.FloatTensor object:

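from torchnlp.word_to_vector import FastText

# Load the pretrained FastText vectors.
vectors = FastText()
tokenized_sentence = ['this', 'is', 'a', 'sentence']
# Indexing with a list of tokens returns one 300-dimensional row per token.
vectors[tokenized_sentence]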
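
To illustrate the dispatch described above, here is a minimal, dependency-free sketch. It is not the code from this PR: ToyWordVectors, its table argument, and the zero-vector fallback for unknown tokens are all assumptions made for illustration, and plain Python lists stand in for torch tensors.

```python
class ToyWordVectors:
    """Hypothetical stand-in for a pretrained word-vector lookup table."""

    def __init__(self, table, dim):
        self.table = table  # maps token -> list of floats of length `dim`
        self.dim = dim

    def _lookup(self, token):
        # Assumption: unknown tokens fall back to a zero vector.
        return list(self.table.get(token, [0.0] * self.dim))

    def __getitem__(self, tokens):
        # A single token returns a single vector.
        if isinstance(tokens, str):
            return self._lookup(tokens)
        # A list or tuple of tokens returns one vector per token, in order.
        if isinstance(tokens, (list, tuple)):
            return [self._lookup(token) for token in tokens]
        # Anything else is rejected explicitly.
        raise TypeError(
            "expected str, list or tuple, got {}".format(type(tokens).__name__))


# Toy 2-dimensional vectors; real pretrained vectors would be 300-dimensional.
vectors = ToyWordVectors({'this': [1.0, 2.0], 'is': [3.0, 4.0]}, dim=2)
single = vectors['this']
batch = vectors[['this', 'is', 'unseen']]
```

With real tensors, the list branch would typically stack the per-token vectors into a single 2-D tensor, which is how a four-token sentence yields a 4x300 result.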

codecov-io commented 6 years ago

Codecov Report

Merging #23 into master will increase coverage by 0.02%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #23      +/-   ##
==========================================
+ Coverage   94.49%   94.52%   +0.02%     
==========================================
  Files          54       54              
  Lines        1508     1516       +8     
==========================================
+ Hits         1425     1433       +8     
  Misses         83       83

| Impacted Files | Coverage Δ |
| --- | --- |
| torchnlp/word_to_vector/pretrained_word_vectors.py | 79.06% <100%> (+2.14%) |

PetrochukM commented 6 years ago

This looks great! Thank you!