- Moves `sort_by_padding` from the `Dataset` to the `DataGenerator`.
- Adds tests for `Vocabulary`, `Dataset`, all concrete `Fields`, `Embeddings`, `TokenIndexers` and `DatasetReaders` (the reader tests could probably be more comprehensive).
- Roughly fixes the `DataGenerator` to work with the new `Dataset` class. I haven't fixed the tests for this, because MattG wrote them and they rely on some `FakeInstances`, which I'm not sure how to translate to the new API.
- Adds getter methods to the `Field`s, `Instance` and `Dataset` classes, such as `fields()` for the `Instance` class and `tokens()` for the `TextField`. I think it's important to be able to interact with these classes directly, so that writing functions that map predictions back to text is easy.
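As a rough illustration of what sorting by padding inside the generator looks like, here is a minimal sketch. It assumes hypothetical, simplified `Instance` and `DataGenerator` classes (not the real ones in this PR); the names `get_padding_lengths`, `padding_noise` and `batches` are assumptions for illustration. The generator sorts instances by their padding lengths, optionally with a little noise, and then slices the sorted list into batches so that similarly sized instances end up together.

```python
import random
from typing import Dict, List, Tuple


class Instance:
    """Hypothetical stand-in for an Instance that can report its padding lengths."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens

    def get_padding_lengths(self) -> Dict[str, int]:
        return {"num_tokens": len(self.tokens)}


class DataGenerator:
    """Sketch: the generator, not the dataset, decides ordering and batching."""

    def __init__(self, batch_size: int, padding_noise: float = 0.0, seed: int = 0) -> None:
        self.batch_size = batch_size
        self.padding_noise = padding_noise
        self._rng = random.Random(seed)

    def sort_by_padding(self, instances: List[Instance], sorting_keys: List[str]) -> List[Instance]:
        def key(instance: Instance) -> Tuple[float, ...]:
            lengths = instance.get_padding_lengths()
            # Multiplicative noise (if any) shuffles near-equal lengths between epochs.
            return tuple(
                lengths[k] * (1 + self._rng.uniform(-self.padding_noise, self.padding_noise))
                for k in sorting_keys
            )

        # sorted() computes key() once per instance, so the noise is fixed within one sort.
        return sorted(instances, key=key)

    def batches(self, instances: List[Instance], sorting_keys: List[str]) -> List[List[Instance]]:
        ordered = self.sort_by_padding(instances, sorting_keys)
        return [ordered[i:i + self.batch_size] for i in range(0, len(ordered), self.batch_size)]
```

For example, with `batch_size=2` and no noise, instances of lengths 5, 1, 3, 2 and 4 come out batched as lengths `[[1, 2], [3, 4], [5]]`. Keeping this in the generator means the `Dataset` stays a plain container and the batching policy can change without touching it.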
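To show the kind of code the getters are meant to make easy, here is a sketch using hypothetical, stripped-down `TextField`, `Instance` and `Dataset` classes. `tokens()` and `fields()` follow the names mentioned above; the `instances()` getter on `Dataset` and the `decode_tags` helper are assumptions for illustration only. The helper maps predicted label indices for a tagging model back onto the original tokens.

```python
from typing import Dict, List, Tuple


class TextField:
    """Hypothetical minimal TextField; only the tokens() getter matters here."""

    def __init__(self, tokens: List[str]) -> None:
        self._tokens = tokens

    def tokens(self) -> List[str]:
        return self._tokens


class Instance:
    """Hypothetical Instance exposing its named fields through fields()."""

    def __init__(self, fields: Dict[str, TextField]) -> None:
        self._fields = fields

    def fields(self) -> Dict[str, TextField]:
        return self._fields


class Dataset:
    """Hypothetical Dataset exposing its instances through an assumed instances() getter."""

    def __init__(self, instances: List[Instance]) -> None:
        self._instances = instances

    def instances(self) -> List[Instance]:
        return self._instances


def decode_tags(
    dataset: Dataset,
    predictions: List[List[int]],
    index_to_label: Dict[int, str],
    field_name: str = "tokens",
) -> List[List[Tuple[str, str]]]:
    """Pair each token with its predicted label, one list per instance."""
    decoded = []
    for instance, predicted in zip(dataset.instances(), predictions):
        tokens = instance.fields()[field_name].tokens()
        decoded.append([(token, index_to_label[i]) for token, i in zip(tokens, predicted)])
    return decoded
```

For a single instance with tokens `["Mary", "runs"]`, predictions `[[1, 0]]` and labels `{0: "O", 1: "B-PER"}`, this returns `[[("Mary", "B-PER"), ("runs", "O")]]`. Without getters, a helper like this would have to reach into private attributes of each class.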
This looks great to me, from my somewhat quick look 👍. I didn't see any big API changes here, just testing and cleanup, right? It's nice to have usage examples in the test suite; thank you for adding them!