Baseline LSTM implementation

achyudh commented 5 years ago

Baseline LSTM

Implementation of a standard LSTM using PyTorch and Torchtext for text classification baseline measurements

Model Type

rand: All words are randomly initialized and then modified during training.
- static: A model with pre-trained vectors from word2vec. All words -- including the unknown ones that are initialized with zero -- are kept static and only the other parameters of the model are learned.
- non-static: Same as above but the pretrained vectors are fine-tuned for each task.

Quick Start

To run the model on Reuters dataset on static, just run the following from the Castor working directory.

python -m lstm_baseline --mode static

Dataset

We experiment the model on the following datasets.

Reuters dataset - ModApte splits

Settings

Adadelta is used for training.

TODO

Add dataset results with different hyperparameters
- Parameters tuning

achyudh commented 5 years ago

@Impavidity can you please review this pull request?

Impavidity commented 5 years ago

One more suggestion, basically for the implementation of RNN model, we should provide length and use rnn_utils.pack_padded_sequence. This is a more reasonable implementation (do not means this will get better results). If you have extra time, you could compare these two implementations.

A reference here https://github.com/castorini/Castor/blob/master/conv_rnn/model.py#L55

achyudh commented 5 years ago

@Impavidity thanks for the tip. I added padding for packed sequences in #152 along with TensorboardX support. I didn't want to crowd this PR so made a new one. Also, I'll continue to push commits there so there is no need for a review right now.

castorini / castor