Minimal prediction code

rainjacket commented 6 years ago

Not sure if this belongs here, but what are the minimal requirements needed to evaluate a pre-trained model, if I just need to extract text embeddings (to try out transfer learning tasks)?

rainjacket commented 6 years ago

Can the code be run without CUDA?

setup.py seems to not work properly without cuda.

rainjacket commented 6 years ago

I'm working in a CPU only Docker image.

I was able to run setup.py with "CUDA_HOME=/ python setup.py install". But when calling generate.py, I got an error in generate.py for "sd = torch.load(f)". I was able to get past this error by changing that line to "sd = torch.load(f, map_location={'cuda:0': 'cpu'})". But then I see the following error:

RuntimeError: Error(s) in loading state_dict for stackedRNN:
    Missing key(s) in state_dict: "rnns.0.w_ih", "rnns.0.w_hh", "rnns.0.w_mih", "rnns.0.w_mhh".
    Unexpected key(s) in state_dict: "rnns.0.w_ih_g", "rnns.0.w_ih_v", "rnns.0.w_hh_g", "rnns.0.w_hh_v", "rnns.0.w_mih_g", "rnns.0.w_mih_v", "rnns.0.w_mhh_g", "rnns.0.w_mhh_v".
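For reference, the map_location change mentioned here can be tried in isolation. This is only a minimal sketch: the in-memory buffer and the key "w" are stand-ins for a real checkpoint file, not anything from this repo.

```python
import io

import torch

# Stand-in for a checkpoint on disk; a real one would come from training.
buffer = io.BytesIO()
torch.save({"w": torch.ones(2, 3)}, buffer)
buffer.seek(0)

# map_location="cpu" remaps every stored tensor onto the CPU,
# regardless of which device it was saved from.
sd = torch.load(buffer, map_location="cpu")
print(sd["w"].device)
```

Passing the string "cpu" covers all source devices, whereas a dict like {'cuda:0': 'cpu'} only remaps the devices it lists.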

Is this an error caused by the CUDA thing or is it because my model was trained with an earlier version of the repo? (I started training this model a few weeks ago.)

raulpuric commented 6 years ago

Since you're trying transfer learning, I'm assuming you want to run this on your own transfer task. Could you try something similar to this try/except from transfer.py https://github.com/NVIDIA/sentiment-discovery/blob/master/transfer.py#L88 ?

You're getting this error because you're trying to load a model that was saved with weight norm into a model without weight norm.

This try/except will call weight norm on the model, load the weight norm weights, and then strip weight norm from the model.
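The pattern described above can be sketched on a toy layer. Note the assumptions: this uses PyTorch's built-in torch.nn.utils.weight_norm and remove_weight_norm as stand-ins for apex's apply_weight_norm and remove_weight_norm, and an nn.Linear in place of the mLSTM, so it illustrates the idea rather than the code in transfer.py.

```python
import torch
import torch.nn as nn
from torch.nn.utils import remove_weight_norm, weight_norm

# A "trained" layer saved with weight norm: its state dict holds
# weight_g / weight_v instead of a plain weight.
trained = weight_norm(nn.Linear(4, 3))
sd = trained.state_dict()

# A fresh layer without weight norm expects a plain "weight" key.
model = nn.Linear(4, 3)
try:
    model.load_state_dict(sd)
except RuntimeError:
    # Key mismatch: apply weight norm so the keys line up, load,
    # then fold g and v back into a single weight tensor.
    weight_norm(model)
    model.load_state_dict(sd)
    remove_weight_norm(model)

same = torch.allclose(model.weight, trained.weight)
print(sorted(model.state_dict().keys()), same)
```

After the except branch, the model has ordinary parameters again, so downstream code doesn't need to know weight norm was ever involved.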

raulpuric commented 6 years ago

If you're trying to extract sentence embeddings I would look at the transform function in transfer.py for an example too.

rainjacket commented 6 years ago

Oh, my mistake, that exception was already being caught, the real exception is during the except block:

Traceback (most recent call last):
  File "generate.py", line 94, in <module>
    apply_weight_norm(model.rnn)
  File "/opt/conda/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/reparameterization/__init__.py", line 48, in apply_weight_norm
  File "/opt/conda/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/reparameterization/__init__.py", line 93, in apply_reparameterization
  File "/opt/conda/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/reparameterization/__init__.py", line 89, in apply_reparameterization
  File "/opt/conda/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/reparameterization/reparameterization.py", line 82, in apply
  File "/opt/conda/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/reparameterization/weight_norm.py", line 80, in reparameterize
  File "/opt/conda/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/reparameterization/weight_norm.py", line 14, in _norm
RuntimeError: view is not implemented for type torch.HalfTensor

raulpuric commented 6 years ago

Ahhh ok, so PyTorch's support for fp16 on the CPU isn't very good. Could you convert your parameters and run this in fp32 instead? For fp16 you really should be using a GPU.

rainjacket commented 6 years ago

What's the method to convert the model from fp16 to fp32?

raulpuric commented 6 years ago

Try:

model.float()
try:
    model.load_state_dict(sd)
except RuntimeError:
    # If the state dict has weight-normalized parameters, apply weight norm
    # to the model, load the state dict, then remove weight norm again.
    apply_weight_norm(model.rnn)
    model.load_state_dict(sd)
    remove_weight_norm(model)
# Only if you realllyyy want to run in fp16 on the CPU for some reason:
# model.half()

If your state dict is in fp16 I think that this should still be fine and it will be converted to fp32 when copying the state dict into the parameters. If not let me know and I can suggest how to force the state dict into fp32 as well.
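A quick way to check this on a toy layer, plus one possible way to force a state dict into fp32. This is a sketch under assumptions: to_fp32 is a hypothetical helper (not part of this repo or of PyTorch), and nn.Linear stands in for the real model.

```python
import torch
import torch.nn as nn


def to_fp32(obj):
    """Recursively cast floating-point tensors in a (possibly nested) state dict to fp32."""
    if torch.is_tensor(obj):
        return obj.float() if obj.is_floating_point() else obj
    if isinstance(obj, dict):
        return {k: to_fp32(v) for k, v in obj.items()}
    return obj


# An fp16 state dict, as a checkpoint trained with --fp16 might contain.
sd_fp16 = {k: v.half() for k, v in nn.Linear(4, 3).state_dict().items()}

# Loading into an fp32 model: values are copied into the existing
# parameters, which keep their own dtype.
model = nn.Linear(4, 3)
model.load_state_dict(sd_fp16)
param_dtype = model.weight.dtype

# Explicit conversion, if the automatic copy is not enough.
sd_fp32 = to_fp32(sd_fp16)
print(param_dtype, sd_fp32["weight"].dtype)
```

The recursion is there because a checkpoint may nest several state dicts under keys such as 'encoder' and 'rnn'.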

rainjacket commented 6 years ago

Actually, on further testing, it seems to work by simply not setting --fp16 and making no other changes.

Is this expected? I seem to remember that the models were not interchangeable like this before, but I could be misremembering.

raulpuric commented 6 years ago

I think in very early versions of this codebase the models were not interchangeable, but since around PyTorch v0.3 I've noticed that they are interchangeable (because of the automatic dtype conversion that happens in load_state_dict), and I've been using them interchangeably since.

rainjacket commented 6 years ago

@raulpuric sorry for posting on a closed issue, but just wondering:

How easy would it be to transfer the learned weights to be used with the OpenAI repo https://github.com/openai/generating-reviews-discovering-sentiment which is a bit more minimal for feature extraction purposes (and also tensorflow is a little easier for me to use than pytorch)?

EDIT: even if it's easy to load the models with https://github.com/guillitte/pytorch-sentiment-neuron it would be a bit more convenient

rainjacket commented 6 years ago

I ended up writing a simple code snippet to get individual embeddings for my trained model (using this repo):

import torch
import torch.nn as nn
from torch.autograd import Variable

from apex.reparameterization import apply_weight_norm, remove_weight_norm
from apex.RNN.cells import mLSTMRNNCell
from apex.RNN.RNNBackend import stackedRNN

class RNNFeaturizer(nn.Module):
    def __init__(self):
        super(RNNFeaturizer, self).__init__()
        self.encoder = nn.Embedding(256, 64)
        self.rnn = stackedRNN(mLSTMRNNCell(64, 4096, bias=True), 1)

    def forward(self, input, seq_len=None):
        self.rnn.detach_hidden()
        cell = 0
        for i in range(input.size(0)):
            emb = self.encoder(input[i])
            _, hidden = self.rnn(emb.unsqueeze(0), collectHidden=True)
            cell = hidden[1][-1][-1]
        return cell

    def load_state_dict(self, state_dict, strict=True):
        self.encoder.load_state_dict(state_dict['encoder'], strict=strict)
        self.rnn.load_state_dict(state_dict['rnn'], strict=strict)

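def load_model(path='/indocker/lang_model.pt'):
    model = RNNFeaturizer()
    # Requires a GPU; fp16 is used to match how the model was trained.
    model.cuda()
    model.half()
    with open(path, 'rb') as f:
        # The checkpoint stores the encoder and rnn state dicts under 'encoder'.
        state_dict = torch.load(f)['encoder']

    # The saved weights are weight-normalized: apply weight norm so the keys
    # match, load, then strip weight norm again.
    apply_weight_norm(model.rnn)
    model.load_state_dict(state_dict)
    remove_weight_norm(model)
    return model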

model = load_model()

def encode(text):
    text = '\n ' + text
    text = text.encode('ascii', 'ignore')
    tensor = torch.cuda.ByteTensor(len(text))
    for i, char in enumerate(text):
        tensor[i] = char
    return Variable(tensor).long().unsqueeze(1)

def transform(text):
    var = encode(text)
    model.eval()
    with torch.no_grad():
        model.rnn.reset_hidden(1)
        cell = model(var, var.size()).float()
        return cell.data.cpu().numpy()[0]
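For anyone running without a GPU, as earlier in this thread, the encode step could be written without torch.cuda. This is just a sketch of a possible variant: encode_cpu is a hypothetical name, and on PyTorch 0.4 or later the Variable wrapper should not be needed.

```python
import torch


def encode_cpu(text):
    """Turn text into a (seq_len, 1) LongTensor of ASCII byte values on the CPU."""
    # Same preprocessing as encode() above: prepend '\n ' and drop non-ASCII characters.
    data = ('\n ' + text).encode('ascii', 'ignore')
    # Iterating over bytes yields ints, so the tensor can be built in one call.
    return torch.tensor(list(data), dtype=torch.long).unsqueeze(1)


tensor = encode_cpu('héllo')
print(tensor.shape)
```

The rest of the snippet would also need model.cuda() and model.half() removed from load_model, and map_location passed to torch.load, to run on a CPU-only machine.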

NVIDIA / sentiment-discovery

Minimal prediction code #34