OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

How to extract the logits when using `forward_batch` on CPU? #1386

Closed arunpatro closed 1 year ago

arunpatro commented 1 year ago

When I run this:

import transformers
import ctranslate2

model_name = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"

prompt = "Hey what's up how's it going? In this"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokens = tokenizer(prompt, return_tensors="pt", padding=True)['input_ids']
gtokens = tokenizer.convert_ids_to_tokens(tokens.squeeze())

generator = ctranslate2.Generator("redpj", device="cpu")
logits = generator.forward_batch([gtokens])
print(logits)

Output:

9.00977 0.930864 16.2493 ... 2.67632 2.32075 2.38607
[cpu:0 float32 storage viewed as 1x11x50432]

But I am unable to convert the logits to a PyTorch tensor:

import torch
torch.as_tensor(logits, device='cpu')

Gives:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[15], line 2
      1 import torch
----> 2 torch.as_tensor(logits, device='cpu')

RuntimeError: Could not infer dtype of ctranslate2._ext.StorageView
arunpatro commented 1 year ago

This works if `device='cuda'`. Isn't that weird?

guillaumekln commented 1 year ago

PyTorch does not implement the array interface for CPU arrays.

You need to do a round trip through NumPy:

import numpy as np

logits = np.array(logits)
logits = torch.as_tensor(logits)
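Putting the answer together, a minimal sketch of the round trip might look like this. Since running it requires the actual model, a NumPy array stands in below for the CPU `StorageView` returned by `forward_batch`; the conversion steps are the same:

```python
import numpy as np
import torch

# Stand-in for the CPU StorageView returned by forward_batch.
# On CPU, np.array(storage_view) works because StorageView exposes
# NumPy's array interface; torch.as_tensor alone does not accept it.
logits_np = np.array(
    [[[9.01, 0.93, 16.25], [2.68, 2.32, 2.39]]], dtype=np.float32
)

# Round trip: NumPy first, then hand the buffer to PyTorch.
logits = torch.as_tensor(logits_np)

print(logits.shape)  # torch.Size([1, 2, 3])
print(logits.dtype)  # torch.float32
```

On GPU the extra step is unnecessary because `torch.as_tensor` can consume the CUDA array interface that the `StorageView` exposes there, which is why the snippet worked with `device='cuda'`.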