Closed · rachithaiyappa closed this issue 1 year ago
Can you check if it works without the device_map option?
scorer = lmppl.EncoderDecoderLM("/home/racball/models--flan-t5-xxl")
I'm actually aware of the issue that when you specify device_map='auto'
on a node with multiple GPUs available, the perplexity calculation raises an error because the model is allocated across different GPUs. I'm trying to fix it, but it's not finished yet, so I would suggest turning off the device mapping for now.
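For what it's worth, here is a minimal, self-contained illustration of the failure mode in plain torch (this is a sketch, not lmppl code): the cross-entropy loss needs its logits and targets on the same device, which is exactly what breaks when the model weights and the label tensors end up on different devices.

```python
import torch

# Toy sketch of the device mismatch: logits computed on the GPU, labels built on the CPU.
loss_fct = torch.nn.CrossEntropyLoss(reduction='none')

logits = torch.randn(4, 8)           # stand-in for model output logits
targets = torch.randint(0, 8, (4,))  # stand-in for labels prepared on the CPU

if torch.cuda.is_available():
    logits = logits.to('cuda:0')     # model output lives on cuda:0
    # loss_fct(logits, targets) would now raise the "Expected all tensors to be
    # on the same device" RuntimeError, because targets is still on the CPU.

loss = loss_fct(logits, targets.to(logits.device))  # moving targets to the logits' device avoids it
print(loss)
```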
Thanks for getting back. I actually did try it without the device map. The code doesn't recognise the presence of GPUs, loads the model on the CPU, and throws a similar error: it found at least two devices, "cpu" and "cuda".
I will share the full error stack here soon
If it helps, I also think the 'forward' function is messing things up here, independent of your code. I mostly use model.generate in my programs and it doesn't give any issues. But as soon as I switch to model.forward, this sort of issue arises.
That's interesting. Could it be something to do with the transformers version? I can calculate perplexity with google/flan-t5-xxl via lmppl.EncoderDecoderLM on my two-GPU node. The versions of the transformers-related libraries are below.
transformers 4.26.1
huggingface-hub 0.12.0
sentencepiece 0.1.97
I'm using transformers 4.26.1, huggingface-hub 0.13.1, sentencepiece 0.1.97.
But I don't think the huggingface-hub version should matter? I can downgrade it if you think it does.
For completeness in this discussion, here's the full error stack when running without the device map, for
scorer = lmppl.EncoderDecoderLM("/home/racball/models--flan-t5-xxl")
RuntimeError Traceback (most recent call last)
Cell In[1], line 16
8 inputs = [
9 'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
10 'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
11 ]
12 outputs = [
13 'I am happy.',
14 'I am sad.'
15 ]
---> 16 ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
17 print(list(zip(outputs, ppl)))
18 # >>> [
19 # ('I am happy.', 4138.748977714201),
20 # ('I am sad.', 2991.629250051472)
21 # ]
22 # print(f"prediction: {outputs[ppl.index(min(ppl))]}")
23 # >>> "prediction: I am sad."
File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/lmppl/ppl_encoder_decoder_lm.py:158, in EncoderDecoderLM.get_perplexity(self, input_texts, output_texts, batch)
156 valid_length = (model_inputs["labels"] != PAD_TOKEN_LABEL_ID).sum(dim=-1)
157 output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
--> 158 loss = self.loss_fct(output['logits'].view(-1, self.config.vocab_size), model_inputs["labels"].view(-1))
159 loss = loss.view(len(output['logits']), -1)
    160 loss = torch.sum(loss, -1) / valid_length
File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/torch/nn/modules/loss.py:1174, in CrossEntropyLoss.forward(self, input, target)
1173 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1174 return F.cross_entropy(input, target, weight=self.weight,
1175 ignore_index=self.ignore_index, reduction=self.reduction,
1176 label_smoothing=self.label_smoothing)
File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/torch/nn/functional.py:3026, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
3024 if size_average is not None or reduce is not None:
3025 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3026 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument target in method wrapper_nll_loss_forward)
> I'm using transformers 4.26.1, huggingface-hub 0.13.1, sentencepiece 0.1.97.
> But I don't think the huggingface-hub version should matter? I can downgrade it if you think it does.
Yeah, I don't think it matters. I put it just in case, but I guess you don't have to downgrade it.
Btw, which version of lmppl are you using?
The latest version is 0.1.9, so you can try upgrading if yours is an older one. Otherwise, can you share the whole script you use, and I will try it in my env to see if I can reproduce the error.
I was using the latest one (0.1.9)
Since your last message, I uninstalled lmppl, reinstalled it in a fresh environment and even redownloaded the flan-t5-xxl model and reran the code without the device_map argument. Ran into the same error.
Btw, as an enhancement request: it would be great to allow a cache/storage directory (where the model gets stored on the local machine) as an input to the class, since these models are usually huge and the default directory Hugging Face downloads to may have limited space :)
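In the meantime, one standard transformers-level workaround (nothing lmppl-specific, and the path below is just an example) is to point the Hugging Face download cache at a larger disk via an environment variable before anything is imported:

```python
import os

# Redirect the Hugging Face download cache to a disk with more space.
# This must be set before transformers/lmppl are imported.
os.environ["TRANSFORMERS_CACHE"] = "/home/racball/flan"

import lmppl
scorer = lmppl.EncoderDecoderLM("google/flan-t5-xxl")
```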
Below is the full code I ran (the lmppl ppl_encoder_decoder_lm.py source with a hard-coded cache_dir), followed by the error stack.
""" Caluculate decoder perpleity of encoder-decoder LM.
>>> from lmppl import EncoderDecoderLM
>>> scorer = EncoderDecoderLM('t5-small')
>>> scores = scorer.get_perplexity(
input_texts=['sentiment classification: I have a bad day'] * 2,
output_texts=['happy', 'sad'])
>>> print(scores)
[373.821367795063, 274.29454188096724]
"""
import os
import logging
from math import exp
from typing import List
from tqdm import tqdm
import torch
import transformers
from lmppl.util import internet_connection  # absolute import (instead of "from .util import ...") since this copy runs outside the package
os.environ["OMP_NUM_THREADS"] = "1" # to turn off warning message
os.environ["TOKENIZERS_PARALLELISM"] = "false" # to turn off warning message
PAD_TOKEN_LABEL_ID = torch.nn.CrossEntropyLoss().ignore_index
def get_lm(model_name: str,
           use_auth_token: bool = False,
           torch_dtype=None,
           device_map: str = None,
           low_cpu_mem_usage: bool = False):
    """ get encoder-decoder lms from huggingface """
    # tokenizer
    local_files_only = not internet_connection()
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_name, local_files_only=local_files_only, use_auth_token=use_auth_token, cache_dir="/home/racball/flan", force_download=True)
    # config
    config = transformers.AutoConfig.from_pretrained(
        model_name, local_files_only=local_files_only, use_auth_token=use_auth_token, cache_dir="/home/racball/flan", force_download=True)
    # model
    if config.model_type == 't5':  # T5 model requires T5ForConditionalGeneration class
        model_class = transformers.T5ForConditionalGeneration.from_pretrained
    elif config.model_type == 'mt5':
        model_class = transformers.MT5ForConditionalGeneration.from_pretrained
    elif config.model_type == 'bart':
        model_class = transformers.BartForConditionalGeneration.from_pretrained
    elif config.model_type == 'mbart':
        model_class = transformers.MBartForConditionalGeneration.from_pretrained
    elif config.model_type == 'switch_transformers':
        model_class = transformers.SwitchTransformersForConditionalGeneration.from_pretrained
    else:
        raise ValueError(f'unsupported model type: {config.model_type}')
    param = {'config': config, "local_files_only": local_files_only, "use_auth_token": use_auth_token, "low_cpu_mem_usage": low_cpu_mem_usage}
    if torch_dtype is not None:
        param['torch_dtype'] = torch_dtype
    if device_map is not None:
        param['device_map'] = device_map
    model = model_class(model_name, cache_dir="/home/racball/flan", force_download=True, **param)
    if model.config.decoder_start_token_id is None:
        model.config.decoder_start_token_id = tokenizer.pad_token_id
    return tokenizer, model, config
class EncoderDecoderLM:
    """ Encoder-Decoder Language Model """

    def __init__(self,
                 model: str = 't5-small',
                 use_auth_token: bool = False,
                 max_length_encoder: int = None,
                 max_length_decoder: int = None,
                 num_gpus: int = None,
                 torch_dtype=None,
                 device_map: str = None,
                 low_cpu_mem_usage: bool = False):
        """ Encoder-Decoder Language Model.
        @param model: Model alias or path to local model file.
        @param use_auth_token: Huggingface transformers argument of `use_auth_token`
        @param device: Device name to load the models.
        @param num_gpus: Number of gpus to be used.
        """
        logging.info(f'Loading Model: `{model}`')
        # load model
        self.tokenizer, self.model, self.config = get_lm(
            model, use_auth_token=use_auth_token, torch_dtype=torch_dtype, device_map=device_map, low_cpu_mem_usage=low_cpu_mem_usage)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = "<<PAD>>"
        if max_length_encoder is None:
            self.max_length_encoder = None
        else:
            self.max_length_encoder = max_length_encoder if max_length_encoder is not None else self.tokenizer.model_max_length
            assert self.max_length_encoder <= self.tokenizer.model_max_length, f"{self.max_length_encoder} > {self.tokenizer.model_max_length}"
        if max_length_decoder is None:
            self.max_length_decoder = None
        else:
            self.max_length_decoder = max_length_decoder if max_length_decoder is not None else self.tokenizer.model_max_length
            assert self.max_length_decoder <= self.tokenizer.model_max_length, f"{self.max_length_decoder} > {self.tokenizer.model_max_length}"
        # loss function
        self.loss_fct = torch.nn.CrossEntropyLoss(reduction='none')
        # GPU setup
        self.device = self.model.device
        if device_map is None:
            num_gpus = torch.cuda.device_count() if num_gpus is None else num_gpus
            if num_gpus > 0:
                self.model = torch.nn.DataParallel(self.model)
                self.model.to('cuda')
        self.model.eval()
        logging.info(f'\t * model is loaded on: {self.device}')
    def get_perplexity(self, input_texts: str or List, output_texts: str or List, batch: int = None):
        """ Compute the perplexity on decoder of the seq2seq model.
        :param input_texts: A string or list of input texts for the encoder.
        :param output_texts: A string or list of output texts for the decoder.
        :param batch: Batch size
        :return: A value or list of perplexity.
        """
        assert type(input_texts) is type(output_texts), f"{type(input_texts)} != {type(output_texts)}"
        # batch preparation
        single_input = type(input_texts) == str
        input_texts = [input_texts] if single_input else input_texts
        output_texts = [output_texts] if single_input else output_texts
        assert len(input_texts) == len(output_texts), f"{len(input_texts)} == {len(output_texts)}"
        batch = len(output_texts) if batch is None else batch
        batch_id = list(range(0, len(input_texts), batch)) + [len(output_texts)]
        batch_id = list(zip(batch_id[:-1], batch_id[1:]))
        loss_list = []
        with torch.no_grad():
            for s, e in tqdm(batch_id):
                # input feature
                if self.max_length_encoder is not None:
                    model_inputs = self.tokenizer(
                        input_texts[s:e], return_tensors='pt', padding='max_length', truncation=True, max_length=self.max_length_encoder)
                else:
                    model_inputs = self.tokenizer(input_texts[s:e], return_tensors='pt', padding=True, truncation=True)
                if self.max_length_decoder is not None:
                    output_encode = self.tokenizer(text_target=output_texts[s:e], return_tensors='pt', padding='max_length', truncation=True, max_length=self.max_length_decoder)
                else:
                    output_encode = self.tokenizer(text_target=output_texts[s:e], return_tensors='pt', padding=True, truncation=True)
                # shift the label sequence for causal inference
                label = output_encode["input_ids"]
                label[label == self.tokenizer.pad_token_id] = PAD_TOKEN_LABEL_ID
                model_inputs["labels"] = label.to(self.device)
                # model run & loss conversion into likelihood
                valid_length = (model_inputs["labels"] != PAD_TOKEN_LABEL_ID).sum(dim=-1)
                output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
                loss = self.loss_fct(output['logits'].view(-1, self.config.vocab_size), model_inputs["labels"].view(-1))
                loss = loss.view(len(output['logits']), -1)
                loss = torch.sum(loss, -1) / valid_length
                loss_list += loss.cpu().tolist()
        # conversion to perplexity
        ppl = [exp(i) for i in loss_list]
        return ppl[0] if single_input else ppl
scorer = EncoderDecoderLM("google/flan-t5-xxl")  # downloads the model to a newly created "/home/racball/flan"

inputs = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
]
outputs = [
    'I am happy.',
    'I am sad.'
]

ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
print(list(zip(outputs, ppl)))
The error is the same (it doesn't recognise the GPUs, yet some tensors end up on cuda for some magical reason).
RuntimeError Traceback (most recent call last)
Cell In[4], line 9
1 inputs = [
2 'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
3 'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
4 ]
5 outputs = [
6 'I am happy.',
7 'I am sad.'
8 ]
----> 9 ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
10 print(list(zip(outputs, ppl)))
Cell In[2], line 156, in EncoderDecoderLM.get_perplexity(self, input_texts, output_texts, batch)
154 valid_length = (model_inputs["labels"] != PAD_TOKEN_LABEL_ID).sum(dim=-1)
155 output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
--> 156 loss = self.loss_fct(output['logits'].view(-1, self.config.vocab_size), model_inputs["labels"].view(-1))
157 loss = loss.view(len(output['logits']), -1)
    158 loss = torch.sum(loss, -1) / valid_length
File /nobackup/racball/miniconda3/envs/lmppl/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File /nobackup/racball/miniconda3/envs/lmppl/lib/python3.10/site-packages/torch/nn/modules/loss.py:1174, in CrossEntropyLoss.forward(self, input, target)
1173 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1174 return F.cross_entropy(input, target, weight=self.weight,
1175 ignore_index=self.ignore_index, reduction=self.reduction,
1176 label_smoothing=self.label_smoothing)
File /nobackup/racball/miniconda3/envs/lmppl/lib/python3.10/site-packages/torch/nn/functional.py:3026, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
3024 if size_average is not None or reduce is not None:
3025 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3026 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument target in method wrapper_nll_loss_forward)
If the environment.yml helps, here it is below.
I have 8 A100 GPUs.
NVIDIA-SMI 530.30.02, Driver Version: 530.30.02, CUDA Version: 12.1
Maybe there is a mismatch between the CUDA version of my GPUs versus what lmppl installs? Do you think that might be causing the issue? (A quick torch-level check is included after the environment file.)
name: lmppl
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - asttokens=2.2.1=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=pyhd8ed1ab_3
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2022.12.7=ha878542_0
  - certifi=2022.12.7=pyhd8ed1ab_0
  - debugpy=1.5.1=py310h295c915_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - executing=1.2.0=pyhd8ed1ab_0
  - ipykernel=6.15.0=pyh210e3f2_0
  - ipython=8.11.0=pyh41d4057_0
  - jedi=0.18.2=pyhd8ed1ab_0
  - jupyter_client=7.3.4=pyhd8ed1ab_0
  - jupyter_core=5.2.0=py310hff52083_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.2=h6a678d5_6
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libsodium=1.0.18=h36c2ea0_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - ncurses=6.4=h6a678d5_0
  - nest-asyncio=1.5.6=pyhd8ed1ab_0
  - openssl=1.1.1t=h7f8727e_0
  - packaging=23.0=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh1a96a4e_2
  - pickleshare=0.7.5=py_1003
  - pip=23.0.1=py310h06a4308_0
  - platformdirs=3.1.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.38=pyha770c72_0
  - prompt_toolkit=3.0.38=hd8ed1ab_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pygments=2.14.0=pyhd8ed1ab_0
  - python=3.10.9=h7a1cb2a_2
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.10=2_cp310
  - pyzmq=23.2.0=py310h6a678d5_0
  - readline=8.2=h5eee18b_0
  - setuptools=65.6.3=py310h06a4308_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.40.1=h5082296_0
  - stack_data=0.6.2=pyhd8ed1ab_0
  - tk=8.6.12=h1ccaba5_0
  - tornado=6.1=py310h5764c6d_3
  - traitlets=5.9.0=pyhd8ed1ab_0
  - typing_extensions=4.4.0=pyha770c72_0
  - tzdata=2022g=h04d1e81_0
  - wcwidth=0.2.6=pyhd8ed1ab_0
  - wheel=0.38.4=py310h06a4308_0
  - xz=5.2.10=h5eee18b_1
  - zeromq=4.3.4=h9c3ff4c_1
  - zlib=1.2.13=h5eee18b_0
  - pip:
    - accelerate==0.17.0
    - charset-normalizer==3.1.0
    - filelock==3.9.0
    - huggingface-hub==0.13.1
    - idna==3.4
    - lmppl==0.1.9
    - numpy==1.24.2
    - nvidia-cublas-cu11==11.10.3.66
    - nvidia-cuda-nvrtc-cu11==11.7.99
    - nvidia-cuda-runtime-cu11==11.7.99
    - nvidia-cudnn-cu11==8.5.0.96
    - protobuf==3.19.6
    - psutil==5.9.4
    - pyyaml==6.0
    - regex==2022.10.31
    - requests==2.28.2
    - sentencepiece==0.1.97
    - tokenizers==0.13.2
    - torch==1.13.1
    - tqdm==4.65.0
    - transformers==4.26.1
    - typing-extensions==4.5.0
    - urllib3==1.26.14
prefix: /nobackup/racball/miniconda3/envs/lmppl
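And here is the quick check mentioned above: whether torch in this environment actually sees the GPUs and which CUDA toolkit it was built against (plain torch calls, nothing lmppl-specific):

```python
import torch

# Sanity-check the GPU setup of this environment.
print(torch.__version__)          # e.g. 1.13.1 per the environment file
print(torch.cuda.is_available())  # should be True on the 8x A100 node
print(torch.cuda.device_count())  # expected: 8
print(torch.version.cuda)         # CUDA version torch was built against (e.g. 11.7 for the cu11 wheels)
```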
Hi, I have added an option to specify the cache dir, hf_cache_dir. Can you try again with lmppl==0.2.0?
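Usage should look roughly like this (assuming the new argument is passed at construction time; the path is just an example):

```python
import lmppl

# lmppl==0.2.0: hf_cache_dir controls where the Hugging Face model files are stored.
scorer = lmppl.EncoderDecoderLM("google/flan-t5-xxl", hf_cache_dir="/home/racball/flan")
```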
This works! Thanks a ton for adding that feature. Perhaps the issue was indeed in the way I was trying to hack the feature into the previous versions :)
I still haven't tested whether bringing back device_map will break it, but for now, this is gold!
Do feel free to close this issue :D
PS: I'm sure it's on your to-dos, but on the off chance it's slipped by you, some documentation needs to be updated given the new feature :) Thanks again!
Hi,
Thanks for this great resource.
Trying to run this snippet of code runs into this stack of errors. Tried forcing .to('cuda:0') in multiple parts of the source code, to no avail. Any thoughts?