asahi417 / lmppl

Calculate perplexity of a text with pre-trained language models. Supports masked LMs (e.g. DeBERTa), autoregressive LMs (e.g. GPT-3), and encoder-decoder LMs (e.g. Flan-T5).
MIT License

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! #1

rachithaiyappa closed this issue 1 year ago

rachithaiyappa commented 1 year ago

Hi,

Thanks for this great resource.

Trying to run this snippet of code:

import lmppl

scorer = lmppl.EncoderDecoderLM("/home/racball/models--flan-t5-xxl",device_map='auto',low_cpu_mem_usage=True)

inputs = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
]
outputs = [
    'I am happy.',
    'I am sad.'
]
ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
print(list(zip(outputs, ppl)))

runs into this error stack:

RuntimeError                              Traceback (most recent call last)
Cell In[6], line 14
      6 inputs = [
      7     'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
      8     'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
      9 ]
     10 outputs = [
     11     'I am happy.',
     12     'I am sad.'
     13 ]
---> 14 ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
     15 print(list(zip(outputs, ppl)))
     16 # >>> [
     17 #   ('I am happy.', 4138.748977714201),
     18 #   ('I am sad.', 2991.629250051472)
     19 # ]
     20 # print(f"prediction: {outputs[ppl.index(min(ppl))]}")
     21 # >>> "prediction: I am sad."

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/lmppl/ppl_encoder_decoder_lm.py:157, in EncoderDecoderLM.get_perplexity(self, input_texts, output_texts, batch)
    155 # model run & loss conversion into likelihood
    156 valid_length = (model_inputs["labels"] != PAD_TOKEN_LABEL_ID).sum(dim=-1)
--> 157 output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
    158 loss = self.loss_fct(output['logits'].view(-1, self.config.vocab_size), model_inputs["labels"].view(-1))
    159 loss = loss.view(len(output['logits']), -1)

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/accelerate/hooks.py:158, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
    156         output = old_forward(*args, **kwargs)
    157 else:
--> 158     output = old_forward(*args, **kwargs)
    159 return module._hf_hook.post_forward(module, output)

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:1696, in T5ForConditionalGeneration.forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1694 if labels is not None:
   1695     loss_fct = CrossEntropyLoss(ignore_index=-100)
...
   3024 if size_average is not None or reduce is not None:
   3025     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3026 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument target in method wrapper_nll_loss_forward)

I tried forcing .to('cuda:0') in multiple parts of the source code, to no avail. Any thoughts?

asahi417 commented 1 year ago

Can you check if it works without the device_map option?

scorer = lmppl.EncoderDecoderLM("/home/racball/models--flan-t5-xxl")

asahi417 commented 1 year ago

I'm actually aware of the issue: when you specify device_map='auto' on a node with multiple GPUs available, the perplexity calculation raises an error because the model gets allocated across different GPUs. I'm trying to fix it, but it's not finished yet, so I would suggest turning off the device mapping for now.
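
Until that's fixed, a minimal sketch of one possible workaround (not lmppl's actual fix) is to move the labels onto whichever device the logits come back on before computing the loss, roughly along these lines inside get_perplexity:

```python
# Sketch of a possible workaround, not the library's fix: align the loss target
# with the device of the returned logits when the model is sharded across GPUs.
output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
logits = output['logits']
labels = model_inputs['labels'].to(logits.device)  # move target to the logits' device
loss = self.loss_fct(logits.view(-1, self.config.vocab_size), labels.view(-1))
```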

rachithaiyappa commented 1 year ago

Thanks for getting back. I actually did try it without the device map. The code doesn't recognise the presence of GPUs, loads the model on the CPU, and throws a similar error: it found at least two devices, "cpu" and "cuda".

I will share the full error stack here soon.

If it helps, I also think the forward function is messing things up here, independently of your code. I mostly use model.generate in my programs and it doesn't give any issues, but as soon as I switch to model.forward, this sort of issue arises.
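
A quick way to see how the model was split, assuming it was loaded with device_map='auto' (the printed mapping below is only illustrative):

```python
# Diagnostic sketch, assuming the model was loaded with device_map='auto':
# transformers records the accelerate placement in `hf_device_map`.
print(scorer.model.hf_device_map)  # e.g. {'shared': 0, 'encoder': 0, 'decoder': 7, ...}
print(scorer.model.device)         # the device lmppl captures as `self.device`
```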

asahi417 commented 1 year ago

That's interesting. Could it be something to do with the transformers version? I can calculate perplexity with google/flan-t5-xxl via lmppl.EncoderDecoderLM on my two-GPU node. The versions of the transformers-related libraries are below.

transformers 4.26.1
huggingface-hub 0.12.0
sentencepiece 0.1.97

rachithaiyappa commented 1 year ago

I'm using transformers 4.26.1, huggingface-hub 0.13.1, sentencepiece 0.1.97.

But I don't think huggingface-hub should matter? I can downgrade it if you think it does.

rachithaiyappa commented 1 year ago

For completeness in this discussion, here's the full error stack from running without the device map, i.e. with

scorer = lmppl.EncoderDecoderLM("/home/racball/models--flan-t5-xxl")


RuntimeError                              Traceback (most recent call last)
Cell In[1], line 16
      8 inputs = [
      9     'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
     10     'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
     11 ]
     12 outputs = [
     13     'I am happy.',
     14     'I am sad.'
     15 ]
---> 16 ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
     17 print(list(zip(outputs, ppl)))
     18 # >>> [
     19 #   ('I am happy.', 4138.748977714201),
     20 #   ('I am sad.', 2991.629250051472)
     21 # ]
     22 # print(f"prediction: {outputs[ppl.index(min(ppl))]}")
     23 # >>> "prediction: I am sad."

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/lmppl/ppl_encoder_decoder_lm.py:158, in EncoderDecoderLM.get_perplexity(self, input_texts, output_texts, batch)
    156 valid_length = (model_inputs["labels"] != PAD_TOKEN_LABEL_ID).sum(dim=-1)
    157 output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
--> 158 loss = self.loss_fct(output['logits'].view(-1, self.config.vocab_size), model_inputs["labels"].view(-1))
    159 loss = loss.view(len(output['logits']), -1)
    160 loss = torch.sum(loss, -1) / valid_length

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/torch/nn/modules/loss.py:1174, in CrossEntropyLoss.forward(self, input, target)
   1173 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1174     return F.cross_entropy(input, target, weight=self.weight,
   1175                            ignore_index=self.ignore_index, reduction=self.reduction,
   1176                            label_smoothing=self.label_smoothing)

File /nobackup/racball/miniconda3/envs/bertviz/lib/python3.10/site-packages/torch/nn/functional.py:3026, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3024 if size_average is not None or reduce is not None:
   3025     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3026 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument target in method wrapper_nll_loss_forward)

asahi417 commented 1 year ago

> I'm using transformers 4.26.1, huggingface-hub 0.13.1, sentencepiece 0.1.97.
>
> But I don't think huggingface-hub should matter? I can downgrade it if you think it does.

Yeah, I don't think it matters. I put it just in case, but I guess you don't have to downgrade it.

asahi417 commented 1 year ago

Btw, which version of lmppl are you using?

asahi417 commented 1 year ago

The latest version is 0.1.9, so try upgrading if you're on an older one. Otherwise, can you share the whole script you use? I will try it in my environment to see if I can reproduce the error.

rachithaiyappa commented 1 year ago

I was using the latest one (0.1.9).

Since your last message, I uninstalled lmppl, reinstalled it in a fresh environment, and even re-downloaded the flan-t5-xxl model, then reran the code without the device_map argument. I ran into the same error.

Btw, as an enhancement request: it would be great if the class accepted a cache/storage directory for where the model is stored on the local machine, since these models are usually huge and the default directory Hugging Face downloads to may have limited space :)
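
(As a possible stopgap in the meantime, not an lmppl feature, one could redirect the Hugging Face cache via environment variables; the path below is just a placeholder.)

```python
# Possible stopgap, not an lmppl option: point the Hugging Face cache at a larger
# disk before anything downloads the model. The path is a placeholder.
import os
os.environ["TRANSFORMERS_CACHE"] = "/home/racball/hf-cache"

import lmppl
scorer = lmppl.EncoderDecoderLM("google/flan-t5-xxl")
```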

Below are:

  1. the source code, with a modification to where the model is stored (I needed to do this since the default directory didn't have enough space),
  2. my main script, and
  3. the error stack.
""" Caluculate decoder perpleity of encoder-decoder LM.
>>> from lmppl import EncoderDecoderLM
>>> scorer = EncoderDecoderLM('t5-small')
>>> scores = scorer.get_perplexity(
        input_texts=['sentiment classification: I have a bad day'] * 2,
        output_texts=['happy', 'sad'])
>>> print(scores)
[373.821367795063, 274.29454188096724]
"""
import os
import logging
from math import exp
from typing import List

from tqdm import tqdm
import torch
import transformers

# from .util import internet_connection

os.environ["OMP_NUM_THREADS"] = "1"  # to turn off warning message
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # to turn off warning message
PAD_TOKEN_LABEL_ID = torch.nn.CrossEntropyLoss().ignore_index

def get_lm(model_name: str,
           use_auth_token: bool = False,
           torch_dtype=None,
           device_map: str = None,
           low_cpu_mem_usage: bool = False):
    """ get encoder-decoder lms from huggingface """
    # tokenizer
    local_files_only = not internet_connection()
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_name, local_files_only=local_files_only, use_auth_token=use_auth_token,cache_dir = "/home/racball/flan", force_download = True)

    # config
    config = transformers.AutoConfig.from_pretrained(
        model_name, local_files_only=local_files_only, use_auth_token=use_auth_token,cache_dir = "/home/racball/flan", force_download = True)

    # model
    if config.model_type == 't5':  # T5 model requires T5ForConditionalGeneration class
        model_class = transformers.T5ForConditionalGeneration.from_pretrained
    elif config.model_type == 'mt5':
        model_class = transformers.MT5ForConditionalGeneration.from_pretrained
    elif config.model_type == 'bart':
        model_class = transformers.BartForConditionalGeneration.from_pretrained
    elif config.model_type == 'mbart':
        model_class = transformers.MBartForConditionalGeneration.from_pretrained
    elif config.model_type == 'switch_transformers':
        model_class = transformers.SwitchTransformersForConditionalGeneration.from_pretrained
    else:
        raise ValueError(f'unsupported model type: {config.model_type}')
    param = {'config': config, "local_files_only": local_files_only, "use_auth_token": use_auth_token, "low_cpu_mem_usage": low_cpu_mem_usage}
    if torch_dtype is not None:
        param['torch_dtype'] = torch_dtype
    if device_map is not None:
        param['device_map'] = device_map
    model = model_class(model_name, cache_dir = "/home/racball/flan", force_download = True, **param)
    if model.config.decoder_start_token_id is None:
        model.config.decoder_start_token_id = tokenizer.pad_token_id
    return tokenizer, model, config

class EncoderDecoderLM:
    """ Encoder-Decoder Language Model """

    def __init__(self,
                 model: str = 't5-small',
                 use_auth_token: bool = False,
                 max_length_encoder: int = None,
                 max_length_decoder: int = None,
                 num_gpus: int = None,
                 torch_dtype=None,
                 device_map: str = None,
                 low_cpu_mem_usage: bool = False):
        """ Encoder-Decoder Language Model.
        @param model: Model alias or path to local model file.
        @param use_auth_token: Huggingface transformers argument of `use_auth_token`
        @param device: Device name to load the models.
        @param num_gpus: Number of gpus to be used.
        """
        logging.info(f'Loading Model: `{model}`')

        # load model
        self.tokenizer, self.model, self.config = get_lm(
            model, use_auth_token=use_auth_token, torch_dtype=torch_dtype, device_map=device_map, low_cpu_mem_usage=low_cpu_mem_usage)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = "<<PAD>>"
        if max_length_encoder is None:
            self.max_length_encoder = None
        else:
            self.max_length_encoder = max_length_encoder if max_length_encoder is not None else self.tokenizer.model_max_length
            assert self.max_length_encoder <= self.tokenizer.model_max_length, f"{self.max_length_encoder} > {self.tokenizer.model_max_length}"
        if max_length_decoder is None:
            self.max_length_decoder = None
        else:
            self.max_length_decoder = max_length_decoder if max_length_decoder is not None else self.tokenizer.model_max_length
            assert self.max_length_decoder <= self.tokenizer.model_max_length, f"{self.max_length_decoder} > {self.tokenizer.model_max_length}"

        # loss function
        self.loss_fct = torch.nn.CrossEntropyLoss(reduction='none')

        # GPU setup
        self.device = self.model.device
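        # NOTE: `self.device` is read here, before the DataParallel wrapping below; if the
        # model was loaded on CPU, this value stays "cpu" even after the model is moved to
        # GPU, so inputs sent to `self.device` in get_perplexity may end up on a different
        # device than the returned logits.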
        if device_map is None:
            num_gpus = torch.cuda.device_count() if num_gpus is None else num_gpus
            if num_gpus > 0:
                self.model = torch.nn.DataParallel(self.model)
                self.model.to('cuda')
        self.model.eval()
        logging.info(f'\t * model is loaded on: {self.device}')

    def get_perplexity(self, input_texts: str or List, output_texts: str or List, batch: int = None):
        """ Compute the perplexity on decoder of the seq2seq model.
        :param input_texts: A string or list of input texts for the encoder.
        :param output_texts: A string or list of output texts for the decoder.
        :param batch: Batch size
        :return: A value or list of perplexity.
        """
        assert type(input_texts) is type(output_texts), f"{type(input_texts)} != {type(output_texts)}"

        # batch preparation
        single_input = type(input_texts) == str
        input_texts = [input_texts] if single_input else input_texts
        output_texts = [output_texts] if single_input else output_texts
        assert len(input_texts) == len(output_texts), f"{len(input_texts)} == {len(output_texts)}"
        batch = len(output_texts) if batch is None else batch
        batch_id = list(range(0, len(input_texts), batch)) + [len(output_texts)]
        batch_id = list(zip(batch_id[:-1], batch_id[1:]))

        loss_list = []
        with torch.no_grad():
            for s, e in tqdm(batch_id):

                # input feature
                if self.max_length_encoder is not None:
                    model_inputs = self.tokenizer(
                        input_texts[s:e], return_tensors='pt', padding='max_length', truncation=True, max_length=self.max_length_encoder)
                else:
                    model_inputs = self.tokenizer(input_texts[s:e], return_tensors='pt', padding=True, truncation=True)

                if self.max_length_decoder is not None:
                    output_encode = self.tokenizer(text_target=output_texts[s:e], return_tensors='pt', padding='max_length', truncation=True, max_length=self.max_length_decoder)
                else:
                    output_encode = self.tokenizer(text_target=output_texts[s:e], return_tensors='pt', padding=True, truncation=True)

                # shift the label sequence for causal inference
                label = output_encode["input_ids"]
                label[label == self.tokenizer.pad_token_id] = PAD_TOKEN_LABEL_ID
                model_inputs["labels"] = label.to(self.device)

                # model run & loss conversion into likelihood
                valid_length = (model_inputs["labels"] != PAD_TOKEN_LABEL_ID).sum(dim=-1)
                output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
                loss = self.loss_fct(output['logits'].view(-1, self.config.vocab_size), model_inputs["labels"].view(-1))
                loss = loss.view(len(output['logits']), -1)
                loss = torch.sum(loss, -1) / valid_length
                loss_list += loss.cpu().tolist()

        # conversion to perplexity
        ppl = [exp(i) for i in loss_list]
        return ppl[0] if single_input else ppl
  2. scorer = EncoderDecoderLM("google/flan-t5-xxl")  # downloads the model to the newly created "/home/racball/flan"
    inputs = [
        'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
        'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
    ]
    outputs = [
        'I am happy.',
        'I am sad.'
    ]
    ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
    print(list(zip(outputs, ppl)))
  3. The error is the same (it doesn't recognise the GPUs, yet some tensors end up on cuda for some magical reason):

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 9
      1 inputs = [
      2     'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.',
      3     'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee.'
      4 ]
      5 outputs = [
      6     'I am happy.',
      7     'I am sad.'
      8 ]
----> 9 ppl = scorer.get_perplexity(input_texts=inputs, output_texts=outputs)
     10 print(list(zip(outputs, ppl)))

Cell In[2], line 156, in EncoderDecoderLM.get_perplexity(self, input_texts, output_texts, batch)
    154 valid_length = (model_inputs["labels"] != PAD_TOKEN_LABEL_ID).sum(dim=-1)
    155 output = self.model(**{k: v.to(self.device) for k, v in model_inputs.items()})
--> 156 loss = self.loss_fct(output['logits'].view(-1, self.config.vocab_size), model_inputs["labels"].view(-1))
    157 loss = loss.view(len(output['logits']), -1)
    158 loss = torch.sum(loss, -1) / valid_length

File /nobackup/racball/miniconda3/envs/lmppl/lib/python3.10/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /nobackup/racball/miniconda3/envs/lmppl/lib/python3.10/site-packages/torch/nn/modules/loss.py:1174, in CrossEntropyLoss.forward(self, input, target)
   1173 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1174     return F.cross_entropy(input, target, weight=self.weight,
   1175                            ignore_index=self.ignore_index, reduction=self.reduction,
   1176                            label_smoothing=self.label_smoothing)

File /nobackup/racball/miniconda3/envs/lmppl/lib/python3.10/site-packages/torch/nn/functional.py:3026, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3024 if size_average is not None or reduce is not None:
   3025     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3026 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument target in method wrapper_nll_loss_forward)

rachithaiyappa commented 1 year ago

If the environment.yml helps: I have 8 A100 GPUs, NVIDIA-SMI 530.30.02, Driver Version: 530.30.02, CUDA Version: 12.1.

Maybe there is a mismatch between the CUDA version of my GPUs and what lmppl installs? Do you think that might be causing the issue?

name: lmppl
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - asttokens=2.2.1=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=pyhd8ed1ab_3
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2022.12.7=ha878542_0
  - certifi=2022.12.7=pyhd8ed1ab_0
  - debugpy=1.5.1=py310h295c915_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - executing=1.2.0=pyhd8ed1ab_0
  - ipykernel=6.15.0=pyh210e3f2_0
  - ipython=8.11.0=pyh41d4057_0
  - jedi=0.18.2=pyhd8ed1ab_0
  - jupyter_client=7.3.4=pyhd8ed1ab_0
  - jupyter_core=5.2.0=py310hff52083_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.2=h6a678d5_6
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libsodium=1.0.18=h36c2ea0_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - ncurses=6.4=h6a678d5_0
  - nest-asyncio=1.5.6=pyhd8ed1ab_0
  - openssl=1.1.1t=h7f8727e_0
  - packaging=23.0=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh1a96a4e_2
  - pickleshare=0.7.5=py_1003
  - pip=23.0.1=py310h06a4308_0
  - platformdirs=3.1.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.38=pyha770c72_0
  - prompt_toolkit=3.0.38=hd8ed1ab_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pygments=2.14.0=pyhd8ed1ab_0
  - python=3.10.9=h7a1cb2a_2
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.10=2_cp310
  - pyzmq=23.2.0=py310h6a678d5_0
  - readline=8.2=h5eee18b_0
  - setuptools=65.6.3=py310h06a4308_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.40.1=h5082296_0
  - stack_data=0.6.2=pyhd8ed1ab_0
  - tk=8.6.12=h1ccaba5_0
  - tornado=6.1=py310h5764c6d_3
  - traitlets=5.9.0=pyhd8ed1ab_0
  - typing_extensions=4.4.0=pyha770c72_0
  - tzdata=2022g=h04d1e81_0
  - wcwidth=0.2.6=pyhd8ed1ab_0
  - wheel=0.38.4=py310h06a4308_0
  - xz=5.2.10=h5eee18b_1
  - zeromq=4.3.4=h9c3ff4c_1
  - zlib=1.2.13=h5eee18b_0
  - pip:
    - accelerate==0.17.0
    - charset-normalizer==3.1.0
    - filelock==3.9.0
    - huggingface-hub==0.13.1
    - idna==3.4
    - lmppl==0.1.9
    - numpy==1.24.2
    - nvidia-cublas-cu11==11.10.3.66
    - nvidia-cuda-nvrtc-cu11==11.7.99
    - nvidia-cuda-runtime-cu11==11.7.99
    - nvidia-cudnn-cu11==8.5.0.96
    - protobuf==3.19.6
    - psutil==5.9.4
    - pyyaml==6.0
    - regex==2022.10.31
    - requests==2.28.2
    - sentencepiece==0.1.97
    - tokenizers==0.13.2
    - torch==1.13.1
    - tqdm==4.65.0
    - transformers==4.26.1
    - typing-extensions==4.5.0
    - urllib3==1.26.14
prefix: /nobackup/racball/miniconda3/envs/lmppl

asahi417 commented 1 year ago

Hi, I have added an option to specify the cache dir, hf_cache_dir. Can you try again with lmppl==0.2.0?
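
For example, usage along these lines (assumed from the option name above; the path is the one used earlier in the thread):

```python
import lmppl

# Assumed usage of the new hf_cache_dir option in lmppl>=0.2.0.
scorer = lmppl.EncoderDecoderLM(
    "google/flan-t5-xxl",
    hf_cache_dir="/home/racball/flan",  # cache/download directory on the larger disk
)
```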

rachithaiyappa commented 1 year ago

This works! Thanks a ton for adding that feature. Perhaps the issue was indeed in the way I was trying to hack that feature into the previous versions :)

I still haven't tested whether bringing back device_map will break it, but for now, this is gold!

Do feel free to close this issue :D

PS - I'm sure it's on your to-do list... but on the off chance it's slipped by you: some documentation needs to be updated given the new feature :) Thanks again!