Closed · shivamtawari · closed 2 months ago
If I don't pass `device=device` to the pipeline, it warns me that the model will be placed on the CPU.
generator = pipeline(
model=model, tokenizer=tokenizer,
task='text-generation',
max_new_tokens=50,
repetition_penalty=1.1,
device=device
)
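(For reference, with a plain transformers checkpoint the pipeline takes the device directly; a minimal sketch, assuming CUDA is available and using 'gpt2' purely as a placeholder checkpoint. A ctransformers model loaded with hf=True is not a regular torch model, so this may not carry over directly.)

# Minimal sketch: passing `device` to a standard transformers pipeline.
# 'gpt2' is a placeholder checkpoint for illustration only.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first CUDA GPU, -1 = CPU
generator = pipeline(
    task='text-generation',
    model='gpt2',
    device=device,
)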
I have already tried following the Zephyr documentation, but running the code results in the following warning:
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU
Code:
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer, pipeline
#import torch
#device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
model = AutoModelForCausalLM.from_pretrained(
"TheBloke/zephyr-7B-alpha-GGUF",
model_file="zephyr-7b-alpha.Q4_K_M.gguf",
model_type="mistral",
gpu_layers=50,
hf=True
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
prompt = """<|system|>You are a helpful, respectful and honest assistant for labeling topics..</s>
<|user|>
I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords: '[KEYWORDS]'.
Based on the information about the topic above, please create a short label of this topic. Make sure to only return the label and nothing more.</s>
<|assistant|>"""
# Pipeline
generator = pipeline(
model=model, tokenizer=tokenizer,
task='text-generation',
max_new_tokens=50,
repetition_penalty=1.1,
#device=device
)
Output:
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
Fetching 1 files: 100% 1/1 [00:00<00:00, 4.51it/s]
config.json: 100% 31.0/31.0 [00:00<00:00, 1.94kB/s]
Fetching 1 files: 100% 1/1 [00:32<00:00, 32.80s/it]
zephyr-7b-alpha.Q4_K_M.gguf: 100% 4.37G/4.37G [00:32<00:00, 143MB/s]
tokenizer_config.json: 100% 1.43k/1.43k [00:00<00:00, 75.0kB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 1.54MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 6.66MB/s]
added_tokens.json: 100% 42.0/42.0 [00:00<00:00, 2.55kB/s]
special_tokens_map.json: 100% 168/168 [00:00<00:00, 8.60kB/s]
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
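For context, this generator is what gets handed to BERTopic as the representation model; a minimal sketch of that wiring, assuming the code above has run and using bertopic.representation.TextGeneration, which substitutes [DOCUMENTS] and [KEYWORDS] into the prompt:

from bertopic import BERTopic
from bertopic.representation import TextGeneration

# Wrap the transformers pipeline; BERTopic fills in the prompt placeholders.
representation_model = TextGeneration(generator, prompt=prompt)
topic_model = BERTopic(representation_model=representation_model)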
Hmmm, I'm not entirely sure what is happening. It might be worthwhile to check the official Transformers documentation to see how you could enable this properly. You can test it outside of BERTopic, since BERTopic simply calls the pipeline and nothing more.
Note that I would advise using llama-cpp-python instead. It should make all of this much easier.
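A minimal sketch of that route, assuming llama-cpp-python is installed and using BERTopic's LlamaCPP representation; the context size and stop tokens are illustrative choices, not requirements:

from llama_cpp import Llama
from bertopic import BERTopic
from bertopic.representation import LlamaCPP

# Load the same GGUF file directly; n_gpu_layers=-1 offloads all layers to GPU.
llm = Llama(
    model_path="zephyr-7b-alpha.Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
    stop=["Q:", "\n"],
)
representation_model = LlamaCPP(llm)
topic_model = BERTopic(representation_model=representation_model)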
Thanks @MaartenGr! I was able to use llama.cpp. I will also check the official Transformers documentation and update here if I find anything new.
Have you searched existing issues?
Describe the bug
I am facing this issue when trying to use Zephyr as the representation model.
Reproduction
BERTopic Version
v0.16.3