Closed FurkanGozukara closed 1 year ago
Anything that works in Hugging Face works here. Here's one of several good writeups: https://huggingface.co/blog/how-to-generate — this is not specific to this model.
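For example, the standard generate() arguments from that writeup can be passed straight through the pipeline call. A minimal sketch (the sampling values here are only illustrative):

import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, device_map="auto", task="text-generation")

# do_sample / top_p / temperature / max_new_tokens are standard Hugging Face generation kwargs
res = generate_text("Explain to me the difference between nuclear fission and fusion.",
                    do_sample=True, top_p=0.92, temperature=0.8, max_new_tokens=128)
print(res[0]["generated_text"])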
I don't think that works.
For example, I tried load_in_8bit=True and it didn't work.
All of these failed too:
torch.uint8
I thought you were asking about generation options. Your changes aren't really making it work; they just load on the CPU. Please see the previous comments and read up on loading in 8-bit.
But it's easier to just load a smaller model. Try the 6.9B or 2.8B model instead.
A lot of this isn't really about this model but about HF and LLMs in general, so there are many resources out there for you.
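For example, a minimal sketch of the same pipeline pointed at the smaller dolly-v2-3b checkpoint (dolly-v2-7b is swapped in the same way):

import torch
from transformers import pipeline

# dolly-v2-3b needs far less memory than the 12B model and is noticeably faster
generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, device_map="auto", task="text-generation")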
Thank you for the answers. I am trying to prepare a guide for non-techy people.
So is this the least VRAM and the fastest we can get with the 12B model?
Can we display the status of the generation somehow? Just waiting is bad, and I don't see anything in the cmd window either.
instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.float16, trust_remote_code=True, device_map="auto",task="text-generation")
What is the max_new_tokens size?
Is there any other token-size parameter?
Any way to speed it up? I am using the HF code to run it at the moment.
Well, RAM and speed are in tension. Fastest is going to be an A100. Least memory is something with 16GB, loading in 8-bit. I don't believe the HF pipeline object shows a progress bar, no. It should be tens of seconds on reasonable hardware. max_new_tokens is, more or less, the maximum number of words it will try to generate. Turn it down to 128 or 64; that will be a little faster. I think it's already clear: you want bigger hardware to run 'fast'.
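For example, assuming the instruct_pipeline object from the snippet above, max_new_tokens can be passed per call:

# Fewer new tokens means a shorter answer and a faster call
res = instruct_pipeline("How do I make a campfire?", max_new_tokens=64)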
How do I load in 8-bit? The code you linked didn't work; it threw an error.
It does work; I'm using it now. load_in_8bit=True. That's another problem, and I posted responses to it, and there are other threads here on related issues. It's hard to debug your usage.
instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-7b", torch_dtype=torch.bfloat16, load_in_8bit=True, trust_remote_code=True, device_map="auto",task="text-generation",max_new_tokens=128)
Sorry, I didn't give more info.
It loads, but when I try to generate text it throws this error:
(dolly) F:\Dolly 2.0\dolly\Scripts>python "F:\Dolly 2.0\demo.py"
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\routes.py", line 395, in run_predict
output = await app.get_blocks().process_api(
File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 1193, in process_api
result = await self.call_function(
File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 916, in call_function
prediction = await anyio.to_thread.run_sync(
File "F:\Dolly 2.0\dolly\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "F:\Dolly 2.0\dolly\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "F:\Dolly 2.0\dolly\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "F:\Dolly 2.0\demo.py", line 21, in generate
return instruct_pipeline(instruction)
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\pipelines\base.py", line 1074, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\pipelines\base.py", line 1081, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\pipelines\base.py", line 990, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "C:\Users\King/.cache\huggingface\modules\transformers_modules\local\instruct_pipeline.py", line 103, in _forward
generated_sequence = self.model.generate(
File "F:\Dolly 2.0\dolly\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py", line 1296, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py", line 993, in _validate_model_kwargs
raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['load_in_8bit'] (note: typos in the generate arguments will also show up in this list)
Ah, try model_kwargs={"load_in_8bit": True}. There is a different reason this pattern isn't quite working that I think Matt is fixing.
You can also try something a little more unrolled like this:
model_name = "databricks/dolly-v2-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True, trust_remote_code=True)
end_key_token_id = tokenizer.encode("### End")[0]
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, \
pad_token_id=tokenizer.pad_token_id, eos_token_id=end_key_token_id)
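Calling it then looks the same as with any text-generation pipeline, e.g.:

res = pipe("Write me a tweet about the release of Dolly 2.0, a new LLM", max_new_tokens=128)
print(res[0]["generated_text"])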
Thank you so much.
Even though I have installed the latest torch version with CUDA into my venv, I got an error after using your code.
I am on Windows.
(dolly) F:\Dolly 2.0\dolly\Scripts>python "F:\Dolly 2.0\dolly.py"
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cuda_setup\main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cuda_setup\main.py:145: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cuda_setup\main.py:145: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\import_utils.py", line 1093, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "C:\Python3108\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\bitsandbytes.py", line 10, in <module>
import bitsandbytes as bnb
File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
from . import nn
File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "F:\Dolly 2.0\dolly.py", line 2, in <module>
from transformers import *
File "<frozen importlib._bootstrap>", line 1073, in _handle_fromlist
File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\import_utils.py", line 1081, in __getattr__
value = self._get_module(name)
File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\import_utils.py", line 1095, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.utils.bitsandbytes because of the following error (look up to see its traceback):
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
(dolly) F:\Dolly 2.0\dolly\Scripts>
You don't have all the needed CUDA libs installed, or bitsandbytes doesn't support Windows. Not sure which, or maybe both.
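A quick sanity check here is whether the torch build inside the venv sees the GPU at all, since bitsandbytes falls back to its CPU-only library when it doesn't:

import torch
print(torch.cuda.is_available())  # should print True on a working CUDA install
print(torch.version.cuda)         # CUDA version this torch build was compiled against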
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
I got it installed, and it's even on the PATH. Weird.
@srowen I made it work.
I've got just one final error that I need your help with.
To create a public link, set `share=True` in `launch()`.
F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py:1387: UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 20 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
Input length of input_ids is 25, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py:1470: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
warnings.warn(
Traceback (most recent call last):
File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\routes.py", line 395, in run_predict
output = await app.get_blocks().process_api(
File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 1196, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 1130, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\components.py", line 5589, in postprocess
unindented_y = inspect.cleandoc(y)
File "C:\Python3108\lib\inspect.py", line 750, in cleandoc
lines = doc.expandtabs().split('\n')
AttributeError: 'list' object has no attribute 'expandtabs'
And this is the Gradio code:
import gradio as gr
from transformers import *
import torch

theme = gr.themes.Monochrome(
    primary_hue="indigo",
    secondary_hue="blue",
    neutral_hue="slate",
    radius_size=gr.themes.sizes.radius_sm,
    font=[gr.themes.GoogleFont("Open Sans"), "ui-sans-serif", "system-ui", "sans-serif"],
)

#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", task="text-generation")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.float16, trust_remote_code=True, device_map="auto", task="text-generation")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", task="text-generation", max_new_tokens=128)
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-7b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", task="text-generation")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-7b", torch_dtype=torch.bfloat16, load_in_8bit=True, trust_remote_code=True, device_map="auto", task="text-generation", max_new_tokens=2048)

model_name = "F:/Dolly 2.0/dolly-v2-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left", max_new_tokens=256)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True, trust_remote_code=True)
end_key_token_id = tokenizer.encode("### End")[0]
instruct_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, eos_token_id=end_key_token_id)

def generate(instruction):
    return instruct_pipeline(instruction)

examples = [
    "Instead of making a peanut butter and jelly sandwich, what else could I combine peanut butter with in a sandwich? Give five ideas",
    "How do I make a campfire?",
    "Write me a tweet about the release of Dolly 2.0, a new LLM"
]

def process_example(args):
    for x in generate(args):
        pass
    return x

css = ".generating {visibility: hidden}"

with gr.Blocks(theme=theme, analytics_enabled=False, css=css) as demo:
    with gr.Column():
        gr.Markdown(
            """ ## Dolly 2.0
            Dolly 2.0 is a 12B parameter language model based on the EleutherAI pythia model family and fine-tuned exclusively on a new, high-quality human generated instruction following dataset, crowdsourced among Databricks employees. For more details, please refer to the [model card](https://huggingface.co/databricks/dolly-v2-12b)
            Type in the box below and click the button to generate answers to your most pressing questions!
            """
        )
        gr.HTML("<p>Check out SECourses for AI, Stable Diffusion, ML and Programming Related Full Free Courses, Tutorials and Guides : <a target='_blank' style='display:inline-block' href='https://www.youtube.com/@SECourses' alt='https://www.youtube.com/@SECourses'>https://www.youtube.com/@SECourses</a> </p>")
        with gr.Row():
            with gr.Column(scale=3):
                instruction = gr.Textbox(placeholder="Enter your question here", label="Question", elem_id="q-input")
                with gr.Box():
                    gr.Markdown("**Answer**")
                    output = gr.Markdown(elem_id="q-output")
                submit = gr.Button("Generate", variant="primary")
                gr.Examples(
                    examples=examples,
                    inputs=[instruction],
                    cache_examples=False,
                    fn=process_example,
                    outputs=[output],
                )
    submit.click(generate, inputs=[instruction], outputs=[output])
    instruction.submit(generate, inputs=[instruction], outputs=[output])

demo.queue(concurrency_count=1).launch(debug=True)
@srowen I finally made it work.
Is this message important?
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
def generate(instruction):
    input_ids = tokenizer.encode(instruction, return_tensors="pt")
    input_ids = input_ids.to(model.device)  # Move input_ids to the same device as the model
    generated_output = model.generate(input_ids, max_length=256)
    dd = tokenizer.decode(generated_output[0])
    return dd
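That warning is usually harmless for a single prompt, but a variant of the same function that passes an explicit attention mask and pad token id, and decodes only the newly generated tokens, would look roughly like this (a sketch, not tested here):

def generate(instruction):
    # Tokenize and move both input_ids and attention_mask to the model's device
    inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
    generated_output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # silences the attention-mask warning
        pad_token_id=tokenizer.eos_token_id,      # GPT-NeoX tokenizers have no pad token by default
        max_new_tokens=256,
    )
    # Skip the prompt tokens so only the model's answer is returned
    new_tokens = generated_output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)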
Finally got it fully working. Thank you for the help; time to make the tutorial video.
Hi, how did you fix this issue? I got the same CUDA problem. Thanks!
Hi, I got the same error, too. It also told me to inspect the output of the command and see if I can locate CUDA libraries. Could you please tell me how you fixed your issue? Thanks in advance!
What was your resolution to the bitsandbytes issue?
That is an error from gradio. You have some incompatibility between it and the versions of its dependencies. It is not related to this model.
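For what it's worth, that particular AttributeError can also show up when the raw pipeline result (a list of dicts) is handed to gr.Markdown, which expects a string. One possible workaround, sketched with the same variable names as the posted code:

def generate(instruction):
    # The text-generation pipeline returns a list of dicts; gr.Markdown wants a plain string
    result = instruct_pipeline(instruction)
    return result[0]["generated_text"]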
I finally made it work
What other parameters and options do we have?
instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",task="text-generation")