databrickslabs / dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
Apache License 2.0

Can we display status of the answer generation? #74

Closed FurkanGozukara closed 1 year ago

FurkanGozukara commented 1 year ago

I finally made it work.

What other parameters and options do we have?

instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", task="text-generation")

srowen commented 1 year ago

Anything in Hugging Face works. Here's one of several good writeups, which is not specific to this model: https://huggingface.co/blog/how-to-generate
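
For example, the usual sampling options from that post can be passed straight through the pipeline call (a minimal sketch; the values are illustrative, not tuned recommendations):

# Standard Hugging Face generation kwargs, forwarded to model.generate()
result = instruct_pipeline(
    "How do I make a campfire?",
    do_sample=True,
    top_p=0.92,
    top_k=0,
    temperature=0.8,
)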

FurkanGozukara commented 1 year ago

Anything in Hugging Face works. Here's one of several good writeups, which is not specific to this model: https://huggingface.co/blog/how-to-generate

I don't think everything works. For example, I tried load_in_8bit=True and it didn't work, and torch.uint8 failed too.

srowen commented 1 year ago

I thought you were asking about generation options. Your changes aren't really making it work, just loading on the CPU. Please see previous comments and read up on loading in 8-bit.

But it's easier to just load a smaller model. Try the 6.9B or 2.7B model.

A lot of this isn't really about this model, but just about HF and LLMs, so there are many resources out there for you.
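
For example, loading a smaller checkpoint instead (a sketch; this assumes the smaller Dolly v2 models are published on the Hub as databricks/dolly-v2-7b and databricks/dolly-v2-3b):

import torch
from transformers import pipeline

# Same pipeline setup as above, pointed at a smaller checkpoint to reduce memory use
instruct_pipeline = pipeline(model="databricks/dolly-v2-7b", torch_dtype=torch.bfloat16,
                             trust_remote_code=True, device_map="auto", task="text-generation")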

FurkanGozukara commented 1 year ago

I thought you were asking about generation options. Your changes aren't really making it work, just loading on the CPU. Please see previous comments and read up on loading in 8-bit.

But it's easier to just load a smaller model. Try the 6.9B or 2.7B model.

A lot of this isn't really about this model, but just about HF and LLMs, so there are many resources out there for you.

Thank you for the answers. I am trying to prepare a guide for non-technical people.

So is this the lowest VRAM and fastest we can get with the 12B model?

Can we display the status of generation somehow? Just waiting is bad, and I don't see anything in the cmd window either.

instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.float16, trust_remote_code=True, device_map="auto", task="text-generation")

What is the max_new_tokens size?

Any other token-size parameters?

Any way to speed it up? I am using the HF code to run it at the moment.

srowen commented 1 year ago

Well, RAM and speed are in tension. Fastest is going to be an A100. Least memory is something with 16GB, loading in 8-bit. I don't believe the HF pipeline object shows a progress bar, no. It should take tens of seconds on reasonable hardware. max_new_tokens is the max number of words, more or less, that it will try to generate. Turn it down to 128 or 64; that will be a little faster. I think it's already clear: you want bigger hardware to run 'fast'.
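
For example (a minimal sketch), a shorter limit can be passed per call:

# A smaller max_new_tokens means less text to generate, so responses come back faster
result = instruct_pipeline("How do I make a campfire?", max_new_tokens=64)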

FurkanGozukara commented 1 year ago

Well, RAM and speed are in tension. Fastest is going to be an A100. Least memory is something with 16GB, loading in 8-bit. I don't believe the HF pipeline object shows a progress bar, no. It should take tens of seconds on reasonable hardware. max_new_tokens is the max number of words, more or less, that it will try to generate. Turn it down to 128 or 64; that will be a little faster. I think it's already clear: you want bigger hardware to run 'fast'.

How do I load in 8-bit? The code you linked didn't work; it threw an error.

srowen commented 1 year ago

It does work; I'm using it now. load_in_8bit=True. That's another problem, and I posted responses to it, and there are other threads here on related issues. It's hard to debug your usage.

FurkanGozukara commented 1 year ago

It does work; I'm using it now. load_in_8bit=True. That's another problem, and I posted responses to it, and there are other threads here on related issues. It's hard to debug your usage.

instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-7b", torch_dtype=torch.bfloat16, load_in_8bit=True, trust_remote_code=True, device_map="auto", task="text-generation", max_new_tokens=128)

Sorry, I didn't give more info.

It loads, but when I try to generate text it throws this error:

(dolly) F:\Dolly 2.0\dolly\Scripts>python "F:\Dolly 2.0\demo.py"
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\routes.py", line 395, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 1193, in process_api
    result = await self.call_function(
  File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 916, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "F:\Dolly 2.0\dolly\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "F:\Dolly 2.0\dolly\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "F:\Dolly 2.0\dolly\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\Dolly 2.0\demo.py", line 21, in generate
    return instruct_pipeline(instruction)
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\pipelines\base.py", line 1074, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\pipelines\base.py", line 1081, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\pipelines\base.py", line 990, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "C:\Users\King/.cache\huggingface\modules\transformers_modules\local\instruct_pipeline.py", line 103, in _forward
    generated_sequence = self.model.generate(
  File "F:\Dolly 2.0\dolly\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py", line 1296, in generate
    self._validate_model_kwargs(model_kwargs.copy())
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py", line 993, in _validate_model_kwargs
    raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['load_in_8bit'] (note: typos in the generate arguments will also show up in this list)

srowen commented 1 year ago

Ah, try model_kwargs={"load_in_8bit": True}. There is a different reason this pattern isn't quite working that I think Matt is fixing.

You can also try something a little more unrolled like this:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "databricks/dolly-v2-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True, trust_remote_code=True)

end_key_token_id = tokenizer.encode("### End")[0]

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, \
    pad_token_id=tokenizer.pad_token_id, eos_token_id=end_key_token_id)
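
A usage sketch for that pipe; the prompt template below follows the instruction format Dolly appears to be trained with (consistent with the "### End" key above), so treat it as an assumption rather than a documented API:

# Assumed Dolly instruction-following prompt format
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nHow do I make a campfire?\n\n"
    "### Response:\n"
)
result = pipe(prompt, max_new_tokens=256, do_sample=True, top_p=0.92)
print(result[0]["generated_text"])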

FurkanGozukara commented 1 year ago

Ah, try model_kwargs={"load_in_8bit": True}. There is a different reason this pattern isn't quite working that I think Matt is fixing.

You can also try something a little more unrolled like this:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "databricks/dolly-v2-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True, trust_remote_code=True)

end_key_token_id = tokenizer.encode("### End")[0]

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, \
    pad_token_id=tokenizer.pad_token_id, eos_token_id=end_key_token_id)

Thank you so much.

Even though I have installed the latest torch version with CUDA into my venv, I got an error after using your code.

I am on Windows.

[screenshot]

(dolly) F:\Dolly 2.0\dolly\Scripts>python "F:\Dolly 2.0\dolly.py"

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cuda_setup\main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
  warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cuda_setup\main.py:145: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cuda_setup\main.py:145: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Loading binary F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\import_utils.py", line 1093, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Python3108\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\bitsandbytes.py", line 10, in <module>
    import bitsandbytes as bnb
  File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "F:\Dolly 2.0\dolly\lib\site-packages\bitsandbytes\cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\Dolly 2.0\dolly.py", line 2, in <module>
    from transformers import *
  File "<frozen importlib._bootstrap>", line 1073, in _handle_fromlist
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\import_utils.py", line 1081, in __getattr__
    value = self._get_module(name)
  File "F:\Dolly 2.0\dolly\lib\site-packages\transformers\utils\import_utils.py", line 1095, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.utils.bitsandbytes because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

(dolly) F:\Dolly 2.0\dolly\Scripts>

srowen commented 1 year ago

You don't have all the needed CUDA libs installed, or bitsandbytes doesn't support Windows. Not sure which; maybe both.
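
A quick sanity check (a minimal sketch) is to confirm whether the torch build inside that venv can see the GPU at all, independent of bitsandbytes:

import torch

# False / None here means the venv has a CPU-only torch build or the CUDA driver
# is not visible, in which case bitsandbytes 8-bit loading cannot work either
print(torch.cuda.is_available())
print(torch.version.cuda)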

FurkanGozukara commented 1 year ago

CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.

I have it installed, and it's even in the PATH. Weird.

[screenshot]

FurkanGozukara commented 1 year ago

@srowen I made it work.

I have just one final error that I need your help with.

To create a public link, set `share=True` in `launch()`.
F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py:1387: UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 20 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Input length of input_ids is 25, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
F:\Dolly 2.0\dolly\lib\site-packages\transformers\generation\utils.py:1470: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
  File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\routes.py", line 395, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 1196, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\blocks.py", line 1130, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "F:\Dolly 2.0\dolly\lib\site-packages\gradio\components.py", line 5589, in postprocess
    unindented_y = inspect.cleandoc(y)
  File "C:\Python3108\lib\inspect.py", line 750, in cleandoc
    lines = doc.expandtabs().split('\n')
AttributeError: 'list' object has no attribute 'expandtabs'

And this is the Gradio code:

import gradio as gr
from transformers import *
import torch

theme = gr.themes.Monochrome(
    primary_hue="indigo",
    secondary_hue="blue",
    neutral_hue="slate",
    radius_size=gr.themes.sizes.radius_sm,
    font=[gr.themes.GoogleFont("Open Sans"), "ui-sans-serif", "system-ui", "sans-serif"],
)

#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",task="text-generation")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.float16, trust_remote_code=True, device_map="auto",task="text-generation")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",task="text-generation", max_new_tokens=128)
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-7b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",task="text-generation")
#instruct_pipeline = pipeline(model="F:/Dolly 2.0/dolly-v2-7b", torch_dtype=torch.bfloat16, load_in_8bit=True, trust_remote_code=True, device_map="auto",task="text-generation",max_new_tokens=2048)

model_name = "F:/Dolly 2.0/dolly-v2-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left",max_new_tokens=256)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True, trust_remote_code=True)

end_key_token_id = tokenizer.encode("### End")[0]

instruct_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer,pad_token_id=tokenizer.pad_token_id, eos_token_id=end_key_token_id)

def generate(instruction): 
    return instruct_pipeline(instruction)

examples = [
    "Instead of making a peanut butter and jelly sandwich, what else could I combine peanut butter with in a sandwich? Give five ideas",
    "How do I make a campfire?",
    "Write me a tweet about the release of Dolly 2.0, a new LLM"
]

def process_example(args):
    for x in generate(args):
        pass
    return x

css = ".generating {visibility: hidden}"

with gr.Blocks(theme=theme, analytics_enabled=False, css=css) as demo:
    with gr.Column():
        gr.Markdown(
            """ ## Dolly 2.0
            Dolly 2.0 is a 12B parameter language model based on the EleutherAI pythia model family and fine-tuned exclusively on a new, high-quality human generated instruction following dataset, crowdsourced among Databricks employees. For more details, please refer to the [model card](https://huggingface.co/databricks/dolly-v2-12b)

            Type in the box below and click the button to generate answers to your most pressing questions!

      """
        )
        gr.HTML("<p>Check out SECourses for AI, Stable Diffusion, ML and Programming Related Full Free Courses, Tutorials and Guides  : <a target='_blank' style='display:inline-block' href='https://www.youtube.com/@SECourses' alt='https://www.youtube.com/@SECourses'>https://www.youtube.com/@SECourses</a> </p>")

        with gr.Row():
            with gr.Column(scale=3):
                instruction = gr.Textbox(placeholder="Enter your question here", label="Question", elem_id="q-input")

                with gr.Box():
                    gr.Markdown("**Answer**")
                    output = gr.Markdown(elem_id="q-output")
                submit = gr.Button("Generate", variant="primary")
                gr.Examples(
                    examples=examples,
                    inputs=[instruction],
                    cache_examples=False,
                    fn=process_example,
                    outputs=[output],
                )

    submit.click(generate, inputs=[instruction], outputs=[output])
    instruction.submit(generate, inputs=[instruction], outputs=[output])

demo.queue(concurrency_count=1).launch(debug=True)

FurkanGozukara commented 1 year ago

@srowen I finally made it work.

Is this message important?

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results. Setting pad_token_id to eos_token_id:0 for open-end generation.

def generate(instruction): 
    input_ids = tokenizer.encode(instruction, return_tensors="pt")
    input_ids = input_ids.to(model.device)  # Move input_ids to the same device as the model
    generated_output = model.generate(input_ids, max_length=256)
    dd = tokenizer.decode(generated_output[0])
    return dd
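
If you want to silence that warning, one option (a sketch, not verified on this model) is to pass the attention mask and an explicit pad_token_id, which is what the warning asks for:

def generate(instruction):
    # tokenizer(...) returns both input_ids and attention_mask; pass both to generate().
    # pad_token_id=eos_token_id mirrors the fallback the warning mentions (an assumption).
    inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
    generated_output = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=256,
    )
    return tokenizer.decode(generated_output[0])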

FurkanGozukara commented 1 year ago

Finally made it fully work. Thank you for the help; time to make the tutorial video.

JingleiY commented 1 year ago

CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.

I have it installed, and it's even in the PATH. Weird.

[screenshot]

Hi, how did you fix this issue? I got the same CUDA problem. Thanks!

EvanTheBoy commented 1 year ago

Even though I have installed the latest torch version with CUDA into my venv, I got an error after using your code. I am on Windows. [...]

RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command to get more information: python -m bitsandbytes [...]

Hi, I got the same error, too. It also told me to inspect the output of the command and see if I can locate CUDA libraries. Could you please tell me how you fixed your issue? Thanks in advance!

datalifenyc commented 1 year ago

@srowen I made it work. I have just one final error that I need your help with. [...]

What was your resolution to the bitsandbytes issue?

srowen commented 1 year ago

That is an error from Gradio. You have some incompatibility between it and the versions of its dependencies. It is not related to this model.
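
For reference, the generate function that ended up working earlier in the thread returns a plain string rather than the pipeline's list output; a minimal sketch of the same idea through the pipeline (assuming the standard text-generation output, a list of dicts with a generated_text key):

def generate(instruction):
    # The pipeline returns a list like [{"generated_text": "..."}];
    # Gradio's Markdown component expects a string, so return only the text.
    outputs = instruct_pipeline(instruction)
    return outputs[0]["generated_text"]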