Closed: tj-cycyota closed this issue 1 year ago
Isn't load_in_8bit=True already doing that? It's passed to BitsAndBytesConfig and applied to the model, no?
Not in my testing. The only way it works (i.e. actually loads the model on a smaller GPU) is by setting model_kwargs={'load_in_8bit': True}.
So change it to this, right?
# Note: if you use dolly 12B or smaller model but a GPU with less than 24GB RAM, use 8bit. This requires %pip install bitsandbytes
# instruct_pipeline = pipeline(model=model_name, load_in_8bit=True, trust_remote_code=True, device_map="auto", model_kwargs={'load_in_8bit': True})
You have load_in_8bit=True in there twice. It should be this, with that param in model_kwargs:
# Note: if you use dolly 12B or smaller model but a GPU with less than 24GB RAM, use 8bit. This requires %pip install bitsandbytes
# instruct_pipeline = pipeline(model=model_name, trust_remote_code=True, device_map="auto", model_kwargs={'load_in_8bit': True})
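To see why only the model_kwargs placement takes effect, here is a minimal pure-Python mock of the forwarding behavior described in this thread. This is not transformers' actual implementation; mock_pipeline and mock_from_pretrained are hypothetical stand-ins for pipeline() and AutoModelForCausalLM.from_pretrained(), used only to illustrate that keys inside model_kwargs reach the model loader while an unrecognized top-level kwarg does not.

```python
def mock_from_pretrained(name, **kwargs):
    # Stand-in for the model loader; records which kwargs actually reached it.
    return {"model": name, "loader_kwargs": kwargs}

def mock_pipeline(model, model_kwargs=None, **pipeline_kwargs):
    # Stand-in for pipeline(): only the contents of model_kwargs are
    # forwarded to the loader; other unknown kwargs never reach it.
    loaded = mock_from_pretrained(model, **(model_kwargs or {}))
    return {"loaded": loaded, "pipeline_kwargs": pipeline_kwargs}

# Passed top-level: the flag never reaches the loader, so nothing changes.
p1 = mock_pipeline("databricks/dolly-v2-3b", load_in_8bit=True)
assert "load_in_8bit" not in p1["loaded"]["loader_kwargs"]

# Passed via model_kwargs: the flag reaches the loader, enabling 8-bit loading.
p2 = mock_pipeline("databricks/dolly-v2-3b", model_kwargs={"load_in_8bit": True})
assert p2["loaded"]["loader_kwargs"]["load_in_8bit"] is True
```

This matches the behavior reported above: the top-level flag is silently dropped, so only the model_kwargs form actually loads the model in 8-bit.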
Thanks, I'm adding it in the next release.
In the Dolly demo notebook 03-Q&A-prompt-engineering-for-dolly, there is sample code provided (originally commented out) that looks like:
# Note: if you use dolly 12B or smaller model but a GPU with less than 24GB RAM, use 8bit. This requires %pip install bitsandbytes
# instruct_pipeline = pipeline(model=model_name, load_in_8bit=True, trust_remote_code=True, device_map="auto")
However, the correct way to pass the load_in_8bit param, according to the Databricks Dolly docs, is:
instruct_pipeline = pipeline(model=model_name, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", return_full_text=True, max_new_tokens=256, top_p=0.95, top_k=50, model_kwargs={'load_in_8bit': True})