@ParthMandaliya I had the same issue here https://github.com/BatsResearch/bonito/issues/8 where I tried to use Google Colab and got this error:

```
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
```
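For anyone hitting this, a quick way to check what your GPU supports before choosing a dtype (minimal sketch, assumes PyTorch is installed):

```python
import torch

# Query the (major, minor) compute capability of the first CUDA device.
# bfloat16 needs >= (8, 0); float16 works on older cards like T4 (7.5) or V100 (7.0).
assert torch.cuda.is_available()
major, minor = torch.cuda.get_device_capability(0)
dtype = "bfloat16" if (major, minor) >= (8, 0) else "float16"
print(f"compute capability {major}.{minor} -> use dtype={dtype}")
```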
@nihalnayak Just a quick update:
I tried to use Bonito on my available cluster instead of the Google Colab T4 GPU. Again, I get the same kind of error:

```
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-SXM2-32GB GPU has compute capability 7.0.
```
Thank you again for your help!
You can pass the `dtype` parameter to the `Bonito` class as below:

```python
from bonito import Bonito

obj = Bonito("<model>", dtype="float16")
```
@ParthMandaliya Thank you! It worked!
@nihalnayak @alexandreteles I tried the alexandreteles/bonito-v1-awq quantized model, but it needs a CUDA compute capability of at least 7.5, and the GPU I have supports only 7.0. When I tried the same on Google Colab, I got the following error:
```python
from bonito import Bonito

# Initialize the Bonito model
bonito = Bonito("alexandreteles/bonito-v1-awq", dtype="float16")
```
```
...
/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py in get_model(model_config, device_config, **kwargs)
     84     else:
     85         # Load the weights from the cached or downloaded files.
---> 86         model.load_weights(model_config.model, model_config.download_dir,
     87                            model_config.load_format, model_config.revision)
     88     return model.eval()

/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py in load_weights(self, model_name_or_path, cache_dir, load_format, revision)
    389         weight_loader = getattr(param, "weight_loader",
    390                                 default_weight_loader)
--> 391         weight_loader(param, loaded_weight)

/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py in weight_loader(self, param, loaded_weight)
    550         shard_size = param_data.shape[input_dim]
    551         start_idx = tp_rank * shard_size
--> 552         loaded_weight = loaded_weight.narrow(input_dim, start_idx,
    553                                              shard_size)
    554         assert param_data.shape == loaded_weight.shape

RuntimeError: start (0) + length (14336) exceeds dimension size (4096).
```
Please let me know if I am doing something wrong, whether additional steps are needed to make this quantized model compatible, or whether there is another model available that I can try.
Hi @ParthMandaliya, we have created a tutorial to run the quantized version of Bonito: https://colab.research.google.com/drive/1tfAqUsFaLWLyzhnd1smLMGcDXSzOwp9r?usp=sharing
Hope this helps!
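If you cannot open the notebook, the key step is loading the checkpoint with the matching quantization scheme. A minimal sketch, assuming `Bonito` forwards extra keyword arguments to vLLM's `LLM`:

```python
from bonito import Bonito

# Sketch only: tell vLLM the checkpoint is AWQ-quantized so it is not
# misread as full-precision weights, which can otherwise surface as
# weight-shape errors at load time (as in the traceback above).
bonito = Bonito(
    "alexandreteles/bonito-v1-awq",
    quantization="awq",
    dtype="float16",
)
```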
Closing this issue. Reopen if you are still facing issues.
I have been working with limited resources and was wondering if you have a smaller model available, something that can be used on Google Colab (15 GB T4 GPU)?
I checked your Hugging Face page, but it looks like only one model is available there, and that is Bonito-v1.