@ParthMandaliya I had the same issue here https://github.com/BatsResearch/bonito/issues/8 where I tried to use Google Colab and got this error:

```
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
```
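For anyone hitting this, a quick way to check what your GPU supports before choosing a dtype (minimal sketch, assumes PyTorch is installed):

```python
import torch

# Query the (major, minor) compute capability of the first CUDA device.
# bfloat16 needs >= (8, 0); float16 works on older cards like T4 (7.5) or V100 (7.0).
assert torch.cuda.is_available()
major, minor = torch.cuda.get_device_capability(0)
dtype = "bfloat16" if (major, minor) >= (8, 0) else "float16"
print(f"compute capability {major}.{minor} -> use dtype={dtype}")
```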
@nihalnayak Just a quick update:
I tried to use Bonito on my available cluster instead of the Google Colab T4 GPU. Again, I get the same kind of error:

```
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-SXM2-32GB GPU has compute capability 7.0.
```
Thank you again for your help!
You can pass the `dtype` parameter to the `Bonito` class as below:

```python
from bonito import Bonito

obj = Bonito("<model>", dtype="float16")
```
@ParthMandaliya Thank you! It worked!
@nihalnayak @alexandreteles I tried the alexandreteles/bonito-v1-awq quantized model, but it needs a CUDA compute capability of at least 7.5, and the GPU I have supports only 7.0. When I tried the same on Google Colab, I got the following error:
```python
from bonito import Bonito

# Initialize the Bonito model
bonito = Bonito("alexandreteles/bonito-v1-awq", dtype="float16")
```
```
...
/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py in get_model(model_config, device_config, **kwargs)
     84     else:
     85         # Load the weights from the cached or downloaded files.
---> 86         model.load_weights(model_config.model, model_config.download_dir,
     87                            model_config.load_format, model_config.revision)
     88     return model.eval()

/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py in load_weights(self, model_name_or_path, cache_dir, load_format, revision)
    389         weight_loader = getattr(param, "weight_loader",
    390                                 default_weight_loader)
--> 391         weight_loader(param, loaded_weight)

/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py in weight_loader(self, param, loaded_weight)
    550         shard_size = param_data.shape[input_dim]
    551         start_idx = tp_rank * shard_size
--> 552         loaded_weight = loaded_weight.narrow(input_dim, start_idx,
    553                                              shard_size)
    554         assert param_data.shape == loaded_weight.shape

RuntimeError: start (0) + length (14336) exceeds dimension size (4096).
```
Please let me know if I am doing something wrong, whether additional steps are needed to make this quantized model compatible, or whether there is another model available that I can try.
Hi @ParthMandaliya, we have created a tutorial to run the quantized version of Bonito: https://colab.research.google.com/drive/1tfAqUsFaLWLyzhnd1smLMGcDXSzOwp9r?usp=sharing
Hope this helps!
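If you cannot open the notebook, the key step is loading the checkpoint with the matching quantization scheme. A minimal sketch, assuming `Bonito` forwards extra keyword arguments to vLLM's `LLM`:

```python
from bonito import Bonito

# Sketch only: tell vLLM the checkpoint is AWQ-quantized so it is not
# misread as full-precision weights, which can otherwise surface as
# weight-shape errors at load time (as in the traceback above).
bonito = Bonito(
    "alexandreteles/bonito-v1-awq",
    quantization="awq",
    dtype="float16",
)
```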
Closing this issue. Reopen if you are still facing issues.
I have been working with limited resources and was wondering if you have a smaller model available, something that can be used on Google Colab (15 GB T4 GPU)?
I checked your Hugging Face page, but it looks like only one model is available there, and that is Bonito-v1.