OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Support for "mistralai/Mistral-7B-Instruct-v0.1" model #1501

Closed: Matthieu-Tinycoaching closed this issue 10 months ago

Matthieu-Tinycoaching commented 1 year ago

Hi,

Would it be possible to add support for "mistralai/Mistral-7B-Instruct-v0.1" model?
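
For reference, assuming the Transformers converter gains support for this architecture, the usual CT2 workflow would be the standard conversion command, e.g.

ct2-transformers-converter --model mistralai/Mistral-7B-Instruct-v0.1 --output_dir mistral-7b-instruct-ct2 --quantization int8_float16

followed by generation through the Generator API. A minimal sketch (the quantization, sampling settings, and the [INST] prompt format are examples, not a confirmed recipe):

    import ctranslate2
    from transformers import AutoTokenizer

    # Load the original tokenizer and the converted CTranslate2 model.
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
    generator = ctranslate2.Generator("mistral-7b-instruct-ct2", device="cuda")

    # Mistral-Instruct style prompt; generate_batch expects token strings, not ids.
    prompt = "[INST] What is the capital of France? [/INST]"
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

    results = generator.generate_batch(
        [tokens], max_length=256, sampling_temperature=0.7, include_prompt_in_result=False
    )
    print(tokenizer.decode(results[0].sequences_ids[0]))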

kdcyberdude commented 7 months ago

File "/home/kd/anaconda3/envs/hf2/lib/python3.12/site-packages/ctranslate2/converters/transformers.py", line 1470, in set_decoder
    print(layer.self_attn.q_proj.qweight.shape)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kd/anaconda3/envs/hf2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Linear' object has no attribute 'qweight'. Did you mean: 'weight'?

My bad. While trying to convert a 4-bit model, I had made changes to the library code, which resulted in this error. I have reverted the changes, and now it's working.
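
For context on that traceback: a plain torch.nn.Linear only exposes a weight attribute, while GPTQ/AWQ-style quantized linear layers typically expose packed tensors such as qweight, which is why the attribute lookup fails on an unquantized checkpoint. A quick check (illustrative only):

    import torch

    layer = torch.nn.Linear(4096, 4096, bias=False)
    print(hasattr(layer, "weight"))   # True
    print(hasattr(layer, "qweight"))  # False: accessing it raises AttributeError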

kdcyberdude commented 7 months ago

OpenNMT-py demonstrates impressive performance, achieving ~3400 tokens/s with a batch size of 120 on a 4090. However, I'm encountering an issue when running inference with my converted ONMT model, which results in the error detailed in this GitHub issue I opened: https://github.com/OpenNMT/OpenNMT-py/issues/2562.

Could anyone offer insights or point me to relevant documentation to resolve this issue? cc: @vince62s
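
For comparison, this is roughly how one might measure generation throughput on the CT2 side with the same batch size. Everything here is an assumption: the model directory, the Starling/OpenChat prompt template, and the sampling settings are placeholders, not the setup behind the OpenNMT-py number above.

    import time

    import ctranslate2
    from transformers import AutoTokenizer

    model_dir = "./Starling-LM-7B-alpha-ct2"  # hypothetical converted model path
    tokenizer = AutoTokenizer.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")
    generator = ctranslate2.Generator(model_dir, device="cuda")

    prompt = "GPT4 Correct User: Hello, how are you?<|end_of_turn|>GPT4 Correct Assistant:"
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    batch = [tokens] * 120  # batch size quoted above

    start = time.time()
    results = generator.generate_batch(
        batch, max_length=128, sampling_topk=1, include_prompt_in_result=False
    )
    elapsed = time.time() - start

    # Count only newly generated tokens (prompt excluded above).
    generated = sum(len(r.sequences_ids[0]) for r in results)
    print(f"{generated / elapsed:.0f} generated tokens/s")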

silvacarl2 commented 7 months ago

can you just convert like this?

ct2-transformers-converter --model berkeley-nest/Starling-LM-7B-alpha --output_dir ./berkeley-nest/Starling-LM-7B-alpha-ct2/

silvacarl2 commented 7 months ago

actually it results in this error message, which I think I have seen in this git repo before:

    Traceback (most recent call last):
      File "/home/silvacarl/.local/bin/ct2-transformers-converter", line 8, in <module>
        sys.exit(main())
      File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 2008, in main
        converter.convert_from_args(args)
      File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
        return self.convert(
      File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 97, in convert
        model_spec.validate()
      File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/specs/model_spec.py", line 590, in validate
        raise ValueError(
    ValueError: Vocabulary has size 32003 but the model expected a vocabulary of size 32002

kdcyberdude commented 7 months ago

Can you just convert like this?

ct2-transformers-converter --model berkeley-nest/Starling-LM-7B-alpha --output_dir ./berkeley-nest/Starling-LM-7B-alpha-ct2/

@silvacarl2 I am able to convert the model to CT2 with both quantizations, i.e. int8_bfloat16 and int8, though both take exactly the same space on disk and have the same inference speed.
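
The two conversions were presumably done with the converter's --quantization flag, along these lines (output paths are examples):

ct2-transformers-converter --model berkeley-nest/Starling-LM-7B-alpha --output_dir ./Starling-LM-7B-alpha-ct2-int8 --quantization int8

ct2-transformers-converter --model berkeley-nest/Starling-LM-7B-alpha --output_dir ./Starling-LM-7B-alpha-ct2-int8-bf16 --quantization int8_bfloat16

Similar disk sizes are expected either way, since both variants store the weights in int8; as far as I understand, the suffix mainly selects the compute type used for the non-quantized operations, which would also explain the comparable speeds.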

ValueError: Vocabulary has size 32003 but the model expected a vocabulary of size 32002

This is fixed. I saw the solution in another issue, which is to truncate the last token. In this case, it's <sep>.
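
For anyone hitting the same ValueError: the fix above drops the trailing <sep> token so that the tokenizer and the embedding matrix end up the same size. A sketch of an alternative workaround that resolves the mismatch from the other direction, by resizing the model embeddings to the tokenizer size before converting (paths are examples, and the extra embedding row is newly initialized):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "berkeley-nest/Starling-LM-7B-alpha"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Make the embedding matrix match the tokenizer size (32003 vs 32002 here).
    model.resize_token_embeddings(len(tokenizer))

    model.save_pretrained("./starling-resized")
    tokenizer.save_pretrained("./starling-resized")

    # Then convert the local copy:
    #   ct2-transformers-converter --model ./starling-resized --output_dir ./starling-ct2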

What I am now trying to do is use the AWQ-quantized model with the OpenNMT-py framework, which is faster than CT2. I am facing a problem running inference with my converted model.

silvacarl2 commented 7 months ago

I don't think CT2 can convert a model that is already in AWQ format. Try vLLM for inference.
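
A minimal vLLM sketch for serving an AWQ checkpoint (the model name and sampling settings are only examples; vLLM usually detects the quantization from the checkpoint, and it can also be forced explicitly):

    from vllm import LLM, SamplingParams

    llm = LLM(model="TheBloke/Starling-LM-7B-alpha-AWQ", quantization="awq")
    params = SamplingParams(temperature=0.7, max_tokens=256)

    outputs = llm.generate(["What is the capital of France?"], params)
    print(outputs[0].outputs[0].text)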

kdcyberdude commented 7 months ago

I don't think CT2 can convert a model that is already in AWQ format. Try vLLM for inference.

Right!! Though I am talking about this: https://github.com/OpenNMT/OpenNMT-py/issues/2562