File "/home/kd/anaconda3/envs/hf2/lib/python3.12/site-packages/ctranslate2/converters/transformers.py", line 1470, in set_decoder print(layer.self_attn.q_proj.qweight.shape) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kd/anaconda3/envs/hf2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1688, in __getattr__ raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'") AttributeError: 'Linear' object has no attribute 'qweight'. Did you mean: 'weight'?
My bad. While trying to convert a 4-bit model, I had made changes to the library code, which caused this error. I have reverted the changes and it's working now.
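For reference, the mismatch behind that error is between the two layouts a linear layer can have. A minimal sketch (not the converter's actual code): a plain nn.Linear only has .weight, while AWQ/GPTQ-quantized layers store packed weights under .qweight.

```python
import torch.nn as nn

layer = nn.Linear(4096, 4096)

# A plain nn.Linear exposes only .weight / .bias. AWQ/GPTQ-quantized layers
# instead store packed integer weights under .qweight, so code written for
# one layout fails on the other with the AttributeError shown above.
weight = layer.qweight if hasattr(layer, "qweight") else layer.weight
print(weight.shape)  # torch.Size([4096, 4096])
```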
OpenNMT-py demonstrates impressive performance, achieving ~3400 tokens/s with a batch size of 120 on a 4090. However, I'm encountering an issue when running inference on my converted ONMT model, which results in the error detailed in this GitHub issue I opened: https://github.com/OpenNMT/OpenNMT-py/issues/2562.
Could anyone offer insights or point me to relevant documentation to resolve this issue? cc: @vince62s
Can you just convert it like this?
ct2-transformers-converter --model berkeley-nest/Starling-LM-7B-alpha --output_dir ./berkeley-nest/Starling-LM-7B-alpha-ct2/
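Once it converts cleanly, loading it with the CTranslate2 Python API would look roughly like this. A minimal sketch; the prompt format, sampling settings, and paths are just placeholders:

```python
import ctranslate2
import transformers

# Load the converted model and the original Hugging Face tokenizer.
generator = ctranslate2.Generator("./berkeley-nest/Starling-LM-7B-alpha-ct2/", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")

prompt = "Hello, how are you?"  # placeholder; use the model's actual chat template
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([tokens], max_length=128, sampling_temperature=0.7)
output_ids = results[0].sequences_ids[0]
print(tokenizer.decode(output_ids))
```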
Actually, it results in this error message, which I think I have seen in this git repo before:
Traceback (most recent call last):
  File "/home/silvacarl/.local/bin/ct2-transformers-converter", line 8, in <module>
@silvacarl2 I am able to convert the model with CT2 using both quantizations, i.e. int8_bfloat16 and int8, though both take exactly the same space on disk and have the same inference speed.
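That is probably expected: in both modes the large weight matrices are stored as int8, so the on-disk size is essentially the same; the suffix mainly changes the type used for the non-quantized layers and for computation. For reference, a sketch of selecting the quantization explicitly through the Python converter API (equivalent to the --quantization flag; output paths are placeholders):

```python
import ctranslate2

converter = ctranslate2.converters.TransformersConverter("berkeley-nest/Starling-LM-7B-alpha")

# Quantizable weights are stored as int8 in both cases; only the type of the
# remaining layers and the compute type differ between the two options.
converter.convert("./Starling-LM-7B-alpha-ct2-int8/", quantization="int8", force=True)
converter.convert("./Starling-LM-7B-alpha-ct2-int8-bf16/", quantization="int8_bfloat16", force=True)
```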
  ..., in validate
    raise ValueError(
ValueError: Vocabulary has size 32003 but the model expected a vocabulary of size 32002
This is fixed. I saw the solution in some other issue, which is to truncate the last token; in this case, it's <sep>.
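For anyone hitting the same error, a minimal check of where the mismatch comes from (the model ID is illustrative; use the checkpoint being converted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The converter compares these two sizes; a mismatch (e.g. 32003 vs 32002)
# usually means the tokenizer carries an added token, such as <sep>, that the
# embedding matrix does not have, so the extra entry must be dropped first.
print(len(tokenizer))
print(model.get_input_embeddings().weight.shape[0])
```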
What I am now trying to do is use the AWQ-quantized model with the OpenNMT-py framework, which is faster than CT2. I am facing a problem running inference with my converted model.
I don't think CT2 can convert a model that is already in AWQ format. Try vLLM for inference.
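If going the vLLM route, something along these lines should work for an AWQ checkpoint. A sketch only; the model name and sampling settings are placeholders:

```python
from vllm import LLM, SamplingParams

# Placeholder AWQ checkpoint; replace with the actual quantized model.
llm = LLM(model="TheBloke/Starling-LM-7B-alpha-AWQ", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```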
Right!! Though I am talking about this: https://github.com/OpenNMT/OpenNMT-py/issues/2562
Hi,
Would it be possible to add support for the "mistralai/Mistral-7B-Instruct-v0.1" model?