In theory, eetq supports all models supported by transformers, you can try this:
```python
from transformers import AutoModelForCausalLM, EetqConfig

path = "/path_to_model"
quantization_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(
    path, device_map="auto", quantization_config=quantization_config
)
```
I tested it and it works. What else can I do to optimize it further? Have you tested it with torch.compile?