atisharma opened 1 year ago
It's not clear from the documentation how to split VRAM over multiple GPUs with exllama.
For future readers, it can be done by adding the following line to the model definition in model_definitions.py (e.g. to split a 70B model over two cards):

```python
auto_map=[17.5, 22],
```
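For context, here is a minimal sketch of what a two-GPU definition might look like. Only the auto_map line comes from the snippet above; the definition name, path, and surrounding structure are assumptions, since the exact layout of model_definitions.py may vary. In exllama, auto_map lists the approximate VRAM, in GB, to allocate on each GPU:

```python
# model_definitions.py -- illustrative sketch; only the auto_map line is
# taken from above, everything else (name, path) is a placeholder.

# Hypothetical definition for a 70B model split across two GPUs.
my_70b_model = dict(
    model_path="/path/to/your-70b-gptq-model",  # placeholder path
    # Per-GPU VRAM budgets in GB: ~17.5 GB on the first card, ~22 GB on
    # the second. exllama distributes the model layers to fit these limits.
    auto_map=[17.5, 22],
)
```

The first value is often set below the card's full capacity to leave headroom on GPU 0 for the cache and activations.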