epolewski / EricLLM

A fast batching API to serve LLM models
MIT License

current_seq_len and gpu_balance #10

Open chardog opened 1 month ago

chardog commented 1 month ago

While loading Mixtral I get `AssertionError: Insufficient space in device allocation`.

The command I used:

```
python ericLLM.py --model ./models/mistralai_Mixtral-8x7B-Instruct-v0.1 --gpu_split 24,24,24,24,24 --max_prompts 8 --num_workers 1 --gpu_balance
```

If I remove `--gpu_balance`, it loads the layers across four of the five 24GB GPUs, but then I get a different error:

```
AttributeError: 'list' object has no attribute 'current_seq_len'
```
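For what it's worth, that traceback pattern usually means a *list* of cache objects is being passed to code that expects a single cache. A minimal, hypothetical sketch of the failure mode (`SimpleCache` and `advance` below are illustrative stand-ins, not EricLLM's or exllamav2's actual API):

```python
class SimpleCache:
    """Stand-in for a per-sequence cache that tracks the current sequence length."""
    def __init__(self):
        self.current_seq_len = 0

def advance(cache, n_tokens):
    # Expects a single cache object; a list of caches has no such attribute.
    cache.current_seq_len += n_tokens
    return cache.current_seq_len

single = SimpleCache()
print(advance(single, 8))   # works on a single cache

batch = [SimpleCache(), SimpleCache()]
try:
    advance(batch, 8)       # passing the whole batch list instead of one cache
except AttributeError as e:
    print(e)                # 'list' object has no attribute 'current_seq_len'
```

If the batching path builds one cache per prompt, the fix is typically to index into the list (or iterate) rather than hand the whole list to the per-sequence code.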

chardog commented 1 month ago

It also doesn't appear to support quantization. Is that right?