FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

CPU and M1/M2 GPU platform support #80

Open xiezhq-hermann opened 1 year ago

xiezhq-hermann commented 1 year ago

Reopens https://github.com/FMInference/FlexGen/pull/71, which was closed by mistake. This is a minimal modification extending FlexGen to CPU-only and Apple M1/M2 GPU platforms.
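As a minimal sketch of what such a port typically needs, the snippet below shows a backend-selection fallback (CUDA GPU → Apple M1/M2 GPU via Metal/MPS → CPU). The function name `select_device` and its availability flags are hypothetical, not taken from the PR; in PyTorch the real checks would be `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best available compute backend.

    Hypothetical helper illustrating the fallback order a CPU/M1/M2
    port implies. In PyTorch, the flags would come from
    torch.cuda.is_available() and torch.backends.mps.is_available().
    """
    if cuda_available:
        return "cuda"   # NVIDIA GPU path (the original FlexGen target)
    if mps_available:
        return "mps"    # Apple M1/M2 GPU via the Metal Performance Shaders backend
    return "cpu"        # portable fallback


# Example: on a machine with no CUDA but an Apple-silicon GPU
device = select_device(cuda_available=False, mps_available=True)
print(device)  # mps
```

Offloading code that assumes CUDA-specific features (pinned memory, streams) would also need guarding on such a platform check, which is why even a "minimal" port touches more than the device string.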