FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

CPU and M1/M2 GPU platform support #80

Open xiezhq-hermann opened 1 year ago

xiezhq-hermann commented 1 year ago

Reopens https://github.com/FMInference/FlexGen/pull/71, which was closed by mistake. This is a minimal modification extending FlexGen to CPU-only and Apple M1/M2 GPU platforms.
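As a minimal sketch of what such a port typically needs, the snippet below shows a backend-selection fallback (CUDA GPU → Apple M1/M2 GPU via Metal/MPS → CPU). The function name `select_device` and its availability flags are hypothetical, not taken from the PR; in PyTorch the real checks would be `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best available compute backend.

    Hypothetical helper illustrating the fallback order a CPU/M1/M2
    port implies. In PyTorch, the flags would come from
    torch.cuda.is_available() and torch.backends.mps.is_available().
    """
    if cuda_available:
        return "cuda"   # NVIDIA GPU path (the original FlexGen target)
    if mps_available:
        return "mps"    # Apple M1/M2 GPU via the Metal Performance Shaders backend
    return "cpu"        # portable fallback


# Example: on a machine with no CUDA but an Apple-silicon GPU
device = select_device(cuda_available=False, mps_available=True)
print(device)  # mps
```

Offloading code that assumes CUDA-specific features (pinned memory, streams) would also need guarding on such a platform check, which is why even a "minimal" port touches more than the device string.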