AetherCortex / Llama-X

Open Academic Research on Improving LLaMA to SOTA LLM
Apache License 2.0

Need optimization for mps #10

Open yxKryptonite opened 1 year ago

yxKryptonite commented 1 year ago

Reproduction info:

Inference is very slow on macOS machines using the mps backend and needs further optimization.
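To put a number on "very slow", a throughput measurement along the following lines could be attached to the report. This is only a minimal sketch using the standard library: `tokens_per_second` and `fake_generate` are hypothetical names, not part of Llama-X, and `generate` is assumed to be any callable that blocks until the requested tokens have actually been produced.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[int], Sequence[int]],
                      max_new_tokens: int = 32,
                      warmup: int = 1,
                      runs: int = 3) -> float:
    """Return the mean generation throughput in tokens per second.

    `generate` is a hypothetical callable that takes the number of tokens
    to produce and returns the generated token ids.
    """
    # Warm-up runs are discarded so one-time setup cost is not measured.
    for _ in range(warmup):
        generate(max_new_tokens)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(max_new_tokens)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_seconds


def fake_generate(n: int) -> list:
    # Stand-in for a real model call (assumption): sleeps 1 ms per "token".
    time.sleep(0.001 * n)
    return list(range(n))


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_generate):.1f} tokens/s")
```

Replacing `fake_generate` with a wrapper around the real model call, and running it once on mps and once on cpu, would give two comparable figures to include under the reproduction info.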