arcee-ai / fastmlx

FastMLX is a high-performance, production-ready API for hosting MLX models.

Explore integration with exo? #17

Open stewartugelow opened 3 months ago

stewartugelow commented 3 months ago

Exo Labs just published a repository on how to create an MLX cluster across your Apple devices. (Including iOS apparently!)

https://github.com/exo-explore/exo

https://x.com/ac_crypto/status/1812949425465270340?s=46&t=3addBdiItmeUbNMMQJTR4w

Blaizzy commented 3 months ago

Thanks!

I have been thinking about that as well.

What are your thoughts?

Btw, distributed inference is on the roadmap:

https://x.com/prince_canuma/status/1812957802597417215?s=46
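For context on what such an integration could look like: both FastMLX and an exo cluster aim to expose an OpenAI-style chat completions endpoint, so a client could stay identical whether it talks to a single Mac or a cluster of Apple devices. A minimal sketch of such a client follows; the URL, port, and model name are illustrative assumptions, not confirmed endpoints of either project:

```python
import json
import urllib.request

# Assumed address: FastMLX serving an OpenAI-compatible API locally.
# An exo cluster could expose the same interface at a different host/port.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model, prompt, max_tokens=256):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat_request(payload, url=API_URL):
    """POST the payload; requires a server actually running at `url`."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example payload (model name is a placeholder):
payload = build_chat_request(
    "mlx-community/Meta-Llama-3-8B-Instruct-4bit", "Hello!"
)
print(json.dumps(payload, indent=2))
```

Because only the serving backend changes, distributed inference could in principle be swapped in behind the same client code.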

Blaizzy commented 3 months ago

I gave it a quick look but will dig deeper throughout the week.

stewartugelow commented 3 months ago

It’s an area I’ve never explored before, so the code doesn’t make much sense to me yet. I can’t imagine there are that many people who really need to run a Llama 70B model locally, but if it accelerated token response rates it would be amazing!

(Higher bit quants might be very valuable, too.)

Blaizzy commented 2 months ago

Working on a custom solution for it.

It will take a bit more time because my plate is pretty full and I don't have multiple powerful machines to test bigger models on.
