InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Apache License 2.0

Use rust instead of python when downloading model weights #38

Closed kerthcet closed 3 months ago

kerthcet commented 4 months ago

Because model weights are huge and Python is constrained by the GIL, let's move to Rust for better download performance.

kerthcet commented 4 months ago

/kind feature
/priority important-soon
/assign
/milestone v0.1.0

kerthcet commented 3 months ago

Some libraries, such as https://github.com/coreweave/tensorizer/tree/main, have optimized methods for downloading models, but they are written in Python. Python is the most popular language in the AI world, so we should be careful about moving away from it.

kerthcet commented 3 months ago

Considering that hf-hub, which is written in Rust, downloads models synchronously, the benefit of switching is small, because we can also use multiple threads in Python. Let's keep using Python for now.
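
For reference, a minimal sketch of the multi-threaded approach in Python. Downloads are I/O-bound, and CPython releases the GIL while a thread waits on the network, so a thread pool can overlap several file transfers. The names `download_all` and `fetch` are hypothetical; `fetch` stands in for whatever client call actually retrieves a file, and this is not the code used in llmaz.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(
    filenames: Iterable[str],
    fetch: Callable[[str], bytes],  # hypothetical: returns the content of one file
    max_workers: int = 8,
) -> Dict[str, int]:
    """Fetch every file concurrently and return the byte count per file."""
    names = list(filenames)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order; each fetch call runs in a worker thread.
        sizes = pool.map(lambda name: len(fetch(name)), names)
        # Consume the results before the pool shuts down.
        return dict(zip(names, sizes))
```

Passing `fetch` as a parameter keeps the sketch independent of any particular hub or HTTP client, and makes it easy to swap in a stub for testing.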

kerthcet commented 3 months ago

What we found is that the download rate is usually limited by the cloud vendor's NAT, for example to 200 Mbps, which equals 25 MB/s, so the optimization would not help much.
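
The arithmetic behind that figure, as a small sketch: a link rate in megabits per second divided by 8 gives megabytes per second, and dividing a model's size by that rate gives a lower bound on download time no matter which language does the downloading. The function names and the 14 GB model size below are hypothetical, chosen only for illustration.

```python
def mbps_to_mb_per_s(link_mbps: float) -> float:
    """Convert a link rate in megabits per second to megabytes per second."""
    return link_mbps / 8


def download_seconds(size_bytes: float, link_mbps: float) -> float:
    """Lower bound on download time when the link is the only bottleneck."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return size_bytes / bytes_per_second


# 200 Mbps is the NAT limit mentioned above; 14 GB is an assumed model size.
rate = mbps_to_mb_per_s(200)
seconds = download_seconds(14e9, 200)
print(f"{rate} MB/s, at least {seconds} s for 14 GB")
```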

kerthcet commented 3 months ago

/remove milestone

kerthcet commented 3 months ago

/milestone clear
/close