What would you like to be added:

Generally,
- if users use object stores, they can use Fluid as a distributed caching system;
- if users use OCI images, they can use Dragonfly for P2P acceleration.
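For context, the object-store path above can be expressed as a Fluid `Dataset` that mounts the bucket holding the weights. This is a minimal sketch; the bucket name and mount path are illustrative placeholders, not a real deployment:

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: model-weights
spec:
  mounts:
    # Illustrative: cache an S3 prefix that holds the model weights.
    - mountPoint: s3://example-bucket/models/
      name: models
```

A Fluid runtime (e.g. an Alluxio-backed one) bound to this `Dataset` then serves repeated reads from the distributed cache instead of the object store.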
However, there are 2 gaps here:

1. When users download model weights from a model hub, we should download them only once and then reuse the cache, but we can't achieve this today. One option is to download the weights into the file system or the cache system first.
2. This brings another problem: what if people don't run either of these two additional components? We should still have a default way to accelerate model loading.
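The first gap above (download once, then serve every later request from the cache) can be sketched with a simple marker-file scheme. `fetch`, `CACHE_DIR`, and the downloader callback are hypothetical names for illustration only, not an existing API of any component mentioned here:

```python
from pathlib import Path
import tempfile

# Hypothetical sketch of "download once, then serve from the cache".
CACHE_DIR = Path(tempfile.mkdtemp(prefix="model-cache-"))

def fetch(model_id: str, downloader) -> Path:
    """Return a local path for model_id, downloading only on a cache miss."""
    target = CACHE_DIR / model_id
    marker = target.with_suffix(".complete")
    if marker.exists():
        return target                      # cache hit: no network traffic
    target.parent.mkdir(parents=True, exist_ok=True)
    downloader(target)                     # e.g. stream weights from the model hub
    marker.touch()                         # mark the download as finished
    return target

# Tiny usage example with a fake downloader standing in for the model hub.
calls = []
def fake_downloader(path: Path) -> None:
    calls.append(path)
    path.write_bytes(b"weights")

first = fetch("demo/model.bin", fake_downloader)
second = fetch("demo/model.bin", fake_downloader)
assert first == second and len(calls) == 1  # second call is served from cache
```

The marker file distinguishes a complete download from a partial one, so a crashed download is retried rather than served from a corrupt cache entry.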
Why is this needed:
Minimize the configuration users need while still enjoying the acceleration.
Completion requirements:
This enhancement requires the following artifacts:
- [ ] Design doc
- [ ] API change
- [ ] Docs update
The artifacts should be linked in subsequent comments.