What would you like to be added:

Generally,
- if users use object stores, they can use Fluid as a distributed caching system;
- if users use OCI images, they can use Dragonfly for P2P acceleration.
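For context, the object-store path above can be expressed as a Fluid `Dataset` that mounts the bucket holding the weights. This is a minimal sketch; the bucket name and mount path are illustrative placeholders, not a real deployment:

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: model-weights
spec:
  mounts:
    # Illustrative: cache an S3 prefix that holds the model weights.
    - mountPoint: s3://example-bucket/models/
      name: models
```

A Fluid runtime (e.g. an Alluxio-backed one) bound to this `Dataset` then serves repeated reads from the distributed cache instead of the object store.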
However, there are 2 gaps here:

1. When users download model weights from a model hub, we should download them only once and then reuse the cache, but we can't achieve this today. One option is to download the weights into the file system or the cache system first.
2. This brings another problem: what if people don't run either of these two additional components? We should still have a default way to accelerate model loading.
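The first gap above (download once, then serve every later request from the cache) can be sketched with a simple marker-file scheme. `fetch`, `CACHE_DIR`, and the downloader callback are hypothetical names for illustration only, not an existing API of any component mentioned here:

```python
from pathlib import Path
import tempfile

# Hypothetical sketch of "download once, then serve from the cache".
CACHE_DIR = Path(tempfile.mkdtemp(prefix="model-cache-"))

def fetch(model_id: str, downloader) -> Path:
    """Return a local path for model_id, downloading only on a cache miss."""
    target = CACHE_DIR / model_id
    marker = target.with_suffix(".complete")
    if marker.exists():
        return target                      # cache hit: no network traffic
    target.parent.mkdir(parents=True, exist_ok=True)
    downloader(target)                     # e.g. stream weights from the model hub
    marker.touch()                         # mark the download as finished
    return target

# Tiny usage example with a fake downloader standing in for the model hub.
calls = []
def fake_downloader(path: Path) -> None:
    calls.append(path)
    path.write_bytes(b"weights")

first = fetch("demo/model.bin", fake_downloader)
second = fetch("demo/model.bin", fake_downloader)
assert first == second and len(calls) == 1  # second call is served from cache
```

The marker file distinguishes a complete download from a partial one, so a crashed download is retried rather than served from a corrupt cache entry.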
Why is this needed:
Minimize the configuration users need while still enjoying the acceleration.
Completion requirements:
This enhancement requires the following artifacts:
- [ ] Design doc
- [ ] API change
- [ ] Docs update
The artifacts should be linked in subsequent comments.