WukLab / preble

Stateful LLM Serving
Apache License 2.0

Add Load based Memory Eviction #44

Closed vikranth22446 closed 7 months ago

vikranth22446 commented 7 months ago

Consider using a priority queue for each node, then evicting based on that priority queue. This should handle memory much better in terms of sharing.
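
A rough sketch of what this could look like, assuming each node keeps a min-heap of its cached entries ordered by a load metric such as recent hit count. `CacheEntry`, `NodeEvictionQueue`, the `load` metric, and the `size` unit are all hypothetical names for illustration, not existing Preble code:

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CacheEntry:
    # Only `load` takes part in ordering; the other fields are payload.
    load: int                          # hypothetical load metric, e.g. recent hit count
    key: str = field(compare=False)    # identifier of the cached prefix
    size: int = field(compare=False)   # memory footprint, in arbitrary units


class NodeEvictionQueue:
    """Per-node min-heap: the least-loaded entry is evicted first."""

    def __init__(self):
        self._heap = []

    def add(self, key, load, size):
        heapq.heappush(self._heap, CacheEntry(load, key, size))

    def evict(self, needed):
        """Pop entries in ascending load order until `needed` units are freed."""
        evicted, freed = [], 0
        while self._heap and freed < needed:
            entry = heapq.heappop(self._heap)
            evicted.append(entry.key)
            freed += entry.size
        return evicted


queue = NodeEvictionQueue()
queue.add("prefix-a", load=12, size=100)
queue.add("prefix-b", load=3, size=80)
queue.add("prefix-c", load=7, size=50)
victims = queue.evict(needed=120)  # frees the two least-loaded entries
```

In this sketch, heavily shared entries would carry a higher load and so stay resident longer. Updating an entry's load after insertion is not handled here and would need a re-push or lazy-deletion scheme.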