SpecialYang opened 3 weeks ago
I'm not working on this specific doc; it is maintained by the SGLang team. However, you can probably use standard distributed-systems techniques to avoid relying on a single centralized node.
I also think that, with an efficient implementation, a single node can probably scale up quite well.
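As one illustration of such a technique (my own assumption, not how the SGLang router is documented to work), router replicas could shard requests by prompt prefix with consistent hashing. Each replica then owns a disjoint slice of prefixes and keeps its own approximate tree for that slice without coordinating with the others. The names below (`ConsistentHashRing`, `routing_key`, the 64-character prefix, the replica names) are hypothetical:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash. Python's built-in hash() is salted per process,
    # so separate router replicas would disagree if we used it.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Maps keys to router replicas (hypothetical helper, for illustration)."""

    def __init__(self, replicas, vnodes=64):
        # Each replica is placed at `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name) for name in replicas for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    def owner(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def routing_key(prompt: str, prefix_chars: int = 64) -> str:
    # Hypothetical choice: shard on the leading characters, so requests that
    # share a long prompt prefix land on the same router replica.
    return prompt[:prefix_chars]


if __name__ == "__main__":
    ring = ConsistentHashRing(["router-a", "router-b", "router-c"])
    system = "You are a helpful assistant. " * 4
    # Both prompts share the same first 64 characters, so the same replica owns them.
    print(ring.owner(routing_key(system + "What is 2+2?")))
    print(ring.owner(routing_key(system + "Name a prime.")))
```

The trade-off: prefix matching needs no shared state, and adding or removing a replica only moves keys to or from that replica. On the other hand, a very hot prefix concentrates on one replica, and per-worker load (queue depth) would still have to be shared or estimated separately.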
From the doc at https://docs.google.com/document/d/1cCqK3dh7ZR_rUPkcZT2cr0kLnAxv6_Sd-P1q37-3RNQ/edit?tab=t.0:
Does the router have to be a single centralized node?
If not, how can multiple router replicas keep their queues and approximate trees consistent?
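For the second question, one possible approach (again a sketch under my own assumptions, not SGLang's design) is to accept that replicas' trees diverge briefly and make them mergeable. Each replica records, per prefix node, the latest timestamp at which it saw a request routed to each worker; replicas periodically exchange trees and take the element-wise maximum. Because max is commutative, associative, and idempotent, replicas converge regardless of gossip order or duplicate deliveries (a state-based CRDT). `ApproxPrefixTree`, the character-level trie, and the integer timestamps (assumed roughly comparable across replicas) are hypothetical; there is no eviction, and queue/load state is not covered:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, Node] = field(default_factory=dict)
    # worker id -> latest timestamp at which a request with this prefix was
    # routed to that worker (seen locally or learned from a merged peer)
    seen: dict[str, int] = field(default_factory=dict)


class ApproxPrefixTree:
    """Hypothetical mergeable prefix tree; character-level, no eviction."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, text: str, worker: str, ts: int) -> None:
        # Record that `text` was routed to `worker` at time `ts`.
        node = self.root
        for ch in text:
            node = node.children.setdefault(ch, Node())
            node.seen[worker] = max(node.seen.get(worker, ts), ts)

    def merge(self, other: ApproxPrefixTree) -> None:
        # Element-wise max over every node; safe to apply in any order or twice.
        stack = [(self.root, other.root)]
        while stack:
            mine, theirs = stack.pop()
            for w, t in theirs.seen.items():
                mine.seen[w] = max(mine.seen.get(w, t), t)
            for ch, child in theirs.children.items():
                stack.append((mine.children.setdefault(ch, Node()), child))

    def best_worker(self, text: str) -> tuple[str | None, int]:
        # Follow the longest matching prefix; at the deepest node, pick the
        # worker seen most recently. Returns (worker, matched_length).
        node, best, depth = self.root, None, 0
        for i, ch in enumerate(text, 1):
            node = node.children.get(ch)
            if node is None:
                break
            best = max(node.seen.items(), key=lambda kv: (kv[1], kv[0]))[0]
            depth = i
        return best, depth
```

Between gossip rounds a replica may route on stale information, which is acceptable here because the tree is only a cache-affinity hint, not a correctness requirement.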