JoeZijunZhou closed this 4 months ago.
After Vivian added AOT support, could we use that to identify whether the replica has warmed up?
Yes, that would be the ideal signal to resolve this issue.
Instead of sleeping for x seconds, can you wait until all the warmup requests have returned all their tokens?
There is a case where the warmup requests finish before the server warmup completes. Vivian is working on getting a server-warmup-complete signal from the engine. This is a temporary workaround.
I think 10 seconds is too long; 2 to 5 seconds worked perfectly for me.
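For illustration, here is a minimal sketch of the readiness check discussed above: wait until every warmup request has streamed all of its tokens, with an overall timeout, instead of sleeping a fixed number of seconds. It assumes each warmup request can be modeled as a callable returning an async stream of tokens. `wait_until_warm`, `expected_tokens`, `timeout_s`, `grace_s`, and `make_fake_request` are hypothetical names, not this project's API. `grace_s` stands in for the temporary sleep workaround, because warmup requests can finish before the server warmup itself completes.

```python
import asyncio
from typing import AsyncIterator, Callable, Sequence

# Hypothetical model: a warmup request is a callable returning an async token stream.
WarmupRequest = Callable[[], AsyncIterator[str]]


async def _drain(request: WarmupRequest, expected_tokens: int) -> bool:
    """Consume one warmup stream and report whether it returned all its tokens."""
    count = 0
    async for _ in request():
        count += 1
    return count >= expected_tokens


async def wait_until_warm(
    requests: Sequence[WarmupRequest],
    expected_tokens: int,
    timeout_s: float,
    grace_s: float = 0.0,
) -> bool:
    """Return True once every warmup request has returned all of its tokens.

    grace_s is a hypothetical extra sleep mirroring the temporary workaround,
    since warmup requests may finish before the server warmup completes.
    """
    try:
        results = await asyncio.wait_for(
            asyncio.gather(*(_drain(r, expected_tokens) for r in requests)),
            timeout=timeout_s,
        )
    except asyncio.TimeoutError:
        return False  # Not warm within the deadline.
    if not all(results):
        return False  # Some stream ended early with too few tokens.
    if grace_s > 0:
        await asyncio.sleep(grace_s)
    return True


def make_fake_request(n_tokens: int, delay_s: float) -> WarmupRequest:
    """Hypothetical stand-in for a real warmup request, for local testing only."""
    async def stream() -> AsyncIterator[str]:
        for i in range(n_tokens):
            await asyncio.sleep(delay_s)
            yield f"tok{i}"
    return stream


if __name__ == "__main__":
    fakes = [make_fake_request(4, 0.01) for _ in range(3)]
    print(asyncio.run(wait_until_warm(fakes, expected_tokens=4, timeout_s=1.0)))
```

Keeping `grace_s` separate from the token check means it can be dropped once the engine exposes a real server-warmup-complete signal.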