We want to serve LLMs from LitGPT using LitServe; however, the current model initialization step leaks a lot of complexity into user code. We also couldn't find a generator function for streaming responses, so we had to implement the generate function on the user-code side.
A simple API that handles this kind of thing in a few lines of code would be really appreciated!
cc: @lantiga