Open nopepper opened 4 months ago
Hi @nopepper! The only feature that is harder to support is continuous batching. Prefix caching and all other features will work through the Python API, but continuous batching is difficult there because of Python's GIL (global interpreter lock).
We will be releasing an async Python API soon, probably within the coming week, so perhaps we can leave this issue open until then. The Python API itself will also be reworked internally for version 0.2.0 for a smoother experience.
@nopepper - beginning work on the async API!
Hello, sorry if this is not an issue per se, but I wasn't sure where else to put it. I couldn't find any info on whether all the performance features (including continuous batching, caching, etc.) are supported through the Python API. It seems like you can only call the Runner synchronously and with one request at a time. Am I missing something, or is batching only supported when using the server? Thanks.
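For readers in the same situation, here is a minimal sketch of how a blocking, one-request-at-a-time call can be overlapped from Python with the standard library. `blocking_generate` is a hypothetical stand-in, not the library's Runner API, and the sketch assumes the real call releases the GIL while it works (here `time.sleep` does). This only overlaps independent calls; it is not continuous batching, which has to happen inside the engine.

```python
import asyncio
import time


def blocking_generate(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous, one-request-at-a-time call."""
    time.sleep(0.1)  # simulates blocking work; sleep releases the GIL
    return f"echo: {prompt}"


async def generate_many(prompts: list[str]) -> list[str]:
    # Run each blocking call in a worker thread so the event loop stays free.
    tasks = [asyncio.to_thread(blocking_generate, p) for p in prompts]
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*tasks)


def run_demo() -> list[str]:
    # Convenience entry point for synchronous callers.
    return asyncio.run(generate_many(["a", "b", "c"]))
```

Calling `run_demo()` submits three prompts at once. Whether this gives any speedup with a real model depends on whether the underlying call releases the GIL and whether the engine can serve concurrent requests, neither of which is confirmed in this thread.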