Open anonymous-program opened 3 months ago
I would like to understand which service I need to improve to get faster responses from the chat bot. Also, if there are code files to edit or change, please mention them in a comment.
The speed of the response will be based on the speed of the various services. For the chat tab, that involves:
The ask tab only has the last three steps, so it may be faster.
You can use Azure Monitor to see the performance of those steps. See https://github.com/Azure-Samples/azure-search-openai-demo/?tab=readme-ov-file#monitoring-with-application-insights
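For a quick local check before digging into Application Insights, a rough sketch like the one below can time each stage of a request and report the slowest one. The stage names (`search`, `llm_answer`) and the `time.sleep` calls are hypothetical placeholders, not the sample's actual code; you would wrap the real service calls instead.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per step name.
timings = {}

@contextmanager
def timed(step, store=timings):
    """Record how long the wrapped block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[step] = store.get(step, 0.0) + (time.perf_counter() - start)

def slowest_step(store=timings):
    """Return the name of the step with the largest accumulated time."""
    return max(store, key=store.get)

def handle_request():
    # Placeholder stages: replace the sleeps with the real calls.
    with timed("search"):
        time.sleep(0.01)
    with timed("llm_answer"):
        time.sleep(0.05)

handle_request()
for step, seconds in timings.items():
    print(f"{step}: {seconds:.3f}s")
print("slowest:", slowest_step())
```

This only measures time as seen from your own process, so it complements the Azure Monitor traces and does not replace them.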
You will likely find that the final step is the slowest, since LLMs require a lot of computing power. Azure does not give latency guarantees for "Pay-as-you-go" subscriptions, but it does give them for PTUs (Provisioned Throughput Units), so that is what many customers use. If that is beyond your budget, then you'll need to try other ways of reducing the time taken, like using the simpler "Ask" tab. If other LLM applications are faster, they may be using a dedicated GPU for reduced latency.
Also note that you'll see different latency for different models (GPT-4 is slower than GPT-3.5) and for different regions, so you can experiment with that.
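One way to run that experiment is to send the same request to each candidate several times and compare median latencies. The sketch below assumes you supply one zero-argument callable per model or region; the candidate names and the `time.sleep` stand-ins are hypothetical, and real calls would go to your own deployments.

```python
import statistics
import time

def median_latency(call, runs=5):
    """Call `call` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def rank_by_latency(candidates, runs=5):
    """Return (name, median_seconds) pairs, fastest first."""
    results = {name: median_latency(call, runs) for name, call in candidates.items()}
    return sorted(results.items(), key=lambda item: item[1])

if __name__ == "__main__":
    # Placeholder callables: replace with real requests to each deployment.
    candidates = {
        "deployment-a": lambda: time.sleep(0.02),
        "deployment-b": lambda: time.sleep(0.005),
    }
    for name, seconds in rank_by_latency(candidates):
        print(f"{name}: {seconds:.3f}s")
```

The median is used instead of the mean so that a single slow outlier does not dominate the comparison.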
Thank you for the guidance.