Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
5.93k stars 4.08k forks

GPT4 is too slow #1187

Open asliiNorderia opened 8 months ago

asliiNorderia commented 8 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

In Infra -> main.bicep I changed the model to GPT-4 and version to '1106-Preview'.
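For reference, the change described above would look roughly like this in `infra/main.bicep` (the parameter names are assumptions based on the repo's conventions and may differ between versions):

```bicep
// Hypothetical sketch of the model change in infra/main.bicep.
// Parameter names may differ in your version of the repo.
param chatGptModelName string = 'gpt-4'           // was 'gpt-35-turbo'
param chatGptModelVersion string = '1106-Preview' // GPT-4 Turbo preview
```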

Any log messages given by the failure

Expected/desired behavior

The expectation is to get performance close to that of GPT-3.5.

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

azd version?

run azd version and copy paste here.

Versions

Mention any other details that might be useful

After changing the GPT model from GPT-3.5 to GPT-4, performance decreased considerably.


Thanks! We'll be in touch soon.

crissins commented 7 months ago

I think this is pretty normal; in ChatGPT too, the GPT-4 model is noticeably slower, since it is a heavier model to run inference on.

pamelafox commented 7 months ago

That's correct, GPT-4 is expected to be slower.

In addition, I assume you're using Pay-as-you-go pricing tier, which doesn't come with any latency guarantees. For latency assurance, Azure recommends PTUs: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput Those can be expensive, however, since you need to pre-reserve a bunch of capacity.

The other approach I've heard is to use openai.com OpenAI instead of Azure OpenAI. That may be slightly faster due to the lack of the content safety filter service and other protections (but then you lose those protections, plus other aspects of Azure reliability and security).

You could also try some prompt engineering or few-shot prompting to improve the quality of the responses for gpt-3.5.
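Few-shot prompting here just means prepending one or two example question/answer exchanges to the messages list so gpt-3.5 imitates the desired answer style. A minimal sketch (the example content and source names below are made up, not from this repo):

```python
# Minimal few-shot chat prompt sketch: example Q&A turns are prepended
# so the model imitates their style. All content is illustrative.
few_shot_messages = [
    {"role": "system",
     "content": "Answer only from the provided sources and cite them."},
    # One worked example the model should imitate:
    {"role": "user",
     "content": "What is the deductible? Sources: benefits.pdf#page=2: ..."},
    {"role": "assistant",
     "content": "The deductible is $500 per year [benefits.pdf#page=2]."},
    # The real question always goes last:
    {"role": "user",
     "content": "Does the plan cover dental? Sources: ..."},
]
```

This list can then be passed as the `messages` argument to a chat completion call; the trade-off is a few hundred extra prompt tokens per request.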

You can also try GPT-4 across different regions to measure what latency you see, and see if any regions are responding faster than others.
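One way to run that comparison is a small timing helper (hypothetical, not part of this repo) that wraps any zero-argument request function:

```python
import time

def measure_latency(call, n=3):
    """Time `call()` n times; return (min, average) in seconds.

    `call` is any zero-argument function that issues one request,
    e.g. a lambda wrapping a chat completion call against a
    deployment in a given region (client setup not shown here).
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return min(samples), sum(samples) / n
```

Running it against otherwise-identical deployments in two regions gives a rough min/average comparison; a handful of samples won't capture tail latency, so treat this as a sanity check rather than a benchmark.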