akshata29 / entaoai

Chat and Ask on your own data. An accelerator to quickly upload your own enterprise data and use Azure OpenAI services to chat with that data and ask questions of it.

Long Response Time / high latency #23

Closed · MediGenie closed this issue 1 year ago

MediGenie commented 1 year ago

So I am based in Seoul, South Korea. I know Azure OpenAI/OpenAI resources are only hosted in the US/Europe and not in Asia yet, but for those of you in US East/US Central: are you experiencing high latency or long response times?

For my part, I moved what Azure resources I could, along with Pinecone, to Korea Central, but I am wondering, @akshata29, what other things or ideas you can think of that I could do to speed things up on my end. Cost is not a factor; I just want chatpdf to run as fast as https://chatpdf.com/.

Thank you!!

akshata29 commented 1 year ago

At times I do see some latency in the US, but most of the time I have not run into this issue. Depending on your use case, you can implement a cache mechanism (it is on my list to implement), and/or build your own knowledge base (KB) from your documents/data so you do not have to call OpenAI every time.
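
A minimal sketch of that caching idea, assuming a hypothetical `answer_with_openai` helper standing in for the repo's actual retrieval + completion chain (an illustration only, not the repo's code):

```python
import hashlib

def answer_with_openai(question: str) -> str:
    """Hypothetical stand-in for the existing retrieval + completion chain."""
    raise NotImplementedError

# In-process dict for illustration only; a production cache would use a
# shared store so hits survive restarts and are shared across instances.
_answer_cache: dict[str, str] = {}

def answer(question: str) -> str:
    # Key on the normalized question text so an exact repeat of a question
    # is served locally, with no round trip to Azure OpenAI.
    key = hashlib.sha256(question.strip().lower().encode("utf-8")).hexdigest()
    if key in _answer_cache:
        return _answer_cache[key]
    result = answer_with_openai(question)
    _answer_cache[key] = result
    return result
```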

MediGenie commented 1 year ago

Thank you SO much for your response. I have referenced the nice Azure architecture diagram, but could you be more specific about what you mean by a cache mechanism? Are you saying that when I ask a question in chat, Cognitive Search (US East) queries Pinecone (Korea Central), the findings are returned to US East, the search results are combined with the question, and then Cognitive Search returns the answer to my computer in Korea? In other words, are you saying it is good to cache the documents in US East? It would be wonderful if you could explain in detail. :)

akshata29 commented 1 year ago

I have implemented the cache pattern (using Cognitive Search) in the repo. Moreover, you can now also implement the pattern from the reference architecture we created at https://learn.microsoft.com/en-us/azure/architecture/example-scenario/ai/log-monitor-azure-openai to load-balance across multiple AOAI instances. Lastly, for enterprise customers we do offer the PTU (Provisioned Throughput Units) model for predictable latency and performance.
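
A hedged sketch of the load-balancing idea: rotate requests across several AOAI instances and fail over to the next one on errors such as throttling. The endpoint URLs and the `call_aoai` helper are illustrative assumptions, not the repo's actual API:

```python
import itertools

# Illustrative endpoints only; substitute your own AOAI deployments.
ENDPOINTS = [
    "https://aoai-eastus.openai.azure.com",
    "https://aoai-westeurope.openai.azure.com",
]
_rotation = itertools.cycle(ENDPOINTS)

def call_aoai(endpoint: str, prompt: str) -> str:
    """Hypothetical helper: send `prompt` to the AOAI instance at `endpoint`."""
    raise NotImplementedError

def completion_with_failover(prompt: str) -> str:
    # Try each instance at most once, starting from the next one in the
    # rotation; fall through to the next on errors such as 429 throttling.
    for _ in range(len(ENDPOINTS)):
        endpoint = next(_rotation)
        try:
            return call_aoai(endpoint, prompt)
        except Exception:
            continue
    raise RuntimeError("all Azure OpenAI instances failed or were throttled")
```

Round-robin is the simplest policy; a weighted or latency-aware rotation would work the same way, just with a different choice of next endpoint.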