igforce opened this issue 1 year ago
The other repo includes a lot more features around data ingestion, but not as much flexibility around the RAG (Retrieval Augmented Generation) approach, since it uses the `dataSources` parameter of the Chat Completions API instead of the manual chaining approach used here. If the other repo is sufficient for your needs, then you can go with that. If you need some of the flexibility of the approaches in this repo, then switch to this one. I think it may be possible to use the data ingestion from that repo with this one, but I haven't tried it yet. The key is that it needs to create an index that is compatible with our search queries, and I think it does.
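To make the distinction concrete, here's a rough sketch of the two approaches (the endpoint, API version, and field names are from memory, so treat them as approximate and check the current Azure OpenAI docs):

```python
# Rough sketch only -- placeholder values throughout, and the extensions
# endpoint / field names are from memory rather than copied from either repo.
import requests

# 1) dataSources approach: one call, the service does the retrieval itself.
payload = {
    "messages": [{"role": "user", "content": "What does my plan cover?"}],
    "dataSources": [{
        "type": "AzureCognitiveSearch",
        "parameters": {
            "endpoint": "https://<search-service>.search.windows.net",
            "key": "<search-admin-key>",
            "indexName": "<index-name>",
        },
    }],
}
response = requests.post(
    "https://<aoai-resource>.openai.azure.com/openai/deployments/<deployment>"
    "/extensions/chat/completions?api-version=2023-06-01-preview",
    headers={"api-key": "<aoai-key>"},
    json=payload,
)

# 2) Manual chaining (this repo's style): query the index yourself, then
# build the prompt from the retrieved chunks -- full control over the
# prompt, reranking, citations, etc.
#   results = search_client.search("what does my plan cover?")
#   prompt = build_prompt(results)   # hypothetical helper
#   answer = chat_completion(prompt) # hypothetical helper
```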
@pamelafox Thanks for the quick turnaround on the answer. I just started reviewing the content of the other repo, and the two seem to have complementary strengths: one on the ingestion process, one on the prompting. Wouldn't it make sense to have just one repo? It has also been mentioned that the other repo is more production-ready than this one.
Extract from the other repo's README:

> Have you seen the [ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search demo](https://github.com/Azure-Samples/azure-search-openai-demo)? If you would like to play with prompts, understand different implementation approaches to the RAG pattern, and try similar demo tasks, take a look at that repo. Note that the demo in that repo should not be used in Proof of Concepts (POCs) that will later be adapted for production environments. Instead, consider the use of this repo and follow the best practices outlined in this repo.
I think "production ready" has different meanings. This repo has had quite a lot of usage already in the wild, since it was the first RAG ACS+AOAI repo available, so we've applied a lot of learnings from customers here. We're also using best practices such as concurrency (for performance) and managed identity (for security). The accelerator repo has better support for ingesting multiple data types, and it uses the dataSources parameter which has also been worked on quite a bit internally. If your data types work with that ingestion and dataSources parameter approach, then that could be a good fit. But if you need more flexibility to the prompt or other features used in this repo, then this might be a better fit.
The goal is to have one repo that's more on the bleeding edge (this one) and the other that's more stable, but we're not at that point yet. Hopefully our ingestion story will improve for this repo so that your decision making process is easier. Sorry for the confusion!
Track this issue for ingestion improvements in this repo
Note: The README in the -accelerator repo is now updated, so it no longer says that it's not appropriate for use in production scenarios.
This is great info, @pamelafox. With the recent changes to this repo (esp. moving to Quart, and the introduction of streaming), the Python backend has become more complex (e.g. how you had to deal with followup questions) and takes a bit more effort to understand. One reason seems to be the developer tools in the frontend, which need those parameters passed as input (and the thought process returned in the response). This repo has a lot of flexibility, but it's more challenging to change and debug.
The simpler frontend of the chat-with repo seems to have streamlined the interaction with the backend a lot.

A question: is the `dataSources` parameter on the API always used? Also, I thought that was part of the Extensions API and not part of the normal Completions API?
> With the recent changes to this repo (esp. moving to Quart, and the introduction of streaming), the Python backend has become more complex (e.g. how you had to deal with followup questions) and takes a bit more effort to understand. One reason seems to be the developer tools in the frontend, which need those parameters passed as input (and the thought process returned in the response). This repo has a lot of flexibility, but it's more challenging to change and debug.
Some extra background on this, regarding the usage of Quart in the backend. When running load tests (see the locustfile in the repo), the RAG end-to-end transaction takes a long time (5+ seconds), which locks the backend web worker threads when used with Flask and sync APIs. Python with Gunicorn doesn't scale very well when you have long-running blocked requests in a multi-worker, multi-thread setup; we saw a significant bottleneck at just 5 users in the initial load test. The Quart implementation uses the async APIs for Azure OpenAI and Azure Cognitive Search, which enables the app to scale much better beyond 5 concurrent users.
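To illustrate the difference (a minimal sketch, not the repo's actual code; `fake_rag_call` is a stand-in for the real awaitable Azure OpenAI / Cognitive Search calls):

```python
# Minimal Quart sketch: while one request awaits a slow RAG round trip,
# the event loop is free to serve other requests, so a single worker can
# overlap many 5+ second calls instead of blocking a thread per request.
import asyncio
from quart import Quart

app = Quart(__name__)

async def fake_rag_call() -> str:
    # Stand-in for the real async Azure OpenAI + Cognitive Search calls.
    await asyncio.sleep(5)
    return "answer"

@app.route("/chat")
async def chat():
    answer = await fake_rag_call()
    return {"answer": answer}

if __name__ == "__main__":
    app.run()
```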
"more challenging to change and debug." could you please provide some examples of complexity in debugging? I would like to help
Thanks @tonybaloney for the response -- I thought that might be the case, going async for scalability.
I'm not at my machine right now (typing on my phone), but I'd been meaning to raise an issue that I spent some time last night trying to debug: if you ask a question in the frontend that triggers the content filter (e.g. "how do I make a bomb?"), you get a "type error" in the frontend. (Please confirm.)
I think this is because the content filter makes the OpenAI call raise an exception that isn't handled in the streaming code path, so the frontend receives something it can't parse. That's what I think is happening, but I'm still new to Quart and honestly still learning Python. I'd be really keen to get your view.
I was also musing on how to solve it. I read that there is a generic catch-all error handler in Flask, but I'm not sure if there is one in Quart (`@bp.errorhandler`?); I tried that but it didn't work. Another option would be to try/except in chatreadretrieveread around the completion blocks and return a legitimate (stream!) response, but when I realized that was a tuple of [extra_info, coroutine] I thought: this is getting too hard 😄!!
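For concreteness, this is roughly the shape of what I was attempting (just a sketch; I'm assuming Quart mirrors Flask's `errorhandler` decorator, and I registered it on the app because a generic `Exception` handler on a blueprint may not fire):

```python
# Sketch of a catch-all handler that turns unhandled exceptions (e.g. a
# content-filter error from the OpenAI call) into structured JSON the
# frontend can parse, instead of an opaque "type error".
from quart import Quart, jsonify

app = Quart(__name__)

@app.errorhandler(Exception)
async def handle_exception(error: Exception):
    return jsonify({"error": str(error)}), 500
```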
Hope you can shed some light.
I'll work on that issue in the other thread, thanks for raising.
Thanks Pamela!
> We saw a significant bottleneck at just 5 users in the initial load test. The Quart implementation uses the async APIs for Azure OpenAI and Azure Cognitive Search, which enables the app to scale much better beyond 5 concurrent users.

5 users sounds pretty rough. How well does it perform now, @tonybaloney?
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.