Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
5.93k stars 4.07k forks

Accessing prepdocslib from the app, so that can ingest PDF files from the UI #997

Open douglaswross opened 10 months ago

douglaswross commented 10 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

OS and Version?

macOS Latest

Mention any other details that might be useful

  1. Great demo, getting a lot of use from what you have written, thank you very much.
  2. I would like to let users add PDFs from the UI at runtime and have them ingested the way prepdocs.py does, using prepdocslib and code similar to prepdocs.py.
  3. I am not very familiar with running Quart and virtual environments. My problem is that I cannot access prepdocslib from within the app/approaches folder, where I have written the Python code that makes prepdocs.py-style calls. I have also written the Chat.tsx, api.ts, and app.py changes to make it work, but I am completely stuck on loading the prepdocslib modules. What I have written is a complete hack (the loadfile component currently appears on the developer panel); I am just trying to get it working end to end.
  4. Sorry to waste your time. Looking at what exists on the Azure backend App Service, I now realise that only the compiled code in app is actually deployed, so putting something like the following in app.py to update sys.path will not work at runtime, because the scripts folder is not on the App Service. I know that now...

         import sys
         import os
         parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
         scripts_path = os.path.join(parent_dir, 'scripts')
         sys.path.insert(0, scripts_path)
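
To make the failure in item 4 easier to see, here is a minimal sketch of the same path tweak with a guard around it. The helper name `add_scripts_to_path` is hypothetical and not part of the sample; it only reports whether a `scripts` folder exists one level above the directory that contains the given file, which is the condition that does not hold once only the app folder is deployed.

```python
import os
import sys


def add_scripts_to_path(app_file: str) -> bool:
    """Add the sibling 'scripts' folder to sys.path if it exists.

    Hypothetical helper for illustration only. 'scripts' is looked up
    next to the directory that contains app_file, mirroring the snippet
    in item 4. Returns False when the folder is missing, which is the
    situation described for the deployed app.
    """
    parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(app_file)))
    scripts_path = os.path.join(parent_dir, "scripts")
    if not os.path.isdir(scripts_path):
        # Nothing to import from: the folder was never deployed.
        return False
    if scripts_path not in sys.path:
        sys.path.insert(0, scripts_path)
    return True
```

A `False` return at startup would confirm that the folder is simply absent in that environment, rather than there being a problem with the import statement itself.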

Is there a simple way to access prepdocslib from a new Python file in app/approaches/, where the new file works a lot like prepdocs.py?

Thanks very much for any help.



pamelafox commented 10 months ago

Good question!

I see a few options:

1) Move the entire prepdocslib into app/backend, and then adjust scripts/prepdocs.sh to call it from there. In that case you'd probably just use a single Python environment, and merge the requirements files.

2) Add an azd predeploy hook that will cp -r the prepdocslib folder into app/backend. You would still need to make sure you had all the requirements however, and you'd also need to cp for local development.

So I think I'd vote for #1. Let me know if you need help making that adjustment.
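
For anyone who wants to try option 2 anyway, the copy step could be sketched in Python roughly as follows. This is an illustration under assumptions, not code from the repo: the function name `copy_prepdocslib` is made up, and the paths in the usage note assume the package lives at scripts/prepdocslib and that the script runs from the repository root. It does not handle the requirements, which would still need to be merged by hand.

```python
import shutil
from pathlib import Path


def copy_prepdocslib(source: Path, backend_dir: Path) -> Path:
    """Copy the package folder at `source` into `backend_dir`.

    Hypothetical helper. Any existing copy is removed first so that
    files deleted from the source do not linger in the target.
    """
    if not source.is_dir():
        raise FileNotFoundError(f"Package folder not found: {source}")
    target = backend_dir / source.name
    if target.exists():
        shutil.rmtree(target)
    # Skip bytecode caches; they are regenerated on import.
    shutil.copytree(
        source,
        target,
        ignore=shutil.ignore_patterns("__pycache__", "*.pyc"),
    )
    return target
```

Under the assumed layout, a predeploy hook (and the same step before local runs) would call `copy_prepdocslib(Path("scripts/prepdocslib"), Path("app/backend"))`.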