Open mratanusarkar opened 1 year ago
I think that is something I would also need. I would like this model to use my C++, C#, Java, etc. code as context when answering questions.
The data ingestion script in this repo (prepdocs.py) only supports PDFs. A few other file formats are supported in another repo (https://github.com/microsoft/sample-app-aoai-chatGPT/blob/main/scripts/data_utils.py). You can also check out open-source tools like LlamaIndex (https://www.llamaindex.ai/) to see if they can be used with your data format. The basic idea is to split your text into "chunks" of a reasonable size and store them in the search index.
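The chunking idea described above could look roughly like the sketch below. It is not how prepdocs.py works; it is a minimal illustration that assumes plain-text or source-code files read as UTF-8, fixed-size character chunks with a small overlap, and made-up record fields (id, sourcefile, content). The function names chunk_text and chunk_files, the default sizes, and the extension list are all hypothetical.

```python
from pathlib import Path


def chunk_text(text, max_chars=1000, overlap=100):
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a sentence or
    function cut at a boundary still appears whole in one of them.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    if not text:
        return []
    chunks = []
    start = 0
    step = max_chars - overlap
    while True:
        chunks.append(text[start:start + max_chars])
        # Stop once this chunk reaches the end of the text.
        if start + max_chars >= len(text):
            break
        start += step
    return chunks


def chunk_files(root, extensions=(".py", ".java", ".cpp", ".cs", ".txt"),
                max_chars=1000, overlap=100):
    """Yield one record per chunk for every matching file under root.

    The record fields are hypothetical; adapt them to whatever schema
    your search index actually uses.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for i, chunk in enumerate(chunk_text(text, max_chars, overlap)):
                yield {
                    "id": f"{path.name}-{i}",
                    "sourcefile": str(path),
                    "content": chunk,
                }
```

Each yielded record would then be uploaded to the search index by whatever client you use. Character-based splitting is the simplest choice; splitting on sentence, function, or class boundaries usually gives better retrieval for code, at the cost of a language-aware parser.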
I'm afraid I don't understand the other questions. Can you rephrase? You can use your own documents by replacing what's in the data folder. You can remove existing documents by passing --removeall to prepdocs.py in prepdocs.sh/prepdocs.ps1.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.
This issue is for a: (mark with an x) Question
Is it possible to work with files other than PDF? Say, text files, Word documents, Excel spreadsheets, and even code files like .py, .c, .cpp, .java, other programming files, or any file from any data source? It would be even better if I could point to a data source (e.g., a drive, a cloud resource link, or a GitHub repo).
Instead of running azd init -t azure-search-openai-demo, is it possible to do the init from a fork of this codebase that has its own custom file data for a specific use case in the data folder? That way, when the app gets deployed, the blob will have different PDF files and the system will be tuned to file data for a custom use case.
With reference to the answer from the FAQ:
I see a data folder inside the deployed web app (in the App Service Kudu terminal) and also a storage container blob named "content". I don't have access to ./scripts/prepdocs.sh or ./scripts/prepdocs.ps1, as the deployment is currently done by other team members. How do I change the system so that the content comes from my file source and not from the existing demo files? (I believe dumping my files into the blob or into the App Service data folder won't help, as the files are indexed and converted into some vector DB so that the model gets the reference.)
OS and Version?
azd version?
Versions