GPT-vision enhancement related queries

singloudly90 commented 8 months ago

Please provide us with the following information:

This issue is for a: (mark with an `x`)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
- [x] enquiries

Mention any other details that might be useful

I understand that GPT-4V is now supported, and I appreciate your effort. The existing solution uses Azure Document Intelligence for OCR and OpenAI embeddings for text embedding. The current solution also includes Computer Vision, which can do OCR as well as image embedding.

Question: When credentials for all of these services have been provided, will the solution determine which service to use in a cost-efficient way, or will it apply both?

Example: I have financial performance data in tabular format, but as a screenshot. Azure Document Intelligence should be sufficient for that. But if Computer Vision credentials are also configured, will Computer Vision embed it as an image instead of letting Azure DI convert it to text, or will both run in parallel? I'm asking because a typical financial report contains a mixture of tables and charts.

For tables in images: Azure DI would be the best fit. For charts: Computer Vision can do the vector embedding.

Thanks again for continuing to improve the repo!

pamelafox commented 8 months ago

Hm, I'm not sure I understand the question.

We use Azure Document Intelligence so that we can extract any relevant text from a document, since most documents have text in addition to charts, and we want to find that text when performing a hybrid search.

We use the Computer Vision API to compute an embedding for images, since it can embed images in addition to text. That way, if an image contains a picture of a tree, and the user has a question related to trees, then our search can find both text about trees and images of trees.

So we need both of these tools for different steps. I think we would only drop Document Intelligence if our input data had literally no text in it at all (which is possible! I haven't tried that with this app to see how well that'd work).
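As a rough illustration of the two steps described above, here is a minimal Python sketch. It is not the repo's actual code: `index_page`, `IndexedPage`, and the `fake_*` helpers are hypothetical stand-ins for Document Intelligence, the OpenAI embedding model, and the Computer Vision embedding call, and it assumes each page is available as image bytes. The point it shows is that text extraction and image embedding are independent steps on the same input, with no cost-based choice between them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IndexedPage:
    text: str                               # text extracted from the page
    text_embedding: List[float]             # embedding of the extracted text
    image_embedding: Optional[List[float]]  # embedding of the page image, if enabled


def index_page(
    page_image: bytes,
    extract_text: Callable[[bytes], str],
    embed_text: Callable[[str], List[float]],
    embed_image: Optional[Callable[[bytes], List[float]]] = None,
) -> IndexedPage:
    """Always extract and embed text; embed the image only if an image embedder is given."""
    text = extract_text(page_image)
    return IndexedPage(
        text=text,
        text_embedding=embed_text(text),
        image_embedding=embed_image(page_image) if embed_image else None,
    )


# Hypothetical dummy services so the sketch runs without any Azure credentials.
def fake_extract_text(image: bytes) -> str:
    return "Revenue 2023: 10"


def fake_embed_text(text: str) -> List[float]:
    return [float(len(text))]


def fake_embed_image(image: bytes) -> List[float]:
    return [float(len(image))]


# With an image embedder supplied, the same page gets both kinds of embedding.
both = index_page(b"png-bytes", fake_extract_text, fake_embed_text, fake_embed_image)
# Without one, only the text path runs.
text_only = index_page(b"png-bytes", fake_extract_text, fake_embed_text)
```

In this sketch a screenshot of a table would end up with extracted text, a text embedding, and an image embedding all at once when the image embedder is supplied.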

singloudly90 commented 8 months ago

@pamelafox thanks for trying to answer my question, although my question seems a little bit confusing.

Maybe I can provide an example: if the table below is an image in my PDF, will it be converted to text and text-embedded, and at the same time also be image-embedded if Computer Vision is enabled?

Azure-Samples / azure-search-openai-demo