Open eduardoluismarin opened 1 year ago
We are planning on adding a Google Drive CSV capture very soon. Can you tell us more about your use case since this is the first time we've received a request for PDFs?
Hello, happy Hump Day to you...
Thanks for your email and your prompt response.
Let me clarify. I want to ingest PDF files because I am creating a knowledge base and integrating your solution with ChatGPT. I know they are static documents, but I am centralizing my entire knowledge base in your solution. Regards
On Wed, Jul 26, 2023, 02:28 dyaffe @.***> wrote:
We are planning on adding a Google Drive CSV capture very soon. Can you tell us more about your use case since this is the first time we've received a request for PDFs?
- How would you want us to ingest PDF documents and sync them to other systems?
- What's the use case?
@eduardoluismarin Can you describe how you'd want the data from these to be structured?
For `txt` files, it seems pretty straightforward to have it produce something like `{"content": "the full contents of the txt file..."}`. But Google Docs and especially PDFs can contain very complex structures and content, and it's not necessarily clear how those ought to be represented. Do you have an example of what you might want in terms of the JSON representation, or even just how you'd expect a document to be represented when requesting embeddings from the OpenAI API?
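To make the question concrete, here is a rough Python sketch of two possible shapes. It is only an illustration, not what any existing connector emits: the function names and the `file` and `page` fields are hypothetical, and the PDF case assumes the page text has already been extracted by some other tool.

```python
from pathlib import Path


def txt_to_document(path: Path) -> dict:
    """Wrap a plain-text file's contents in a single-field JSON document."""
    return {"content": path.read_text(encoding="utf-8")}


def pages_to_documents(file_name: str, pages: list[str]) -> list[dict]:
    """Emit one document per page of an already-extracted PDF.

    `pages` is assumed to hold the text of each page in order; the
    extraction step itself is out of scope for this sketch.
    """
    return [
        {"file": file_name, "page": number, "content": text}
        for number, text in enumerate(pages, start=1)
    ]
```

One document per page keeps each record small, which might suit an embeddings workflow better than a single large `content` string, but that is exactly the kind of preference it would help to hear about.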
System Name
Google Drive
Type
Both
Details
I want to capture PDF, doc and txt files