Chainlit / cookbook

Chainlit's cookbook repo
https://github.com/Chainlit/chainlit

azure-openai-pinecone-pdf-qa - Found document with no 'text' key. Skipping #93

Open croziermaxime opened 4 months ago

croziermaxime commented 4 months ago

Hey, I'm currently running azure-openai-pinecone-pdf-qa locally on my machine. I've set everything up correctly, and indexing the PDFs works as expected: [image]. But once the app is running and I ask a question about the document, I get this message: [image]. So the RAG isn't working, and the response it gives me isn't based on the document. I've tried multiple PDFs stored in my ./pdfs directory, even the one from the repository, but nothing changes; it's always the same.

If anyone could help, I'd be really grateful. Thanks!

Kotrotsos commented 4 months ago

I have this exact same issue with just the OpenAI LLM. It just fetches its knowledge from the LLM instead of the loaded data. Pinecone has the vectors loaded, I can query it using pinecone-client just fine.

I'm still using the pinecone.init() way of doing things, though. Could that be an issue? If I didn't, the following statement would fail, because pc.from_existing_index() doesn't seem to exist.

docsearch = Pinecone.from_existing_index(
    index_name=index_name, embedding=embeddings, namespace=namespace
)

openai==0.28.1 pinecone-client==2.2.1

My vectors in Pinecone have the following schema:

id
values
metadata
   _node_content
   _node_type
   doc_id (none)
   document_id (none)
   document_title
   ref_doc_id (none)
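
One possible reading of the schema above: the metadata has no `text` field, and LangChain's Pinecone wrapper, as far as I know, reads each document's content from a metadata key that defaults to `text` and skips matches that lack it. That would fit the "Found document with no 'text' key. Skipping" message in the title. Below is a minimal sketch of that skip logic in plain Python, not the library's actual code. The function names, the sample matches, and the assumption that `_node_content` holds a JSON string with a `text` field are all hypothetical and should be checked against your own index.

```python
import json


def split_by_text_key(matches, text_key="text"):
    """Mimic the skip behaviour: keep matches whose metadata has `text_key`.

    Returns (texts, skipped_ids). This is an illustration only, not
    LangChain's implementation.
    """
    texts, skipped = [], []
    for match in matches:
        metadata = match.get("metadata") or {}
        if text_key in metadata:
            texts.append(metadata[text_key])
        else:
            skipped.append(match.get("id"))
    return texts, skipped


def text_from_node_content(metadata):
    """Try to recover text from `_node_content`.

    Assumption: `_node_content` is a JSON string describing a node with a
    "text" field. Returns None when that assumption does not hold.
    """
    raw = metadata.get("_node_content")
    if not isinstance(raw, str):
        return None
    try:
        node = json.loads(raw)
    except ValueError:
        return None
    return node.get("text") if isinstance(node, dict) else None


# Hypothetical query results: one vector with a `text` key, and one shaped
# like the schema in this thread (values are made up for illustration).
matches = [
    {"id": "a", "metadata": {"text": "hello"}},
    {
        "id": "b",
        "metadata": {
            "_node_content": json.dumps({"text": "from node"}),
            "_node_type": "TextNode",
            "document_title": "example.pdf",
        },
    },
]

texts, skipped = split_by_text_key(matches)
recovered = [
    text_from_node_content(m["metadata"]) for m in matches if m["id"] in skipped
]
```

If this is the cause, every vector shaped like the schema above would be skipped, leaving the chain with no context, so the answer would come from the LLM alone. `Pinecone.from_existing_index` accepts a `text_key` argument, but pointing it at `_node_content` would likely return the raw JSON rather than the plain text, so re-indexing with a `text` metadata field may be the simpler fix.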