GoogleCloudPlatform / generative-ai

Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview
Apache License 2.0

QA DocAI doesn't work because of NaNs (and a small update request) #217

Closed: skiiiks closed this issue 10 months ago

skiiiks commented 10 months ago

In this example: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/use-cases/document-qa/question_answering_documentai_vector_store_palm.ipynb

1) You have to update the google-cloud-documentai-toolbox version pinned in the notebook from 0.10.1 to 0.10.3a; otherwise, it doesn't work ;-)

2) It crashes at this step:

```python
# Embed each text chunk; this is the cell that crashes
pdf_data_sample["embedding"] = pdf_data_sample["chunks"].apply(
    lambda x: embedding_model_with_backoff([x])
)
```

After investigating, I found that the problem comes from the chunks: some of them are NaN. This happens because, when the PDF is processed, only the text of the first page is extracted; the chunks for all the remaining pages are null.
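A minimal sketch for confirming this and getting past the crash, assuming the notebook's `chunks` column holds one text chunk per row and that pages with no extracted text show up as NaN/None (or blank strings). The helper names `find_missing_chunks` and `embed_usable_chunks` are hypothetical, not part of the notebook or any library. This is only a stopgap: the skipped pages are simply left out of the vector store, so it does not fix the underlying extraction problem.

```python
import pandas as pd


def _usable_mask(df: pd.DataFrame, column: str) -> pd.Series:
    """True for rows whose chunk is a non-null, non-blank string."""
    text = df[column]
    # isna() catches both NaN and None; the strip check catches blank strings
    return text.notna() & (text.astype(str).str.strip() != "")


def find_missing_chunks(df: pd.DataFrame, column: str = "chunks") -> pd.DataFrame:
    """Hypothetical helper: return the rows that would crash the embedding call."""
    return df[~_usable_mask(df, column)]


def embed_usable_chunks(df: pd.DataFrame, embed_fn, column: str = "chunks") -> pd.DataFrame:
    """Hypothetical helper: embed only rows with text, dropping the empty ones.

    embed_fn is assumed to take a list of strings, like the notebook's
    embedding_model_with_backoff([x]) call.
    """
    usable = df[_usable_mask(df, column)].copy()
    usable["embedding"] = usable[column].apply(lambda x: embed_fn([x]))
    return usable
```

In the notebook this would be used as `print(find_missing_chunks(pdf_data_sample))` to see which rows lost their text, followed by `pdf_data_sample = embed_usable_chunks(pdf_data_sample, embedding_model_with_backoff)` in place of the crashing cell.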

Any solution or workaround?

holtskinner commented 10 months ago

I already fixed issue 1 in #226

And thanks for the context on issue 2. I'll see if I can resolve the issue.

SampathkumarSubramaniam commented 8 months ago

> I already fixed issue 1 in #226
>
> And thanks for the context on issue 2. I'll see if I can resolve the issue.

Any news on issue 2? I can reproduce it on my side by following https://cloud.google.com/blog/products/ai-machine-learning/ask-your-documents-document-ai-and-palm2-for-question-answering