GoogleCloudPlatform / document-ai-samples

Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
https://cloud.google.com/document-ai
Apache License 2.0

[docai-ocr-python] #758

Open crypto-frog opened 4 months ago

crypto-frog commented 4 months ago

I have followed the tutorial exactly, and I keep getting this error. My key.json is located in home/jonathan_ruben_fernandes/key.json

Here is my error message. I found nothing on Stack Overflow, and OpenAI could not help me figure out the error:

jonathan_ruben_fernandes@cloudshell:~$ /usr/bin/python /home/jonathan_ruben_fernandes/online_processing.py
Traceback (most recent call last):
  File "/home/jonathan_ruben_fernandes/online_processing.py", line 2, in <module>
    from google.cloud import documentai
ImportError: cannot import name 'documentai' from 'google.cloud' (unknown location)
jonathan_ruben_fernandes@cloudshell:~$ 

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

PROJECT_ID = "project-doc-ocr-416018"
LOCATION = "us"  # Format is 'us' or 'eu'
PROCESSOR_ID = "51d53e5ecbc3418d"  # Create processor in Cloud Console

# The local file in your current working directory
FILE_PATH = "Winnie_the_Pooh_3_Pages.pdf"
# Refer to https://cloud.google.com/document-ai/docs/file-types
# for supported file types
MIME_TYPE = "application/pdf"

# Instantiates a client
docai_client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f"{LOCATION}-documentai.googleapis.com")
)

# The full resource name of the processor, e.g.:
# projects/{project_id}/locations/{location}/processors/{processor_id}
# You must create new processors in the Cloud Console first
RESOURCE_NAME = docai_client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)

# Read the file into memory
with open(FILE_PATH, "rb") as image:
    image_content = image.read()

# Load Binary Data into Document AI RawDocument Object
raw_document = documentai.RawDocument(content=image_content, mime_type=MIME_TYPE)

# Configure the process request
request = documentai.ProcessRequest(name=RESOURCE_NAME, raw_document=raw_document)

# Use the Document AI client to process the sample form
result = docai_client.process_document(request=request)

document_object = result.document
print("Document processing complete.")
print(f"Text: {document_object.text}")

holtskinner commented 4 months ago

Hi @crypto-frog, just to clarify, did you install the Document AI client library?

pip install --upgrade google-cloud-documentai

Are you running this in a Colab notebook or in an IPython interactive environment? I have seen this same behavior in certain Colab notebooks even after installing.
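One way to answer the "did you install it" question for a specific interpreter is to ask that interpreter directly. The following is a minimal sketch, not part of the codelab: the helper name `installed_version` is made up for illustration, and it only relies on the standard-library `importlib.metadata` module (Python 3.8+).

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the version of a pip distribution as seen by THIS interpreter.

    Returns None when the running interpreter cannot see the distribution,
    even if another Python on the machine has it installed.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The interpreter that is actually executing this file.
    print("interpreter:", sys.executable)
    for name in (
        "google-cloud-documentai",
        "google-cloud-storage",
        "google-cloud-documentai-toolbox",
    ):
        version = installed_version(name)
        print(f"{name}: {version or 'not visible to this interpreter'}")
```

Running it with the same command used for the failing script (for example `/usr/bin/python check_install.py`, where the file name is arbitrary) shows whether that interpreter, rather than whichever one `pip` belongs to, can see the packages.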

crypto-frog commented 4 months ago

Hi holtskinner! Great to hear from you. I followed the tutorial at https://codelabs.developers.google.com/codelabs/docai-ocr-python#7 step by step, including this step:

pip3 install --upgrade google-cloud-documentai
pip3 install --upgrade google-cloud-storage
pip3 install --upgrade google-cloud-documentai-toolbox

I am using the Google Cloud CLI; everything runs remotely on Google Cloud, as in the tutorial. I tried creating a new project. The library is installed; I checked. I tried reinstalling, and pip reports that the requirement is already satisfied. I have used AI to review my code and the steps, and it found nothing wrong.

Thank you!!!

holtskinner commented 4 months ago

My theory is that it's a Python versioning issue: you're running the code with a different Python interpreter than the one the libraries were installed for.

Try running:

/usr/bin/python -m pip install --upgrade google-cloud-documentai google-cloud-storage google-cloud-documentai-toolbox

Then try running again.
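If the error persists, a small diagnostic can test the interpreter-mismatch theory above. This is only a sketch under assumptions: the helper name `find_module_origin` is hypothetical, and it uses the standard-library `importlib.util.find_spec` to report where the running interpreter would load a module from, without actually importing the module itself.

```python
import importlib.util
import sys


def find_module_origin(module_name: str):
    """Return where this interpreter would load module_name from.

    Returns a file path for regular modules, a list of search directories
    for namespace packages, or None if the module cannot be found.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `google.cloud`) is itself missing.
        return None
    if spec is None:
        return None
    # Namespace packages have no single file, so fall back to their directories.
    return spec.origin or list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    for name in ("google", "google.cloud", "google.cloud.documentai"):
        print(f"{name}: {find_module_origin(name)}")
```

Run it with the same interpreter as the failing script (`/usr/bin/python`). If `google.cloud.documentai` prints `None` while `pip3` reports the requirement as satisfied, that would suggest the packages live in a different interpreter's site-packages. If `google` or `google.cloud` resolves to an unexpected directory, something else on the path may be shadowing the installed packages.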

crypto-frog commented 4 months ago

I ran the command: requirement already satisfied. Same error. I am not using any local resources for this; everything is in Google Cloud, and I am following the tutorial. I don't want to give up, because Google services usually work. Please help!