aws-samples / amazon-textract-code-samples

Amazon Textract Code Samples
MIT No Attribution

Bug Code sample not working Amazon Textract Queries FeatureType #55

Closed: abhinit21 closed this issue 7 months ago

abhinit21 commented 10 months ago

Description

I am trying to use the analyze_document method from the amazon-textract-code-samples repository with a document stored in S3. I followed the example from the paystub.ipynb notebook, but I changed the Document parameter to use S3Object instead of Bytes. However, when I run the code, I get a ParamValidationError saying that QueriesConfig is an unknown parameter.

Code

This is the code block from the https://github.com/aws-samples/amazon-textract-code-samples/blob/master/python/queries/paystub.ipynb notebook that I used as a reference:

response = textract.analyze_document(
    Document={'Bytes': imageBytes},
    FeatureTypes=["QUERIES"],
    QueriesConfig={
        "Queries": [{
            "Text": "What is the year to date gross pay",
            "Alias": "PAYSTUB_YTD_GROSS"
        },
        {
            "Text": "What is the current gross pay?",
            "Alias": "PAYSTUB_CURRENT_GROSS"
        },
        {
            "Text": "What is the current net pay?",
            "Alias": "PAYSTUB_CURRENT_NET"
        },
        {
            "Text": "What is the current social security tax?",
            "Alias": "PAYSTUB_CURRENT_SOCIAL_SECURITY_TAX"
        },
        {
            "Text": "How much is the current medicare?",
            "Alias": "PAYSTUB_MEDICARE_TAX"
        },
        {
            "Text": "What is the vacation hours balance?",
            "Alias": "PAYSTUB_VACATION_HOURS"
        },
        {
            "Text": "What is the sick hours balance?",
            "Alias": "PAYSTUB_SICK_HOURS"
        },
        {
            "Text": "What is the employee name?",
            "Alias": "PAYSTUB_EMPLOYEE_NAME"
        }]
    })

This is the code block that I am using:

response = client.analyze_document(
    Document={
        'S3Object': {'Bucket': bucket, 'Name': document}
    },
    FeatureTypes=['QUERIES'],
    QueriesConfig={
        'Queries': [
            { 'Text': 'What is the Name ?', 'Alias': 'PATIENT_NAME' },
            { 'Text': 'What is the Test Name ?', 'Alias': 'TEST_NAME' },
        ]
    }
)

The only difference is that I am using S3Object instead of Bytes for the Document parameter.

Error

This is the error message that I get:

ParamValidationError: Parameter validation failed:
Unknown parameter in input: "QueriesConfig", must be one of: Document, FeatureTypes, HumanLoopConfig

This is the full traceback of the error: [screenshot not included]

Expected behavior

I expected the code to run without errors and return the results of the queries for the document in S3.

Actual behavior

The code raises a ParamValidationError and does not return any results.

Environment

Additional context

I searched for similar issues on GitHub, but I could not find any. I also checked the documentation for the analyze_document method, but I did not see any mention of QueriesConfig being incompatible with S3Object. I wonder if this is a bug or a limitation of the API.

anikethc commented 7 months ago

It's working fine for me. Could you please provide the complete code?

Vaishali17 commented 7 months ago

@abhinit21 check your boto3 library version and upgrade it to the latest release. This might help solve the issue.
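For anyone hitting the same error: ParamValidationError is raised locally by botocore before any request is sent, and the message lists only Document, FeatureTypes, and HumanLoopConfig as valid parameters. That points to an installed service model that does not know about QueriesConfig, rather than an S3Object limitation. Below is a minimal sketch for checking this, assuming boto3 is installed. The helper names `operation_accepts` and `report_queries_support` and the `us-east-1` default region are made up for illustration and are not part of the samples repository.

```python
def operation_accepts(client, operation_name, parameter_name):
    """Return True if the client's bundled service model lists the parameter."""
    operation = client.meta.service_model.operation_model(operation_name)
    input_shape = operation.input_shape
    return input_shape is not None and parameter_name in input_shape.members


def report_queries_support(region_name="us-east-1"):
    """Print the installed boto3 version and whether QueriesConfig is known.

    The region default is only an assumption for this sketch.
    """
    # boto3 is imported here so the helper above stays usable without it.
    import boto3

    # Creating a client only loads the local service model; no request is sent.
    client = boto3.client("textract", region_name=region_name)
    supported = operation_accepts(client, "AnalyzeDocument", "QueriesConfig")
    print(f"boto3 {boto3.__version__}: QueriesConfig supported = {supported}")
    return supported
```

Calling `report_queries_support()` in the same environment as the failing code should show whether the installed version recognizes QueriesConfig. If it reports False, upgrading with `pip install --upgrade boto3` and restarting the notebook kernel is the likely fix.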