payload for the liquid templates

rdali commented 6 months ago

Hello,

I have been trying to use some of these templates for my A2I integrations. It is great to have them, but to make use of them we need to know the A2I payload that each template expects. Can someone point me to the structure of the expected payload?

Thank you!

rdali commented 6 months ago

I realized that the payload is attached to the A2I output. So if you can successfully run an example of an A2I trigger, you can pull the payload from the output under the key "inputContent". Attached is the payload for the Textract key-value template: textract-keyvalue-sample.liquid.payload.json
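For anyone who wants to script this, here is a minimal sketch of pulling "inputContent" out of a finished human loop. It assumes the loop has completed (so `describe_human_loop` returns a `HumanLoopOutput` with an `OutputS3Uri`) and that your credentials can read the output bucket. The helper names `split_s3_uri`, `extract_input_content` and `fetch_input_content` are made up for this example.

```python
import json


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI is missing a bucket or key: {uri!r}")
    return bucket, key


def extract_input_content(output_json):
    """Return the 'inputContent' object from an A2I output document (str or bytes)."""
    return json.loads(output_json)["inputContent"]


def fetch_input_content(human_loop_name):
    """Download the output of a completed human loop and return its payload."""
    import boto3  # imported here so the helpers above work without AWS access

    a2i = boto3.client("sagemaker-a2i-runtime")
    s3 = boto3.client("s3")

    # HumanLoopOutput is only expected once the loop has finished (assumption
    # stated above); this raises KeyError if it is missing.
    loop = a2i.describe_human_loop(HumanLoopName=human_loop_name)
    bucket, key = split_s3_uri(loop["HumanLoopOutput"]["OutputS3Uri"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return extract_input_content(body)
```

Calling `fetch_input_content("my-loop-name")` (a placeholder name) and dumping the result with `json.dumps(..., indent=2)` should give you a file you can compare against the attached sample.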

grantrosse commented 6 months ago

Thanks, I was looking for this today. The documentation misrepresents the input data: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-crowd-textract-detection.html

riteshmanglani commented 6 months ago

> Thanks, I was looking for this today. The documentation misrepresents the input data: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-crowd-textract-detection.html

@grantrosse it is good to see it is working for someone :)

@rdali Thank you so much for your help so far.

I am getting an InternalServerException. Please see my code, along with the payload, below:

```python
import os
import json
import time
import uuid
from urllib.parse import unquote_plus

import boto3


def lambda_handler(event, context):
    textract = boto3.client("textract")
    a2i = boto3.client("sagemaker-a2i-runtime")
    FLOW_ARN = os.environ["FLOW_ARN"]

    if event:
        file_obj = event["Records"][0]
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        filename = unquote_plus(str(file_obj["s3"]["object"]["key"]))

        # Start document analysis for the whole document
        response = textract.start_document_analysis(
            DocumentLocation={
                "S3Object": {
                    "Bucket": bucketname,
                    "Name": filename,
                }
            },
            FeatureTypes=["FORMS"],  # Specify the feature types to analyze
            ClientRequestToken=str(uuid.uuid4()),  # Generate a unique client request token
        )

        # Retrieve the job ID from the response
        job_id = response["JobId"]

        # Poll for the completion of the job
        while True:
            job_status = textract.get_document_analysis(JobId=job_id)["JobStatus"]
            if job_status in ["SUCCEEDED", "FAILED"]:
                break
            time.sleep(5)  # Wait for 5 seconds before checking again

        # Get the results of the analysis
        response = textract.get_document_analysis(JobId=job_id)

        # Process the results
        print(json.dumps(response))

        # Extracting the Blocks array from the response
        blocks = response.get("Blocks", [])
        print(json.dumps(blocks))

        document_metadata = response.get("DocumentMetadata", {})
        print(json.dumps(document_metadata))

        # hln = uuid.uuid4().hex

        inputContent = {
            "aiServiceRequest": {
                "document": {
                    "s3Object": {
                        "bucket": bucketname,
                        "name": filename,
                    }
                },
                "featureTypes": ["TABLES", "FORMS"],
                "humanLoopConfig": {
                    "dataAttributes": {
                        "contentClassifiers": ["FreeOfAdultContent"]
                    },
                    "flowDefinitionArn": FLOW_ARN,
                    "humanLoopName": "TheTest",
                },
            },
            "aiServiceResponse": {
                "blocks": blocks,
                "documentMetadata": document_metadata,
            },
            "humanTaskActivationConditionResults": {
                "Conditions": [
                    {
                        "And": [
                            {
                                "ConditionType": "ImportantFormKeyConfidenceCheck",
                                "ConditionParameters": {
                                    "ImportantFormKey": "*",
                                    "KeyValueBlockConfidenceLessThan": 99,
                                    "WordBlockConfidenceLessThan": 99,
                                },
                            },
                            {
                                "ConditionType": "ImportantFormKeyConfidenceCheck",
                                "ConditionParameters": {
                                    "ImportantFormKey": "*",
                                    "KeyValueBlockConfidenceGreaterThan": 0,
                                    "WordBlockConfidenceGreaterThan": 0,
                                },
                            },
                        ]
                    }
                ]
            },
            "selectedAiServiceResponse": {
                "blocks": blocks
            },
        }

        a2i.start_human_loop(
            HumanLoopName="TheTest",
            FlowDefinitionArn=FLOW_ARN,
            HumanLoopInput={
                "InputContent": json.dumps(inputContent)
            },
        )

        return {
            "statusCode": 200,
            "body": json.dumps("Document processed successfully!"),
        }

    return {"statusCode": 500, "body": json.dumps("Issue processing file!")}
```

Below is the error I am getting:

[ERROR] InternalServerException: An error occurred (InternalServerException) when calling the StartHumanLoop operation (reached max retries: 4): Internal Server Error
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 125, in lambda_handler
    a2i.start_human_loop(
  File "/var/lang/lib/python3.12/site-packages/botocore/client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/lang/lib/python3.12/site-packages/botocore/client.py", line 1009, in _make_api_call
    raise error_class(parsed_response, operation_name)

Not sure what I am doing wrong. Any help would be appreciated.

grantrosse commented 6 months ago

One thing I know for sure is that your blocks won't work without some adjustment. See this Stack Overflow question for an example: https://stackoverflow.com/questions/64302986/how-to-highlight-custom-extractions-using-a2is-crowd-textract-analyze-document

In other words, you need to adjust the casing of the keys on your KEY_VALUE_SET blocks, and trim everything but the text and id from the WORD blocks. Compare rdali's example to the blocks you receive back from Textract and you will see what I mean.
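A rough sketch of that adjustment, in case it helps. It lower-cases the first letter of every key (Textract returns keys like `BlockType` and `Id`) and trims WORD blocks. Two things here are my assumptions rather than anything confirmed in this thread: that WORD blocks should keep `blockType` next to `id` and `text`, and that block types other than KEY_VALUE_SET and WORD can simply be re-cased too. The function names are made up; check the result against rdali's sample payload.

```python
def lower_first(name):
    """'BlockType' -> 'blockType'; leaves an empty string untouched."""
    return name[:1].lower() + name[1:]


def camelize(value):
    """Recursively re-case dict keys; values (e.g. 'CHILD', 'KEY') are left as-is."""
    if isinstance(value, dict):
        return {lower_first(key): camelize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [camelize(item) for item in value]
    return value


def adapt_blocks(blocks):
    """Convert Textract 'Blocks' into the casing the A2I payload appears to use."""
    adapted = []
    for block in blocks:
        if block.get("BlockType") == "WORD":
            # Assumption: keep blockType so WORD blocks stay distinguishable.
            adapted.append(
                {"blockType": "WORD", "id": block["Id"], "text": block["Text"]}
            )
        else:
            adapted.append(camelize(block))
    return adapted
```

In the Lambda above, this would be applied as `blocks = adapt_blocks(response.get("Blocks", []))` before building `inputContent`.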

riteshmanglani commented 6 months ago

Okay, I will check. Just to confirm, the template I am using is the default template and not a custom template. I hope this does not make any difference to the payload.

riteshmanglani commented 6 months ago

Thank you @grantrosse, my code is working now after changing the JSON keys from title case to camel case. However, my human loop status is now Failed. When I check the error on the human loop, it shows:

ValidationError Task failed to render: [ InvalidParameters: '"grant_read_access" input is not a valid S3 URI: " ".' ].

I am using a custom template that is exactly like the default key-value pair template.

Thanks,
Ritesh

rdali commented 6 months ago

If you look at the code of the default template, you can see that the S3 URI is constructed through Liquid as follows: `{% capture s3_uri %}s3://{{ task.input.aiServiceRequest.document.s3Object.bucket }}/{{ task.input.aiServiceRequest.document.s3Object.name }}{% endcapture %}`. Make sure that your bucket and s3Object.name values do not have extra slashes or characters.
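To check this before starting a loop, something like the sketch below could rebuild the URI the same way the template does and flag obvious problems. The function names and the specific checks are only illustrative. Note that the error above shows an essentially empty URI, which might mean the template could not find those keys in the payload at all, but that is a guess.

```python
def build_s3_uri(input_content):
    """Mirror the template's capture: s3://<bucket>/<name>."""
    s3_object = input_content["aiServiceRequest"]["document"]["s3Object"]
    return f"s3://{s3_object['bucket']}/{s3_object['name']}"


def find_s3_problems(bucket, key):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not bucket or bucket != bucket.strip():
        problems.append("bucket is empty or has surrounding whitespace")
    if "/" in bucket or bucket.startswith("s3:"):
        problems.append("bucket should be a bare name, not a path or URI")
    if not key or key != key.strip():
        problems.append("key is empty or has surrounding whitespace")
    if key.startswith("/"):
        problems.append("key has a leading slash")
    return problems
```

For example, `find_s3_problems(bucketname, filename)` could be called in the Lambda right before `start_human_loop`, logging the returned problems instead of submitting a payload that will fail to render.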

aws-samples / amazon-a2i-sample-task-uis

payload for the liquid templates #8