Closed Komal-99 closed 7 months ago
@Komal-99, I think this call returns what is called an S3 signed URL. The binary data is uploaded directly to the S3 bucket; it does not pass through API Gateway.
Hope that helps.
My question was how to pass a file! I am trying to pass the file as a binary body, along with a JSON payload containing all the values in https://github.com/aws-solutions/enhanced-document-understanding-on-aws/blob/main/source/lambda/upload-document/index.js#L39, but I am still getting a 403 error.
The REST endpoint does not accept the file. It returns a URL in the response, which you then use in a POST request to upload the file.
So I send a POST request to that URL and attach my file as binary? Is there any clear documentation for the endpoints?
Hey @Komal-99, thank you for your interest in and use of enhanced-document-understanding-on-aws. To upload a file, you can use a Python script instead of Postman, for convenience. Please follow these steps:
# generate_auth_token.py (imported by the upload script further down)
import boto3

COGNITO_ENDPOINT = "https://cognito-idp.us-east-1.amazonaws.com/"
CLIENT_ID = "<YOUR CLIENT ID OBTAINED FROM CLOUDFORMATION STACK OUTPUT>"
USER_POOL_ID = "<OBTAINED FROM CLOUDFORMATION STACK OUTPUT>"
USER_NAME = "<YOUR USERNAME>"
PASSWORD = "<YOUR PASSWORD>"
REGION = "us-east-1"


def generate_token():
    """Authenticate against Cognito and return the ID token."""
    client = boto3.client("cognito-idp", region_name=REGION)
    response = client.initiate_auth(
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": USER_NAME, "PASSWORD": PASSWORD},
        ClientId=CLIENT_ID,
    )
    token = response["AuthenticationResult"]["IdToken"]
    # save_token(token)  # helper not shown in the original post; persist the token here if needed
    return token
Create a new case using the UI or a POST /case request. This requires the request parameter caseName set to a string value, for example {"caseName":"case-4"}.
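As a rough sketch of the case-creation call, the helper below only assembles the arguments for `requests.post`. The function name `build_create_case_request` is made up for illustration, and the header set is assumed to match the one used for the /document request in this thread; the endpoint placeholder must be replaced with your own stack output.

```python
import json

# Placeholder; replace with the API endpoint from your CloudFormation stack output.
REST_API_ENDPOINT = "https://<YOUR_API_ID>.execute-api.us-east-1.amazonaws.com/prod"
USER_AGENT = "Mozilla/5.0"  # the thread notes that a User-Agent header is required


def build_create_case_request(case_name: str, token: str) -> dict:
    """Return keyword arguments for requests.post() (hypothetical helper)."""
    if not case_name:
        raise ValueError("`case_name` is required")
    return {
        "url": f"{REST_API_ENDPOINT}/case",
        # The only body parameter mentioned in the thread is caseName.
        "data": json.dumps({"caseName": case_name}),
        "headers": {
            "Authorization": token,  # Cognito ID token, passed as-is
            "User-Agent": USER_AGENT,
            "Content-Type": "application/json",
        },
    }


# Usage (needs the requests package and a valid token):
#   import requests
#   response = requests.post(**build_create_case_request("case-4", token))
#   print(response.status_code, response.text)
```

Keeping the request construction separate from the network call makes it easy to print and compare the exact payload against what the UI sends.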
Next, make a POST request to the /document endpoint with the parameters shown in the code block below. This request must include the auth token described above. It returns an S3 signed URL. Using this signed URL, make a second POST request to upload the file as a binary blob.
import json
from pathlib import Path

import requests

from generate_auth_token import generate_token

REST_API_ENDPOINT = "https://<YOUR_API_ID>.execute-api.us-east-1.amazonaws.com/prod"  # obtained from cloudformation stack output
USER_NAME = "<USERNAME>"
PASSWORD = "<PASSWORD>"
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"  # this is required
DATA_FILE_DIR = Path(__file__).parent
token = generate_token()


def get_signed_post_policy(filename: str, documentType: str, case_id: str, case_name: str, body=None):
    # The user ID is the part of the case ID before the first colon.
    user_id = case_id.split(":")[0]
    if not body:
        request_body = {
            "userId": user_id,
            "caseId": case_id,
            "caseName": case_name,
            "fileName": filename,
            "fileExtension": f".{filename.split('.')[-1]}",
            "documentType": documentType,
            "tagging": f"<Tagging><TagSet><Tag><Key>userId</Key><Value>{user_id}</Value></Tag></TagSet></Tagging>",
        }
    else:
        request_body = body
    doc_upload_api = f"{REST_API_ENDPOINT}/document"
    print(request_body)
    headers = {"Authorization": token, "User-Agent": USER_AGENT, "Content-Type": "application/json"}
    response = requests.post(doc_upload_api, data=json.dumps(request_body), headers=headers)
    try:
        print(response.json())
    except ValueError:
        # The body was not valid JSON; show the raw content instead.
        print(response.content)
    return response.json()


def upload_file(data_file_path=None, post_policy_fields: dict[str, str] = None, bucket_url: str = None):
    print("uploading file to bucket...")
    data = {
        **post_policy_fields,
    }
    # Send the signed policy fields as form data and the file under the "file" key.
    with open(data_file_path, "rb") as file_obj:
        response = requests.post(bucket_url, data=data, files={"file": file_obj})
    print(response)
    return response


if __name__ == "__main__":
    filename = "simple-document-image.pdf"
    case_name = "<CASE NAME FROM STEP 2>"
    case_id = "<CASE_ID_RETURNED_FROM_STEP_2>"
    post_policy = get_signed_post_policy(filename=filename, documentType="generic", case_id=case_id, case_name=case_name)
    upload_file(
        data_file_path=DATA_FILE_DIR / filename,
        post_policy_fields=post_policy["fields"],
        bucket_url=post_policy["url"],
    )
Thank you @mukitmomin. @Komal-99, there is no predefined endpoint. The file upload URL is based on the name of the bucket in your account. This kind of bucket URL is called a signed URL. You can find more information about signed URLs here.
In the UI source code, this is the function that gets called: https://github.com/aws-solutions/enhanced-document-understanding-on-aws/blob/1904344329afb27c0091529c52eaa7929c382f25/source/ui/src/components/UploadDocumentView.tsx#L183
Thank you @mukitmomin, this was a great help! I need to build a solution using enhanced-document-understanding-on-aws and can't find any proper details. This Python code for uploading a document eased my work so much. Can you guide me further?
From your message I gathered that Postman is not the right tool; I also tried /cases and /document/download from Postman and, as expected, they did not work.
Besides that, I need to get the text extraction inference response as raw text, table data, and key-value pairs.
It is a great support! Thank you @mukitmomin @knihit
You should be able to use the exact same logic as above to generate the token, and then make a GET request with the Python requests library to the endpoint /inferences/{caseId}/{documentId}/{inferenceType}. The exact request and response schemas can also be obtained by exporting the API from API Gateway as a Swagger or OpenAPI 3 file. Check the API reference in our implementation guide as well.
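For the API export mentioned above, one option is the `get_export` call on the boto3 `apigateway` client. This is only a sketch: `build_export_kwargs` is an illustrative helper name, and the `prod` stage is assumed from the endpoint URL used elsewhere in this thread. The caller also needs IAM permission to read the API definition.

```python
def build_export_kwargs(rest_api_id: str, stage_name: str = "prod", export_type: str = "oas30") -> dict:
    """Return keyword arguments for apigateway.get_export() (hypothetical helper)."""
    # API Gateway accepts "oas30" (OpenAPI 3) or "swagger" (Swagger / OpenAPI 2).
    if export_type not in ("oas30", "swagger"):
        raise ValueError("`export_type` must be 'oas30' or 'swagger'")
    return {
        "restApiId": rest_api_id,
        "stageName": stage_name,
        "exportType": export_type,
        "accepts": "application/json",
    }


# Usage (needs boto3 and AWS credentials for the account that owns the stack):
#   import boto3
#   client = boto3.client("apigateway", region_name="us-east-1")
#   export = client.get_export(**build_export_kwargs("<YOUR_API_ID>"))
#   with open("api-spec.json", "wb") as out:
#       out.write(export["body"].read())
```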
Below is a Python function that makes GET requests, for example to get the details of a specific case:
def get_case(case_id: str = None):
    if not case_id:
        raise ValueError("`case_id` is required")
    get_case_api = f"{REST_API_ENDPOINT}/case/{case_id}"
    headers = {"Authorization": token, "User-Agent": USER_AGENT, "Content-Type": "application/json"}
    response = requests.get(get_case_api, headers=headers)
    return response.json()
Hi,
I tried the GET request from a Lambda function as well as from Python, but I get the same error in both cases. The Lambda function also returns an error.
Why does the request say inferences/undefined? The request for inferences should use the REST resource path /inferences/{caseId}/{documentId}/{inferenceType}. If you open the browser's Developer Tools while using the default deployment, you will be able to capture the appropriate structure.
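One way to catch an `inferences/undefined` path before the request goes out is to build the URL from its three parts and reject anything empty. The sketch below is illustrative: `build_inference_url` is a made-up helper, and the valid `inferenceType` values are not listed in this thread, so take them from the exported API definition or the browser's network tab.

```python
# Placeholder; replace with the API endpoint from your CloudFormation stack output.
REST_API_ENDPOINT = "https://<YOUR_API_ID>.execute-api.us-east-1.amazonaws.com/prod"


def build_inference_url(case_id: str, document_id: str, inference_type: str,
                        base: str = REST_API_ENDPOINT) -> str:
    """Build /inferences/{caseId}/{documentId}/{inferenceType} (hypothetical helper)."""
    parts = {
        "caseId": case_id,
        "documentId": document_id,
        "inferenceType": inference_type,
    }
    for name, value in parts.items():
        # A missing value would otherwise end up in the path as "None" or "undefined".
        if not value or value == "undefined":
            raise ValueError(f"`{name}` is missing or undefined")
    return f"{base}/inferences/" + "/".join(parts.values())


# Usage, reusing the token and headers from the get_case example:
#   url = build_inference_url(case_id, document_id, "<INFERENCE_TYPE>")
#   response = requests.get(url, headers=headers)
```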
On a side note, these clarifications are outside of the purview of the actual application. These are neither feature requests nor bugs in the application. I cannot guarantee that we will be able to clarify such issues in the future.
Actually, undefined is the user, because we are not using an authentication token, and it is followed by the inference type value.
Can you help me with some clear documentation or information, then? It is quite difficult to find, and there is no mention of how the files are organised or which file serves which purpose!
Thanks
All files are stored in an S3 bucket. The locations of the files are recorded in a DynamoDB table. The cardinality is User --> 0..N Cases, and each Case --> 0..N Documents. So, based on the case ID, you can get the locations of the documents and their inferences from the DynamoDB table.
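To make the cardinality concrete, here is a small in-memory sketch of the User --> Cases --> Documents relationship. It does not reflect the real DynamoDB schema: the class names and the `s3_key` field are invented for illustration, and the only rule taken from this thread is that the user ID is the part of the case ID before the first colon.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    document_id: str
    s3_key: str  # hypothetical field: where the file sits in the bucket


@dataclass
class Case:
    case_id: str
    documents: list = field(default_factory=list)  # 0..N Document objects

    @property
    def user_id(self) -> str:
        # Same rule the upload script uses: text before the first colon.
        return self.case_id.split(":")[0]


def cases_by_user(cases):
    """Group cases under their owning user (User --> 0..N Cases)."""
    grouped = defaultdict(list)
    for case in cases:
        grouped[case.user_id].append(case)
    return dict(grouped)
```

In the deployed solution the same lookup would be a DynamoDB read keyed on the case ID; check the table in your account for the actual attribute names.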
You can also get more details on the architecture from the following guide - https://docs.aws.amazon.com/solutions/latest/enhanced-document-understanding-on-aws/solution-overview.html.
Closing this ticket. For any bugs, issues with the existing application, or feature requests, please feel free to open a new issue.
Hi @mukitmomin @knihit, I am opening this issue to check whether the request body for the REST API was updated. I am not able to find any change, but my previous code now throws an Invalid Request Body error.
def create_case(case_id: str = None):
    if not case_id:
        raise ValueError("`case_id` is required")
    request_body = {
        "caseName": case_id,
    }
    get_case_api = f"{REST_API_ENDPOINT}/case"
    try:
        region = "us-east-1"  # Update this with your AWS region
        service = "execute-api"
        # Sign the request using AWS Signature Version 4
        aws_auth = AWS4Auth(ACCESS_KEY, SECRET_KEY, region, service, session_token=session_token)
        headers = {
            "Authorization": f"Bearer {session_token}",
            "User-Agent": USER_AGENT,
            "Content-Type": "application/json",
        }
        # Send the request with authentication
        response = requests.post(get_case_api, data=json.dumps(request_body), headers=headers)
        # Check if the response was successful
        if response.status_code == 200:
            # Parse the JSON data from the response body
            data = response.json()
            case_id = data.get('caseId')
            return case_id
        else:
            print(f"Error: {response.status_code} - {response.text}")
    except NoCredentialsError:
        print("AWS credentials not found or invalid.")
    return None
This happens when passing a string case_id. Because of this, the case is not created and I am not able to use the service.
Hi, I am trying to call the /document API with the expected payload I found by inspecting the UI, but I cannot figure out how to send a file using Postman for a custom non-UI build.
{
  caseId: "sm:b6485fe3-c6ef-4f01-86a4-f57d96e77adf"
  caseName: "XYZ"
  documentType: "generic"
  fileExtension: ".pdf"
  fileName: "9159.pdf"
  tagging: "userId sm "
  userId: "sm****"
}
I am not sure how to pass the file: is it binary or form data, and if it is form data, what is its key?