boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

InvalidParameterException when starting a Textract async job for Document Analysis #4212

Closed MustaphaU closed 3 months ago

MustaphaU commented 3 months ago

Describe the bug

When I try to run an async job for document analysis by following the guideline in Performing Asynchronous Operation with Textract, I get the error:

InvalidParameterException: An error occurred (InvalidParameterException) when calling the StartDocumentAnalysis operation: Request has invalid parameters

Expected Behavior

The result of the document analysis, i.e. the extracted tables and forms.

Current Behavior

{
    "name": "InvalidParameterException",
    "message": "An error occurred (InvalidParameterException) when calling the StartDocumentAnalysis operation: Request has invalid parameters",
    "stack": "---------------------------------------------------------------------------
InvalidParameterException                 Traceback (most recent call last)
Cell In[5], line 288
    284     analyzer.DeleteTopicandQueue()
    287 if __name__ == \"__main__\":
--> 288     main()

Cell In[5], line 283, in main()
    281 analyzer = DocumentProcessor(roleArn, bucket, document, region_name)
    282 analyzer.CreateTopicandQueue()
--> 283 analyzer.ProcessDocument(ProcessType.ANALYSIS)
    284 analyzer.DeleteTopicandQueue()

Cell In[5], line 50, in DocumentProcessor.ProcessDocument(self, type)
     48 # For document analysis, select which features you want to obtain with the FeatureTypes argument
     49 if self.processType == ProcessType.ANALYSIS:
---> 50     response = self.textract.start_document_analysis(
     51         DocumentLocation={'S3Object': {'Bucket': self.bucket, 'Name': self.document}},
     52         FeatureTypes=[\"TABLES\", \"FORMS\"],
     53         NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn})
     54     print('Processing type: Analysis')
     55     validType = True

File c:\\Users\\musta\\anaconda3\\envs\\gptenv\\lib\\site-packages\\botocore\\client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    561     raise TypeError(
    562         f\"{py_operation_name}() only accepts keyword arguments.\"
    563     )
    564 # The \"self\" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)

File c:\\Users\\musta\\anaconda3\\envs\\gptenv\\lib\\site-packages\\botocore\\client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
   1017     error_code = error_info.get(\"QueryErrorCode\") or error_info.get(
   1018         \"Code\"
   1019     )
   1020     error_class = self.exceptions.from_code(error_code)
-> 1021     raise error_class(parsed_response, operation_name)
   1022 else:
   1023     return parsed_response

InvalidParameterException: An error occurred (InvalidParameterException) when calling the StartDocumentAnalysis operation: Request has invalid parameters"
}

The log:

2024-07-20 00:16:58,993 botocore.hooks [DEBUG] Event choose-service-name: calling handler <function handle_service_name_alias at 0x000002227812F9A0>
2024-07-20 00:16:58,995 botocore.hooks [DEBUG] Event creating-client-class.textract: calling handler <function add_generate_presigned_url at 0x000002227AE16200>
2024-07-20 00:16:58,996 botocore.configprovider [DEBUG] Looking for endpoint for textract via: environment_service
2024-07-20 00:16:58,997 botocore.configprovider [DEBUG] Looking for endpoint for textract via: environment_global
2024-07-20 00:16:58,999 botocore.configprovider [DEBUG] Looking for endpoint for textract via: config_service
2024-07-20 00:16:58,999 botocore.configprovider [DEBUG] Looking for endpoint for textract via: config_global
2024-07-20 00:16:59,000 botocore.configprovider [DEBUG] No configured endpoint found.
2024-07-20 00:16:59,002 botocore.endpoint [DEBUG] Setting textract timeout as (60, 60)
2024-07-20 00:16:59,003 botocore.client [DEBUG] Registering retry handlers for service: textract
2024-07-20 00:16:59,004 botocore.hooks [DEBUG] Event choose-service-name: calling handler <function handle_service_name_alias at 0x000002227812F9A0>
2024-07-20 00:16:59,005 botocore.hooks [DEBUG] Event creating-client-class.sqs: calling handler <function add_generate_presigned_url at 0x000002227AE16200>
2024-07-20 00:16:59,006 botocore.configprovider [DEBUG] Looking for endpoint for sqs via: environment_service
2024-07-20 00:16:59,007 botocore.configprovider [DEBUG] Looking for endpoint for sqs via: environment_global
2024-07-20 00:16:59,008 botocore.configprovider [DEBUG] Looking for endpoint for sqs via: config_service
2024-07-20 00:16:59,008 botocore.configprovider [DEBUG] Looking for endpoint for sqs via: config_global
2024-07-20 00:16:59,009 botocore.configprovider [DEBUG] No configured endpoint found.
2024-07-20 00:16:59,011 botocore.endpoint [DEBUG] Setting sqs timeout as (60, 60)
2024-07-20 00:16:59,012 botocore.client [DEBUG] Registering retry handlers for service: sqs
2024-07-20 00:16:59,013 botocore.hooks [DEBUG] Event choose-service-name: calling handler <function handle_service_name_alias at 0x000002227812F9A0>
2024-07-20 00:16:59,014 botocore.hooks [DEBUG] Event creating-client-class.sns: calling handler <function add_generate_presigned_url at 0x000002227AE16200>
2024-07-20 00:16:59,015 botocore.configprovider [DEBUG] Looking for endpoint for sns via: environment_service
2024-07-20 00:16:59,016 botocore.configprovider [DEBUG] Looking for endpoint for sns via: environment_global
2024-07-20 00:16:59,017 botocore.configprovider [DEBUG] Looking for endpoint for sns via: config_service
2024-07-20 00:16:59,018 botocore.configprovider [DEBUG] Looking for endpoint for sns via: config_global
2024-07-20 00:16:59,019 botocore.configprovider [DEBUG] No configured endpoint found.
2024-07-20 00:16:59,023 botocore.endpoint [DEBUG] Setting sns timeout as (60, 60)
2024-07-20 00:16:59,027 botocore.client [DEBUG] Registering retry handlers for service: sns
2024-07-20 00:16:59,029 botocore.hooks [DEBUG] Event before-parameter-build.sns.CreateTopic: calling handler <function generate_idempotent_uuid at 0x0000022278150EE0>
2024-07-20 00:16:59,030 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Region': 'us-west-2', 'UseDualStack': False, 'UseFIPS': False}
2024-07-20 00:16:59,031 botocore.regions [DEBUG] Endpoint provider result: https://sns.us-west-2.amazonaws.com/
2024-07-20 00:16:59,033 botocore.hooks [DEBUG] Event before-call.sns.CreateTopic: calling handler <function add_recursion_detection_header at 0x0000022278150B80>
2024-07-20 00:16:59,033 botocore.hooks [DEBUG] Event before-call.sns.CreateTopic: calling handler <function inject_api_version_header_if_needed at 0x0000022278152710>
2024-07-20 00:16:59,034 botocore.endpoint [DEBUG] Making request for OperationModel(name=CreateTopic) with params: {'url_path': '/', 'query_string': '', 'method': 'POST', 'headers': {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8', 'User-Agent': 'Boto3/1.34.117 md/Botocore#1.34.117 ua/2.0 os/windows#10 md/arch#amd64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy Botocore/1.34.117'}, 'body': {'Action': 'CreateTopic', 'Version': '2010-03-31', 'Name': 'AmazonTextractTopic1721449019029'}, 'url': 'https://sns.us-west-2.amazonaws.com/', 'context': {'client_region': 'us-west-2', 'client_config': <botocore.config.Config object at 0x000002227BE55D20>, 'has_streaming_input': False, 'auth_type': None}}
2024-07-20 00:16:59,036 botocore.hooks [DEBUG] Event request-created.sns.CreateTopic: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x000002227BE558A0>>
2024-07-20 00:16:59,037 botocore.hooks [DEBUG] Event choose-signer.sns.CreateTopic: calling handler <function set_operation_specific_signer at 0x0000022278150DC0>
2024-07-20 00:16:59,038 botocore.auth [DEBUG] Calculating signature using v4 auth.

Reproduction Steps

Follow the guidelines in Performing Asynchronous Operation with Textract and run the Python script.

Possible Solution

No response

Additional Information/Context

No response

SDK version used

Name: boto3 Version: 1.34.117

Environment details (OS name and version, etc.)

Windows 11 pro

tim-finnigan commented 3 months ago

Thanks for reaching out. The start_document_analysis command makes a call to the StartDocumentAnalysis API. That error is coming from the API. The error has come up a few times before in other issues, for example https://github.com/boto/boto3/issues/2653. As mentioned there (and also in a Stack Overflow post) an invalid S3 bucket name could cause the error.
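Because the exception is built from the service's response, it can help to look at the `response` dict that botocore attaches to a `ClientError` before guessing at the cause. The following is a minimal sketch, not part of the original script: `summarize_client_error` and `FakeClientError` are hypothetical names, and the fake exception only stands in for a real `botocore.exceptions.ClientError` so the snippet runs without AWS credentials.

```python
def summarize_client_error(exc):
    """Return (code, message, request_id) from a botocore-style error.

    Hypothetical helper: it relies only on the `response` dict that
    botocore.exceptions.ClientError instances carry.
    """
    response = getattr(exc, "response", None) or {}
    error = response.get("Error", {})
    metadata = response.get("ResponseMetadata", {})
    return error.get("Code"), error.get("Message"), metadata.get("RequestId")


class FakeClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError (illustration only)."""

    def __init__(self, response):
        super().__init__(response["Error"]["Message"])
        self.response = response


# Response shaped like the one behind the error in this issue;
# the request ID is a made-up placeholder.
err = FakeClientError({
    "Error": {
        "Code": "InvalidParameterException",
        "Message": "Request has invalid parameters",
    },
    "ResponseMetadata": {"RequestId": "example-request-id"},
})

code, message, request_id = summarize_client_error(err)
print(f"{code}: {message} (request id: {request_id})")
```

In real code the same helper could be called from an `except botocore.exceptions.ClientError as exc:` block wrapped around `start_document_analysis`.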

Can you verify that the S3 bucket name you're using is valid? And can you also confirm that the IAM role you're using has the necessary permissions to use the Textract service and access the S3 bucket where the document is stored?

If you're still seeing an issue, please provide the complete code snippet you're using to reproduce it.
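For the bucket-name check, a quick local test can rule out the obvious mistakes (uppercase letters, underscores, an `s3://` prefix) before calling the API. This is a rough sketch that covers only a subset of the S3 general purpose bucket naming rules; `looks_like_valid_bucket_name` is a hypothetical helper, and passing it does not guarantee that the bucket exists or is accessible.

```python
import re

# 3-63 characters; lowercase letters, digits, dots, and hyphens;
# must begin and end with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names formatted like an IPv4 address are not allowed.
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def looks_like_valid_bucket_name(name):
    """Check a subset of the S3 bucket naming rules (not exhaustive)."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    if _IPV4_RE.fullmatch(name):
        return False
    return True


for candidate in ["my-textract-docs", "My_Bucket", "s3://my-bucket", "192.168.5.4"]:
    print(candidate, looks_like_valid_bucket_name(candidate))
```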

MustaphaU commented 3 months ago

Thanks @tim-finnigan. The error was caused by an incorrect role ARN: I was passing the ARN of the SNS permissions policy attached to the Textract role rather than the ARN of the Textract role itself.
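A small guard along these lines would catch the same mix-up before the request is sent. It is only a sketch: `iam_resource_type` and `require_role_arn` are hypothetical helpers, the account ID and names below are placeholders, and the check only inspects the ARN string (it does not confirm that the role exists or that Textract can assume it).

```python
def iam_resource_type(arn):
    """Return the IAM resource type in an ARN ('role', 'policy', ...), or None.

    Assumes the usual layout arn:partition:iam::account:type/name.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        return None
    resource = parts[5]
    if "/" not in resource:
        return None
    return resource.split("/", 1)[0]


def require_role_arn(arn):
    """Raise ValueError unless the ARN string names an IAM role."""
    kind = iam_resource_type(arn)
    if kind != "role":
        raise ValueError(
            f"Expected an IAM role ARN for RoleArn, got {kind or 'an unrecognized value'}: {arn}"
        )
    return arn


# Placeholder ARNs for illustration only.
role_arn = "arn:aws:iam::123456789012:role/TextractRole"
policy_arn = "arn:aws:iam::aws:policy/AmazonSNSFullAccess"

print(iam_resource_type(role_arn))
print(iam_resource_type(policy_arn))
```

Calling `require_role_arn(roleArn)` before building `NotificationChannel` would turn the opaque `InvalidParameterException` into a local error that names the problem.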

github-actions[bot] commented 3 months ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.