aws-samples / amazon-textract-code-samples

Amazon Textract Code Samples
MIT No Attribution

Textract fails to get a file that has spaces in the filename. #12

Closed j-sieger closed 3 years ago

j-sieger commented 3 years ago
    bucket = event['Records'][0]['s3']['bucket']['name']
    #bucket = 'textract-input-files'
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

    try:
        textract = boto3.client('textract')

        textract.start_document_text_detection(
            DocumentLocation={
                'S3Object': {
                    'Bucket': bucket,
                    'Name': key
                }
            },

The code above fails when I pass a file that has spaces in its name.

An error occurred (InvalidParameterException) when calling the StartDocumentTextDetection operation: Request has invalid parameters

The same code works fine if I remove the spaces from the filename.

Error getting object Arkilo and Pierce.pdf from bucket textract-input-files. Make sure they exist and your bucket is in the same region as this function.

[ERROR] InvalidParameterException: An error occurred (InvalidParameterException) when calling the StartDocumentTextDetection operation: Request has invalid parameters
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 37, in lambda_handler
    raise e
  File "/var/task/lambda_function.py", line 30, in lambda_handler
    'SNSTopicArn': 'arn:aws:sns:us-east-**************:SNStopicTextract'
  File "/opt/python/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/python/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
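
For anyone debugging a similar report, it can help to confirm what key string actually reaches Textract. S3 event notifications URL-encode object keys, so a space arrives as `+` and has to be decoded with `unquote_plus`, as the snippet above already does. The following is a minimal sketch of that decoding step only; `extract_s3_location` and `sample_event` are hypothetical names, the event is trimmed to the fields the handler reads, and it does not by itself explain the `InvalidParameterException` in the traceback.

```python
from urllib.parse import unquote, unquote_plus


def extract_s3_location(event):
    """Return (bucket, decoded_key) from the first record of an S3 event."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded: spaces arrive as '+'.
    key = unquote_plus(s3_info["object"]["key"], encoding="utf-8")
    return bucket, key


# Hypothetical event, reduced to the fields used above.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "textract-input-files"},
                "object": {"key": "Arkilo+and+Pierce.pdf"},
            }
        }
    ]
}

bucket, key = extract_s3_location(sample_event)
print(repr(key))                              # 'Arkilo and Pierce.pdf'
# Plain unquote() would leave the '+' characters untouched.
print(repr(unquote("Arkilo+and+Pierce.pdf")))  # 'Arkilo+and+Pierce.pdf'
```

Logging `repr(key)` in the Lambda makes it easy to compare the decoded key against the object name shown in the S3 console.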

schadem commented 3 years ago

I added a package and method for calling Textract that also handles paginated output: https://github.com/aws-samples/amazon-textract-textractor/tree/master/caller. Just use call_textract from that package and you should be good.
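
A rough sketch of what that suggestion could look like in the Lambda handler follows. It assumes the package is installed as `amazon-textract-caller`, that `call_textract` is importable from `textractcaller.t_call`, and that it accepts an `s3://bucket/key` string as `input_document`; check these against the package README before relying on them. `build_s3_uri` and `detect_text` are hypothetical helper names, not part of the package.

```python
def build_s3_uri(bucket: str, key: str) -> str:
    """Join a bucket name and an already-decoded object key into an s3:// URI."""
    return f"s3://{bucket}/{key}"


def detect_text(bucket: str, key: str) -> dict:
    """Call Textract through call_textract and return the response JSON.

    Assumption: `pip install amazon-textract-caller` provides this import.
    It is imported lazily so the helper above works without the package.
    """
    from textractcaller.t_call import call_textract

    return call_textract(input_document=build_s3_uri(bucket, key))


# The key is passed in decoded form (with real spaces, not '+').
print(build_s3_uri("textract-input-files", "Arkilo and Pierce.pdf"))
# s3://textract-input-files/Arkilo and Pierce.pdf
```

Calling `detect_text(bucket, key)` needs AWS credentials and Textract permissions, so only the URI helper is exercised here.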