aws-solutions / content-localization-on-aws

Automatically generate multi-language subtitles using AWS AI/ML services. Machine-generated subtitles can be edited to improve accuracy, and downstream tracks are automatically regenerated based on the edits. Built on Media Insights Engine (https://github.com/awslabs/aws-media-insights-engine).
Apache License 2.0

Save Translation edits throws an error in WebToVTTCaptions v2.0.0 #354

Open flick1533 opened 1 year ago

flick1533 commented 1 year ago

Describe the bug

When I click on the Save edits button after editing the Translation, the workflow starts to rerun but ends with an Error status.

To Reproduce

1. Upload a video and run a workflow with the following configuration:
    Audio Operators - Transcribe - Source Language - English US
    Text Operators - Translate - Spanish
   The video uploads and is analyzed successfully.
2. Edit the Translation for the workflow.
3. Click Save edits
4. The workflow ends in an Error state. The Step Functions console shows the error is in WebToVTTCaptions.

Expected behavior

The workflow reruns with the edited translation.


This is the ValueError from the Complete Stage TransformText step:

{
  "errorMessage": "Exception: 'Stage TransformText encountered and error during execution, aborting the workflow'",
  "errorType": "ValueError",
  "requestId": "9cf69725-b853-4ff8-9159-f9cd13f9dacd",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 327, in complete_stage_execution_lambda\n    return complete_stage_execution(\"lambda\", event[\"Name\"], event[\"Status\"], event[\"Outputs\"], event[\"WorkflowExecutionId\"])\n",
    "  File \"/var/task/app.py\", line 467, in complete_stage_execution\n    raise ValueError(\n"
  ]
}

This is the error from the Execute WebToVTTCaptions (TransformText) step; it is caught twice in Execute WebToVTTCaptions (TransformText):

{
  "Name": "WebToVTTCaptions",
  "AssetId": "1224dc3f-5e34-42cf-be48-1c4b0081bfed",
  "WorkflowExecutionId": "e416570c-bd06-4a56-8ba4-757732ab5ecb",
  "Input": {
    "Media": {
      "Video": {
        "S3Bucket": "clo-miestack-XXXXXXXXXXXX-dataplane-XXXXXXXXXXXX",
        "S3Key": "public/upload/Welcome to AWS, London.mp4"
      }
    },
    "MetaData": {}
  },
  "Configuration": {
    "MediaType": "MetadataOnly",
    "Enabled": true,
    "TargetLanguageCodes": [
      "es",
      "en"
    ]
  },
  "Status": "Started",
  "MetaData": {},
  "Media": {},
  "Outputs": {
    "Error": "MasExecutionError",
    "Cause": "{\"errorMessage\": \"{'Name': 'WebToVTTCaptions', 'AssetId': '1224dc3f-5e34-42cf-be48-1c4b0081bfed', 'WorkflowExecutionId': 'e416570c-bd06-4a56-8ba4-757732ab5ecb', 'Input': {'Media': {'Video': {'S3Bucket': 'clo-miestack-XXXXXXXXXXXXX-dataplane-XXXXXXXXXXXXX', 'S3Key': 'public/upload/Welcome to AWS, London.mp4'}}, 'MetaData': {}}, 'Configuration': {'MediaType': 'MetadataOnly', 'Enabled': True, 'TargetLanguageCodes': ['es', 'en']}, 'Status': 'Error', 'MetaData': {'WebCaptionsError': \\\"No valid inputs 'SourceLanguageCode'\\\"}, 'Media': {}}\", \"errorType\": \"MasExecutionError\", \"requestId\": \"37b7b2f0-e57a-4bc8-bdfd-c85857db4624\", \"stackTrace\": [\"  File \\\"/var/task/webcaptions.py\\\", line 589, in create_vtt\\n    webcaptions_object = WebCaptions(operator_object)\\n\", \"  File \\\"/var/task/webcaptions.py\\\", line 77, in __init__\\n    raise MasExecutionError(operator_object.return_output_object())\\n\"]}"
  }
}

It seems the application fails to identify the SourceLanguageCode because, after the translated subtitles are edited and saved, the source language code (en) and the target language code (es) are both passed in TargetLanguageCodes, as seen below.

WebToVTTCaptions (TransformText) error:

'Configuration': {'MediaType': 'MetadataOnly', 'Enabled': True, 'TargetLanguageCodes': ['es', 'en']}, 'Status': 'Error', 'MetaData': {'WebCaptionsError': \\\"No valid inputs 'SourceLanguageCode'\\\"},
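To check this reading of the payload mechanically, the failing Configuration and MetaData can be scanned for a source language. The helper below is a hypothetical sketch, not part of the solution: the key names it looks for are taken from the webcaptions excerpt quoted later in this thread, and `expected_source="en"` assumes the English US source language from the reproduction steps.

```python
"""Inspect the WebToVTTCaptions input reported in this issue (hypothetical helper)."""

# Configuration and MetaData copied from the failing TransformText payload above.
configuration = {
    "MediaType": "MetadataOnly",
    "Enabled": True,
    "TargetLanguageCodes": ["es", "en"],
}
metadata = {}

# Keys consulted for the source language, per the excerpt quoted later in this thread.
SOURCE_KEYS_IN_METADATA = ("TranscribeSourceLanguage", "TranslateSourceLanguage")
SOURCE_KEY_IN_CONFIGURATION = "SourceLanguageCode"


def diagnose_webcaptions_input(configuration, metadata, expected_source="en"):
    """Return a list of human-readable problems found in the operator input."""
    problems = []
    has_source = any(key in metadata for key in SOURCE_KEYS_IN_METADATA) or (
        SOURCE_KEY_IN_CONFIGURATION in configuration
    )
    if not has_source:
        problems.append("no source language in MetaData or Configuration")
    # Assumption: the source language should not also appear among the targets.
    if expected_source in configuration.get("TargetLanguageCodes", []):
        problems.append(f"source language '{expected_source}' listed in TargetLanguageCodes")
    return problems


for problem in diagnose_webcaptions_input(configuration, metadata):
    print(problem)
```

For the payload in this report, both checks fire: no source-language key is present anywhere, and `en` appears alongside `es` in TargetLanguageCodes.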



mortizbey commented 4 months ago

Hi team! I got the same error when editing either the Subtitles or the Translation. We traced it to the WebToVTTCaptionsFunction and WebToSRTCaptionsFunction Lambda functions, specifically the code below: neither function receives any of these values (especially source_language_code), which prevents them from continuing:

if "TranscribeSourceLanguage" in self.operator_object.input['MetaData']:
    self.source_language_code = self.operator_object.input['MetaData']['TranscribeSourceLanguage'].split('-')[0]
elif "TranslateSourceLanguage" in self.operator_object.input['MetaData']:
    self.source_language_code = self.operator_object.input['MetaData']['TranslateSourceLanguage'].split('-')[0]
else:
    # If TranscribeSourceLanguage is not available, then SourceLanguageCode
    # must be present in the operator Configuration block.
    self.source_language_code = self.operator_object.configuration.get("SourceLanguageCode","en")

if "TargetLanguageCodes" in self.operator_object.configuration:
    self.target_language_codes = self.operator_object.configuration["TargetLanguageCodes"]
if "ExistingSubtitlesObject" in self.operator_object.configuration:
    self.existing_subtitles_object = self.operator_object.configuration["ExistingSubtitlesObject"]
    self.existing_subtitles = True
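The lookup order in this excerpt can be exercised outside Lambda by lifting it into a standalone function. This is a minimal sketch: `resolve_source_language` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for the Media Insights Engine operator object that model only the two attributes the excerpt reads (`input` and `configuration`). The first-run MetaData value `"en-US"` is an assumption based on the English US source language in the original report.

```python
"""Standalone sketch of the source-language lookup order in the excerpt above."""
from types import SimpleNamespace


def resolve_source_language(operator_object):
    """Mirror the excerpt: MetaData first, then Configuration, defaulting to "en"."""
    metadata = operator_object.input["MetaData"]
    if "TranscribeSourceLanguage" in metadata:
        # Transcribe-style codes such as "en-US" are reduced to "en".
        return metadata["TranscribeSourceLanguage"].split("-")[0]
    if "TranslateSourceLanguage" in metadata:
        return metadata["TranslateSourceLanguage"].split("-")[0]
    return operator_object.configuration.get("SourceLanguageCode", "en")


# Hypothetical stand-ins for the operator object on a first run and on an edit rerun.
first_run = SimpleNamespace(
    input={"MetaData": {"TranscribeSourceLanguage": "en-US"}},
    configuration={"TargetLanguageCodes": ["es"]},
)
edit_rerun = SimpleNamespace(
    input={"MetaData": {}},
    configuration={"MediaType": "MetadataOnly", "Enabled": True, "TargetLanguageCodes": ["es", "en"]},
)

print(resolve_source_language(first_run))   # en
print(resolve_source_language(edit_rerun))  # en (the hard-coded default, not a propagated value)
```

Note that, as excerpted, the final fallback silently returns `"en"` when nothing is propagated, so a video whose source language is not English would be mislabeled rather than rejected.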

It seems these values are not propagated correctly when the Subtitles or Translation are edited.
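One possible direction, sketched under assumptions: whatever code builds the operator Configuration for the edit rerun could carry the original source language forward explicitly, since the comment in the excerpt says SourceLanguageCode "must be present in the operator Configuration block" when MetaData lacks it. `with_source_language` is a hypothetical helper; where the solution actually assembles the rerun request is not shown in this thread, and the sketch deliberately leaves TargetLanguageCodes unchanged.

```python
"""Hypothetical workaround: carry the source language into the rerun Configuration."""
import copy


def with_source_language(configuration, source_language):
    """Return a copy of an operator Configuration with SourceLanguageCode set.

    source_language may be a Transcribe-style code such as "en-US"; only the
    part before the first "-" is kept, matching the split('-')[0] in the excerpt.
    """
    patched = copy.deepcopy(configuration)  # leave the caller's dict untouched
    patched["SourceLanguageCode"] = source_language.split("-")[0]
    return patched


# Configuration seen in the failing rerun reported above.
rerun_config = {
    "MediaType": "MetadataOnly",
    "Enabled": True,
    "TargetLanguageCodes": ["es", "en"],
}
print(with_source_language(rerun_config, "en-US"))
```

Copying instead of mutating keeps the original workflow configuration intact, so the same base Configuration can be reused for both the VTT and SRT operators.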

amzn-gaod commented 4 months ago

Thank you for reporting this issue. We will consider your feedback and have added this request to our backlog for this solution.