Transcribe from another language

aws-solutions-library-samples / guidance-for-media2cloud-on-aws

Guidance for Media2Cloud on AWS solution (formerly known as AWS Media2Cloud Solution) is designed to demonstrate a serverless ingest framework that can quickly set up a baseline ingest workflow for placing video assets and associated metadata under management control of an AWS customer.

https://aws.amazon.com/solutions/guidance/media2cloud-on-aws/

Apache License 2.0


Transcribe from another language #3

Closed weiping-bj closed 5 years ago

weiping-bj commented 5 years ago

Hi,

I plan to test this demo using a Chinese language video.

To make Transcribe support Chinese, I exported the Lambda function code "transcribe-MediaAnalysisStac-*.zip" and modified two files: /lib/transcribe/transcribe.js and /lib/transcribe/transcribe.js (I only changed LanguageCode from "en-US" to "zh-CN").

I used the aws lambda update-function-code command to update this Lambda function and got a success response.

But I encountered an error response when I tried to create metadata. The error message is as follows:

{
  "Error": "AnalyticsError",
  "Cause": "{\"errorMessage\":\"media-analytics-solution state machine failed, FAILED\",\"errorType\":\"AnalyticsError\",\"stackTrace\":[\"exports.getAnalyticsStateMachine (/var/task/lib/metadata/analytics/index.js:206:13)\",\"<anonymous>\",\"process._tickDomainCallback (internal/process/next_tick.js:228:7)\"]}"
}
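Note that the "Cause" field above is itself a JSON document serialized into a string, so it takes a second parse to read it. A minimal sketch in Node.js, using only the payload shown above (the variable names are illustrative):

```javascript
// Error payload copied verbatim from the failed execution above.
const failure = {
  Error: "AnalyticsError",
  Cause: "{\"errorMessage\":\"media-analytics-solution state machine failed, FAILED\",\"errorType\":\"AnalyticsError\",\"stackTrace\":[\"exports.getAnalyticsStateMachine (/var/task/lib/metadata/analytics/index.js:206:13)\",\"<anonymous>\",\"process._tickDomainCallback (internal/process/next_tick.js:228:7)\"]}",
};

// "Cause" is a JSON string nested inside the outer object, so parse it separately.
const cause = JSON.parse(failure.Cause);

console.log(cause.errorType);     // AnalyticsError
console.log(cause.errorMessage);  // media-analytics-solution state machine failed, FAILED
console.log(cause.stackTrace[0]); // first frame: where the wrapper raised the error
```

Once decoded, the message only says that the nested state machine failed. The underlying reason is not in this payload and has to be read from the failed execution of that nested state machine.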

Could anyone guide me on how to fix this issue?

Thanks

aws-kens commented 5 years ago

Hi @weiping-bj,

The problem you encountered occurs because Chinese is currently not supported by Amazon Comprehend (our NLP service). Although Amazon Transcribe can process zh-CN, the analysis state machine will still fail when it runs the NLP process with Amazon Comprehend.
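One way to picture the mismatch is a guard that decides whether the NLP step should run for a given Transcribe language. This is only a sketch: `shouldRunComprehend` is a hypothetical helper, not part of the solution's code, and the supported-language set is a placeholder that should be replaced with the list from the current Amazon Comprehend documentation.

```javascript
// Hypothetical guard (not from the Media2Cloud code base): decide whether the
// NLP step should run for a given Transcribe language code.
// The caller supplies the supported set because it changes over time.
function shouldRunComprehend(transcribeLanguageCode, supportedLanguages) {
  // Transcribe uses locale codes such as "en-US" or "zh-CN";
  // compare on the primary language subtag ("en", "zh").
  const primary = transcribeLanguageCode.split("-")[0].toLowerCase();
  return supportedLanguages.has(primary);
}

// Placeholder list for illustration only, not an authoritative one.
const supported = new Set(["en", "es"]);

console.log(shouldRunComprehend("en-US", supported)); // true
console.log(shouldRunComprehend("zh-CN", supported)); // false
```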

weiping-bj commented 5 years ago

Hi @aws-kens,

Thanks for your feedback. I have two more questions.

1) Based on my understanding, if the analysis state machine throws an error when it runs Comprehend, the Transcribe process should already have executed successfully. But I didn't find any jobs in the Transcribe console (either successful or failed). Please correct me if my understanding is wrong.
2) How can I bypass Comprehend if I just need the transcription and subtitle functions?

Thanks

aws-kens commented 5 years ago

Hi @weiping-bj,

1) You are right. If it runs the Transcribe service, you should be able to see the job in Transcribe. There is a state machine called media-analysis. You can check the detailed error from that state machine.
2) With the current version, you would need to modify the media analysis state machine to bypass the Comprehend logic. We do have a private beta program for Media2Cloud V2 that can disable specific AI/ML detections. Let me know if you are interested and I will send you a separate email.
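As a rough illustration of what bypassing a step could look like, the sketch below rewires an Amazon States Language definition so that every transition into one state goes to that state's own `Next` instead. It is not the solution's actual code: `bypassState` is a hypothetical helper, the state names and `Resource` values are made up, and the real media-analysis definition will differ. It only handles a state with a plain `Next` target and does not descend into Parallel branches.

```javascript
// Returns a copy of a state machine definition in which every transition
// into `stateName` is redirected to that state's own `Next` target.
function bypassState(definition, stateName) {
  const copy = JSON.parse(JSON.stringify(definition)); // leave the input untouched
  const skipped = copy.States[stateName];
  if (!skipped || !skipped.Next) {
    throw new Error(`Cannot bypass "${stateName}": state missing or has no Next`);
  }
  const target = skipped.Next;
  const redirect = (name) => (name === stateName ? target : name);

  if (copy.StartAt === stateName) copy.StartAt = target;
  delete copy.States[stateName];

  for (const state of Object.values(copy.States)) {
    if (state.Next) state.Next = redirect(state.Next);
    if (state.Default) state.Default = redirect(state.Default);
    for (const rule of state.Choices || []) {
      if (rule.Next) rule.Next = redirect(rule.Next);
    }
    for (const catcher of state.Catch || []) {
      if (catcher.Next) catcher.Next = redirect(catcher.Next);
    }
  }
  return copy;
}

// Hypothetical definition; state names and resources are placeholders.
const definition = {
  StartAt: "StartTranscribe",
  States: {
    StartTranscribe: { Type: "Task", Resource: "placeholder-transcribe-arn", Next: "RunComprehend" },
    RunComprehend: { Type: "Task", Resource: "placeholder-comprehend-arn", Next: "CollectResults" },
    CollectResults: { Type: "Task", Resource: "placeholder-collect-arn", End: true },
  },
};

const patched = bypassState(definition, "RunComprehend");
console.log(patched.States.StartTranscribe.Next); // CollectResults
```

Any downstream state that expects Comprehend output in its input would also need adjusting, which this sketch does not attempt.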

aws-kens commented 5 years ago

Hi @weiping-bj, We have just launched a major update, Media2Cloud V2, which has a lot of enhancements and new features. V2 should resolve the problem you had before. Give it a try.