Fail state executed in step: Processing Error

yemaney commented 2 years ago

Using the one-click deployment resulted in an error. Wondering if there are any clues as to why? I've checked the lambda function cloud watch logs and the worse I get is a warning that the urlib library used is out of data, but no actual errors.

DanyStinson commented 2 years ago

Hi! It looks like when it is creating the custom vocabulary it's passing on to Transcribe incorrectly formatted words. I checked out the lambda function in charge of creating the vocabulary and in the example feed it is taking in the following terms

["-Hawn", "Cloud", "A-W-S-A-I-Services", "Amazon", "Code-Whisperer", "Pillir", "A-W-S", "S-A-P", "U-K-T-V", "Media-two-Cloud", "Marketplace", "Mainframe-Modernization", "E-M-R-Serverless", "E-M-R", "Apache", "Spark", "Hive", "Hawn", "Amazon-Connect", "Local-Measure", "low", "Intelligent-Automation"]

It seems that when creating a custom vocabulary "-word" (in this case -Hawn, is creating the issue) is not accepted, so the lambda function in charge of doing the preprocessing should be reviewed --> podcast-transcribe-index-createTranscribeVocabular***

Hope this helps!

yemaney commented 2 years ago

With @Dani Mitchells help, I found out the problem was caused because transcribe doesn't accept words starting with - when creating its custom vocabulary.

Solved it by adding an extra check for this case in the podcast-transcribe-index-createTranscribeVocabular function.

mapping[item] = origItem
#### check for words starting with '-'
if item[0] == "-":
        item = item[1:]
###
vocabularyTerms.append(item)

aws-samples / amazon-transcribe-comprehend-podcast

Fail state executed in step: Processing Error #12