awsdocs / amazon-transcribe-developer-guide

The open source version of the Amazon Transcribe docs. You can submit feedback & requests for changes by submitting issues in this repo or by making proposed changes & submitting a pull request.

Custom vocabulary - Japanese Language #17

Closed: LovikaJain closed this issue 3 years ago

LovikaJain commented 3 years ago

I'm not able to create a custom vocabulary with Japanese words, and I can't find any sample custom vocabulary phrases file. I tried both the character codes from the table and the Japanese words directly as an array of strings. Neither worked; I got the error "The vocabulary that you’re trying to create contains invalid characters or incorrectly formatted terms. See the developer guide for more information." Here is my code:

```python
response = transcribe.create_vocabulary(
    VocabularyName='vocab2',
    LanguageCode='ja-JP',
    Phrases=["0x3005 0x3005"]
)
```
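One likely problem with the call above is that `"0x3005 0x3005"` is a string made of hex digits, the letter `x`, and a space, not the Japanese characters those codes name. The sketch below, which assumes the table lists Unicode code points, converts such codes into literal characters and builds the keyword arguments for `create_vocabulary`. The helper names `codes_to_phrase` and `build_vocabulary_request` are hypothetical, and whether the service accepts a given phrase still depends on the Japanese character set.

```python
from typing import Dict, List


def codes_to_phrase(codes: str) -> str:
    """Turn space-separated hex code points such as "0x3005 0x3005"
    into the literal characters they name (U+3005 is "々")."""
    # int(..., 16) accepts an optional "0x" prefix.
    return "".join(chr(int(code, 16)) for code in codes.split())


def build_vocabulary_request(name: str, phrases: List[str]) -> Dict[str, object]:
    """Keyword arguments for a ja-JP create_vocabulary call.
    Phrases should hold literal Japanese text, not code-point strings."""
    return {
        "VocabularyName": name,
        "LanguageCode": "ja-JP",
        "Phrases": phrases,
    }


request = build_vocabulary_request("vocab2", [codes_to_phrase("0x3005 0x3005")])
# With AWS credentials configured, the request could then be sent with:
#   import boto3
#   boto3.client("transcribe").create_vocabulary(**request)
```

Japanese words can also be passed directly as ordinary Python strings in `Phrases`. The conversion step is only needed when starting from code points.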

Any leads would be appreciated!

lisdelan commented 3 years ago

Hi, apologies for the delay in responding. Are you still having issues with ja-JP characters?

tsupox commented 3 years ago

I've checked the document below. It links to a file in this repository, but I cannot find that file. https://docs.aws.amazon.com/transcribe/latest/dg/charsets.html#char-japanese

> Japanese character set
>
> For Japanese custom vocabularies, the Phrase and DisplayAs fields can use any of the characters listed in the following file on GitHub.
>
> ja-jp-character-set.txt

The link points to https://github.com/awsdocs/amazon-transcribe-developer-guide/blob/master/doc_source/ja-jp-character-set.txt

I checked the history and found that this file was deleted in https://github.com/awsdocs/amazon-transcribe-developer-guide/commit/c8d37e17f0e2a8b677e34a29192c34d59dacd540

I'd like to know whether this link is still available.

Thank you.

lisdelan commented 3 years ago

Hi, please try that link again; I've just updated the documentation, including the character sets.
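With the character-set file available again, a phrase can be checked locally before calling `create_vocabulary`. The sketch below is hypothetical. It assumes each non-blank line of the file holds either one literal character or a hex code point such as `0x3005`, so check the real file's layout before relying on it. The helpers `load_character_set` and `invalid_characters` are illustrative names, and the three-entry sample set is made up for demonstration.

```python
import io
from typing import Iterable, List, Set


def load_character_set(lines: Iterable[str]) -> Set[str]:
    """Collect allowed characters. Assumed (hypothetical) format: one
    entry per line, either a literal character or a hex code point."""
    allowed: Set[str] = set()
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if entry.lower().startswith("0x"):
            allowed.add(chr(int(entry, 16)))  # e.g. "0x3005" -> "々"
        else:
            allowed.update(entry)  # literal character(s)
    return allowed


def invalid_characters(phrase: str, allowed: Set[str]) -> List[str]:
    """Return the distinct characters of `phrase` missing from `allowed`."""
    return sorted({ch for ch in phrase if ch not in allowed})


# Tiny made-up sample standing in for the real character-set file.
allowed = load_character_set(io.StringIO("0x3005\n東\n京\n"))
print(invalid_characters("東京\u3005", allowed))      # every character allowed
print(invalid_characters("0x3005 0x3005", allowed))  # digits, "x", and space flagged
```

Against this sample set, the original phrase `"0x3005 0x3005"` is flagged because it contains ASCII digits, `x`, and a space. That is one plausible reading of the "invalid characters" error, but the thread does not confirm it.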