aws / aws-sdk-js

AWS SDK for JavaScript in the browser and Node.js
https://aws.amazon.com/developer/language/javascript/
Apache License 2.0

Polly is truncating UTF-8 characters when getting speech marks. #1683

Closed joe1chen closed 5 years ago

joe1chen commented 7 years ago

I'm using an input with the character ” (U+201D RIGHT DOUBLE QUOTATION MARK, 3-byte UTF-8 sequence E2 80 9D) followed by a period. When I call Polly for JSON speech marks, the response truncates the character: it returns only the first byte (E2) and drops the last two bytes (80 9D).
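To see why truncating the character produces `�` (U+FFFD REPLACEMENT CHARACTER) in the output, here is a minimal Node.js sketch using only the built-in `Buffer` class. It encodes ” as UTF-8, keeps only the first byte, and decodes the result. It does not call Polly.

```javascript
// Encode the right double quotation mark (U+201D) as UTF-8.
const full = Buffer.from('\u201D', 'utf8');
console.log(full);        // <Buffer e2 80 9d>
console.log(full.length); // 3

// Keep only the first byte, mimicking the truncated speech-mark response.
const truncated = full.subarray(0, 1);

// An incomplete UTF-8 sequence decodes to the replacement character.
console.log(truncated.toString('utf8') === '\uFFFD'); // true
```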

To reproduce:

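var AWS = require('aws-sdk');

var Polly = new AWS.Polly({
    signatureVersion: 'v4'
});

var params = {
    'Text': "pools”.",
    'OutputFormat': 'json',
    'VoiceId': 'Joanna',
    'SpeechMarkTypes': ['word', 'sentence']
};

Polly.synthesizeSpeech(params, function (err, data) {
  if (err) {
    console.error(err);
    return;
  }
  // With OutputFormat 'json', AudioStream holds newline-delimited speech marks, not audio.
  console.log(data.AudioStream.toString('utf-8'));
});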
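Because `OutputFormat: 'json'` returns one JSON object per line, a small parser makes the marks easier to inspect. The sketch below assumes `data.AudioStream` is a Node.js `Buffer` (as in the SDK for JavaScript v2 on Node.js); `parseSpeechMarks` is a hypothetical helper, not an SDK function. The usage feeds it the two lines reported below instead of calling Polly.

```javascript
// Hypothetical helper: turn the newline-delimited JSON returned for
// OutputFormat 'json' into an array of speech-mark objects.
function parseSpeechMarks(audioStream) {
  return audioStream
    .toString('utf8')
    .split('\n')
    .filter((line) => line.trim() !== '') // skip the trailing empty line
    .map((line) => JSON.parse(line));
}

// Usage with the reported output, wrapped in a Buffer like data.AudioStream.
const sample = Buffer.from(
  '{"time":0,"type":"sentence","start":0,"end":6,"value":"pools\uFFFD"}\n' +
    '{"time":6,"type":"word","start":0,"end":5,"value":"pools"}\n',
  'utf8'
);

const marks = parseSpeechMarks(sample);
console.log(marks.length);                // 2
console.log(marks[0].type, marks[0].end); // sentence 6
```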

Output:

{"time":0,"type":"sentence","start":0,"end":6,"value":"pools�"}
{"time":6,"type":"word","start":0,"end":5,"value":"pools"}

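The end value, which refers to the end position in the UTF-8 byte stream, is 6 in the sentence line. This is also incorrect: "pools" is 5 bytes and ” is 3 bytes, so the sentence should end at byte 8. An end value of 6 cuts the character off after its first byte, which further indicates that truncation is occurring.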
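The byte arithmetic can be checked directly. This sketch assumes `start`/`end` are byte offsets into the UTF-8 encoding of the input text, with `end` exclusive (as the word mark `start: 0, end: 5` for "pools" suggests). `textForMark` and `isCharBoundary` are hypothetical helpers: the first slices the input by a mark's offsets, the second tests whether an offset falls between characters rather than inside a multi-byte sequence.

```javascript
// Slice of the input covered by a speech mark, assuming byte offsets
// into the UTF-8 encoding of the input (end exclusive).
function textForMark(inputText, mark) {
  return Buffer.from(inputText, 'utf8')
    .subarray(mark.start, mark.end)
    .toString('utf8');
}

// An offset is a character boundary if it is the end of the buffer or does
// not point at a UTF-8 continuation byte (bit pattern 10xxxxxx).
function isCharBoundary(buf, offset) {
  return offset === buf.length || (buf[offset] & 0xc0) !== 0x80;
}

const input = 'pools\u201D.';
const bytes = Buffer.from(input, 'utf8');

console.log(bytes.length);                             // 9
console.log(textForMark(input, { start: 0, end: 8 })); // pools”
console.log(isCharBoundary(bytes, 8));                 // true  (expected end)
console.log(isCharBoundary(bytes, 6));                 // false (reported end)
```

A correct mark should never end inside a multi-byte sequence, so an `end` that fails this boundary check is a quick signal of the truncation described here.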

The expected output is:

{"time":0,"type":"sentence","start":0,"end":8,"value":"pools”"}
{"time":6,"type":"word","start":0,"end":5,"value":"pools"}

AllanZhengYP commented 7 years ago

Hi @joe1chen, this appears to be a service/API issue, which means it does not originate in the SDK. I will forward it to the Polly service team to see how they can resolve it. For now, you can try passing in only the readable words.

Update: This is an issue with the Polly service. The team is working on fixing it.
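Until the service fix landed, one way to follow the suggestion of passing in only readable words was to pre-process the text. The sketch below is an assumption, not a workaround confirmed by the Polly team: the hypothetical `toAsciiQuotes` helper replaces common typographic quotation marks with their single-byte ASCII equivalents before the text is sent to `synthesizeSpeech`. Dropping those characters entirely would be an equally valid variant. Either way, the returned speech-mark offsets then refer to the sanitized text, not the original.

```javascript
// Hypothetical workaround sketch: map typographic quotes to ASCII so the
// input no longer contains these 3-byte UTF-8 characters.
const QUOTE_MAP = {
  '\u2018': "'", // left single quotation mark
  '\u2019': "'", // right single quotation mark
  '\u201C': '"', // left double quotation mark
  '\u201D': '"', // right double quotation mark
};

function toAsciiQuotes(text) {
  return text.replace(/[\u2018\u2019\u201C\u201D]/g, (ch) => QUOTE_MAP[ch]);
}

const original = 'pools\u201D.';
const sanitized = toAsciiQuotes(original);
console.log(sanitized);                          // pools".
console.log(Buffer.byteLength(sanitized, 'utf8')); // 7
```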

srchase commented 5 years ago

@joe1chen

The Polly Service Team made a fix for this quite a while ago. Closing this issue.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.