endoplasmic / google-assistant

A node.js implementation of the Google Assistant SDK
MIT License

Getting full text response when answer is broken in two messages #42

Open dataoracle opened 6 years ago

dataoracle commented 6 years ago

Hi, great job guys on the implementation of the Google Assistant service! Loving it :)

I'm interested in both text and audio responses. I noticed that for certain types of questions, the text in the response event is populated either partially or not at all.

Examples I found so far:

I understand that the supplemental_display_text of the DialogStateOut is not meant to be always the full transcript of the audio response, but I was wondering if there is something that we can do to get the full jokes as text.

For those that come back totally empty (like traffic-related questions), I could use the Google Cloud Speech API to run STT on the audio_data, at the expense of some extra round-trip time.

Any ideas guys around these two use cases? Thanks!!

endoplasmic commented 6 years ago

For traffic I seem to be getting text:

Type your request: What's the traffic like to work?
Assistant Response: On your way to work, traffic is light, as usual. It is twenty-eight minutes by car.
Conversation Complete

For jokes, it looks like I'm only getting the punchline maybe?

Type your request: Tell me a joke
Assistant Response: The king and queen of clubs 👑 ♣
Conversation Complete

Which examples are causing you issues?

dataoracle commented 6 years ago

Hey @endoplasmic, thanks for your follow-up.

We were intrigued that your traffic response came through properly, so we added the work location to the Google account linked to the project. It now works, as long as we phrase questions around the work tag.

Before, we were trying questions like "How is the traffic around XXX street?" The audio answer comes back fine, but the Assistant Response is empty. Can you give it a try on your end?

The other cases are jokes/riddles. For these we are getting either just the punchline (as in your sample above) or the answer to the riddle. In both cases the audio response is coming properly (full joke or riddle).

What we are trying to do is use the Google Assistant for a conversational bot that needs to work on a "text-only" channel as well as on a "voice-only" channel. Voice does not look like it will be a problem, but on the text channel these cases will not work properly.

Any ideas?

Thanks!

endoplasmic commented 6 years ago

I'm seeing the same on my end regarding traffic. Blank text response. Sounds like a bug in the SDK.

If you wanted to, you could always take the audio as it comes in and transcribe it via Google Speech. Start the request once you get the first bytes and you'll get the text streamed back to you.
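A rough sketch of that idea, not tested against the real services. It assumes the conversation object emits 'audio-data' and 'ended' events, and that `openSttStream` is a hypothetical factory you supply which starts a streaming recognition request and returns an object with `write(chunk)` and `end()`. The name `forwardAudioToStt` is made up for illustration.

```javascript
// Sketch only: forward assistant audio to a speech-to-text stream as it arrives.
// `openSttStream` is a hypothetical factory (an assumption, not part of this
// library) returning an object with write(chunk) and end().
function forwardAudioToStt(conversation, openSttStream) {
  let sttStream = null;

  conversation.on('audio-data', (chunk) => {
    // Open the transcription request lazily, on the first audio bytes.
    if (!sttStream) sttStream = openSttStream();
    sttStream.write(chunk);
  });

  conversation.on('ended', () => {
    // Close the request only if any audio was actually forwarded.
    if (sttStream) sttStream.end();
  });
}
```

Opening the stream lazily means no transcription request is made for conversations that never produce audio.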

Discussion started: https://plus.google.com/111323056508012159527/posts/7RP68D4WiVx

dataoracle commented 6 years ago

Ok, so it's consistent. We are doing exactly that, running STT through the Google Speech API, but it adds latency, and we have trouble getting proper punctuation, which at the moment only works for en-US and not for en-UK.

I'll report back if there is any update or if we find a workaround.

Thanks!

ghost commented 6 years ago

I raised this as a bug on the API a couple of months ago as it affects my Google Assistant for Alexa skill. https://github.com/googlesamples/assistant-sdk-python/issues/158

endoplasmic commented 6 years ago

@tartanguru - Thanks for the link. I've subscribed to the thread to watch any action that comes up.

endoplasmic commented 6 years ago

I check in on this once in a while, and it does look like "tell me a joke" is fixed, but the traffic around a specific place is still busted.

pauleeeeee commented 4 years ago

@endoplasmic Two ideas for this issue.

First, I am getting a lot of blank responses, even with screen: { isOn: false }, so I changed the isOn flag to true and parsed the HTML to see what was being sent. I used the html-to-text library to do this. The result is a bit mangled, but it is certainly more verbose than the simple text response you get when isOn is set to false.

Second, I looked into the "debug" option referenced in the SDK. If the debug option is set to true (and some other conditions are met), you are sent a full response object that contains the text that would have been converted to speech. Would you be able to let this library access the debuginfo flag? See https://developers.google.com/assistant/sdk/reference/rpc/google.assistant.embedded.v1alpha2#google.assistant.embedded.v1alpha2.DebugInfo
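A minimal sketch of the screen-HTML idea, assuming the screen output arrives as an HTML string or Buffer. It uses a deliberately naive tag stripper as a stand-in for html-to-text so the snippet has no dependencies. `screenHtmlToText` and `bestText` are hypothetical helper names, not part of this library.

```javascript
// Sketch only: naive HTML-to-text stand-in (a real project would likely use
// a proper converter such as the html-to-text package instead).
function screenHtmlToText(html) {
  return String(html)
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]+>/g, ' ')                                // drop remaining tags
    .replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&')
    .replace(/\s+/g, ' ')                                    // collapse whitespace
    .trim();
}

// Prefer the plain text response; fall back to the screen HTML when it is blank.
function bestText(responseText, screenHtml) {
  const plain = (responseText || '').trim();
  if (plain) return plain;
  return screenHtml ? screenHtmlToText(screenHtml) : '';
}
```

The fallback only kicks in for blank responses, so answers that already arrive as text are left untouched.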

endoplasmic commented 4 years ago

I added it in commit: https://github.com/endoplasmic/google-assistant/commit/641edf80cccd0c98db23331bfe607c1abcb447d2

I have no way to test it (or to know what conditions need to be met) since I don't have any Actions on Google things. It seems that's what the field is for, though.

Either way, it's good to support it, so thanks for pointing that out!