alexa / avs-device-sdk

An SDK for commercial device makers to integrate Alexa directly into connected products.
https://developer.amazon.com/alexa/alexa-voice-service
Apache License 2.0
1.26k stars 603 forks

Question about AIP (Audio Input Processor) #701

Closed PavelYarysh closed 6 years ago

PavelYarysh commented 6 years ago

Recently my microphone got broken, and I was wondering is there a way to send messages to Alexa AVS cloud using already existing databases of recorded messages? I am not sure how the process works: the program records mic data in wav form, analyses it on the computer using ASR and only then sends it to the cloud or streams data directly to the cloud? I want to modify Alexa so that after some action (like typing a button) the program instead of recording from mic, pulls out the needed wav message to the processor. Is there a way to do that?

priyagsenthil commented 6 years ago

Hi PavelYarysh,

Yes, you can refer to some of our integration tests, where we feed pre-recorded audio files to the AIP. One thing to note when creating the audio files: voice recognition works best with human voice rather than robot/machine-generated audio.

Thanks Priya

PavelYarysh commented 6 years ago

Hi Priya, Thank you, I am trying to see how I can modify them for my purpose.

PavelYarysh commented 6 years ago

Hi Priya, I got very confused after looking at the integration tests. They seem to recreate the whole system and initialize everything from scratch, but I only want to add an option to the existing program. So when I enter the 'o' key in the main menu, for example, the program would use pre-recorded audio, but then continue working normally.

priyagsenthil commented 6 years ago

Hi PavelYarysh,

Take a look at AudioInputProcessorIntegrationTest.cpp for the test holdToTalkJoke. This test reads a raw audio file and feeds it into the SDS buffer (used by the AIP).

To feed raw audio data, make a change in SampleApp/SampleApplication.cpp. This creates the following data buffer, into which the PortAudioWrapper writes. For your use case, write the audio file (PCM only) into the shared data stream and it should work. Hope this helps.

auto buffer = std::make_shared<alexaClientSDK::avsCommon::avs::AudioInputStream::Buffer>(bufferSize);
std::shared_ptr<alexaClientSDK::avsCommon::avs::AudioInputStream> sharedDataStream =
    alexaClientSDK::avsCommon::avs::AudioInputStream::create(buffer, WORD_SIZE, MAX_READERS);

if (!sharedDataStream) {
    ACSDK_CRITICAL(LX("Failed to create shared data stream!"));
    return false;
}
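[Editor's note] A minimal, self-contained sketch of the file-feeding step described above. The writer here is a plain stand-in for the SDK's `AudioInputStream::Writer` (whose `write(buf, nWords)` call it mimics), and the chunk size is illustrative; only the chunked read-and-write pattern is the point.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

static const size_t WORD_SIZE_BYTES = 2;  // 16-bit LPCM samples
static const size_t CHUNK_WORDS = 320;    // ~20 ms at 16 kHz, like one mic callback

// Stand-in for the SDK's AudioInputStream::Writer.
struct FakeWriter {
    std::vector<int16_t> sink;
    size_t write(const int16_t* buf, size_t nWords) {
        sink.insert(sink.end(), buf, buf + nWords);
        return nWords;
    }
};

// Read a raw PCM file and push it chunk-by-chunk, the way the microphone
// wrapper pushes captured frames. Returns the total 16-bit words written.
size_t feedPcmFile(const std::string& path, FakeWriter& writer) {
    std::ifstream in(path, std::ios::binary);
    std::vector<int16_t> chunk(CHUNK_WORDS);
    size_t totalWords = 0;
    while (true) {
        in.read(reinterpret_cast<char*>(chunk.data()), CHUNK_WORDS * WORD_SIZE_BYTES);
        size_t words = static_cast<size_t>(in.gcount()) / WORD_SIZE_BYTES;
        if (words == 0) break;  // end of file (or read error)
        writer.write(chunk.data(), words);
        totalWords += words;
    }
    return totalWords;
}
```

In the real SampleApp you would obtain a writer from the `sharedDataStream` created above instead of using `FakeWriter`.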
PavelYarysh commented 6 years ago

Thank you very much. In that case, would future voice inputs be disabled?

priyagsenthil commented 6 years ago

Yes, make sure you cut off the writes from the PortAudioWrapper to shut down voice input.
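[Editor's note] One simple way to "cut off the writes" without rebuilding anything is to gate the microphone path behind an atomic flag, toggled before and after feeding the file. This is a self-contained sketch, not SDK code: the vector stands in for the shared data stream writer, and the member names are invented.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

struct GatedMicSink {
    std::atomic<bool> micEnabled{true};
    std::vector<int16_t> stream;  // stand-in for the SDS writer

    // Called from the audio capture callback with each block of frames.
    size_t onMicData(const int16_t* frames, size_t nWords) {
        if (!micEnabled.load(std::memory_order_relaxed)) {
            return 0;  // drop mic data while file playback owns the stream
        }
        stream.insert(stream.end(), frames, frames + nWords);
        return nWords;
    }
};
```

The flag is atomic because the capture callback typically runs on a different thread than the menu loop that toggles it.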

PavelYarysh commented 6 years ago

Thank you

lenhattu commented 5 years ago

Hi PavelYarysh,

Take a look at AudioInputProcessorIntegrationTest.cpp for the test holdToTalkJoke. This test reads a raw audio file and feeds it into the SDS buffer (used by the AIP).

To feed raw audio data, make a change in SampleApp/SampleApplication.cpp. This creates the following data buffer, into which the PortAudioWrapper writes. For your use case, write the audio file (PCM only) into the shared data stream and it should work. Hope this helps.

auto buffer = std::make_shared<alexaClientSDK::avsCommon::avs::AudioInputStream::Buffer>(bufferSize);
std::shared_ptr<alexaClientSDK::avsCommon::avs::AudioInputStream> sharedDataStream =
    alexaClientSDK::avsCommon::avs::AudioInputStream::create(buffer, WORD_SIZE, MAX_READERS);

if (!sharedDataStream) {
    ACSDK_CRITICAL(LX("Failed to create shared data stream!"));
    return false;
}

Hi Priya, by writing the audio file to the data stream at this location (SampleApp::initialize), it will be called before m_userInputManager->run(). That means AVS will get the audio input from the file and process it before user inputs such as tap, hold, etc.?

What if I want to add a custom input option here in UserInputManager for reading from audio file instead of voice?

SampleAppReturnCode UserInputManager::run() {
    bool userTriggeredLogout = false;
    m_interactionManager->begin();
    while (true) {
        char x;
        if (!readConsoleInput(&x)) {
            break;
        }
        x = ::tolower(x);
        if (x == QUIT) {
            break;
        } else if (x == RESET) {
            if (confirmReset()) {
                userTriggeredLogout = true;
            }
        } else if (x == REAUTHORIZE) {
            confirmReauthorizeDevice();
        } else if (x == MIC_TOGGLE) {
            m_interactionManager->microphoneToggle();
        } else if (x == STOP) {
            m_interactionManager->stopForegroundActivity();
        } else if (x == SPEAKER_CONTROL) {
            controlSpeaker();
#ifdef ENABLE_PCC
        } else if (x == PHONE_CONTROL) {
            controlPhone();
#endif
        } else if (x == SETTINGS) {
            settingsMenu();
        } else if (x == INFO) {
            if (m_limitedInteraction) {
                m_interactionManager->limitedHelp();
            } else {
                m_interactionManager->help();
            }
        } else if (m_limitedInteraction) {
            m_interactionManager->errorValue();
            // ----- Add a new interaction below if the action is available only in 'unlimited interaction mode'.
        } else if (x == HOLD) {
            m_interactionManager->holdToggled();
        } else if (x == TAP) {
            m_interactionManager->tap();
        } else if (x == PLAY) {
            m_interactionManager->playbackPlay();
        } else if (x == PAUSE) {
            m_interactionManager->playbackPause();
...

I see it calls InteractionManager and then DefaultClient, but I'm not sure how or where to put my audio file in.
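[Editor's note] One possible shape for the new option, reduced to a toy model of the dispatch in UserInputManager::run(). The key 'o' and the helper feedAudioFile() are hypothetical, not part of the SDK; the comments mark where the real SDK calls would go.

```cpp
#include <cctype>
#include <string>

struct MiniInputManager {
    std::string lastAction;

    void tap() { lastAction = "tap"; }  // stands in for m_interactionManager->tap()

    // In the real app this would: mute the PortAudioWrapper writes, open a
    // Recognize event (e.g. via tap()), write the PCM file into the shared
    // AudioInputStream, then re-enable the microphone.
    void feedAudioFile() { lastAction = "feedAudioFile"; }

    bool handleKey(char x) {
        switch (static_cast<char>(::tolower(x))) {
            case 't': tap(); return true;
            case 'o': feedAudioFile(); return true;  // the new branch
            default: return false;
        }
    }
};
```

In the real run() loop this would be one more `else if (x == AUDIO_FILE)` branch next to HOLD and TAP, with the file-feeding work delegated to InteractionManager so the loop keeps running normally afterwards.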