getnamo / TensorFlow-Unreal

TensorFlow plugin for the Unreal Engine.

Making the microphone "voice activated" instead of push-to-talk #27

Open JuhaOjala opened 6 years ago

JuhaOjala commented 6 years ago

Hello! This is not actually an issue but a request for help. I activated your plugin and added the TFAudioCapture component to the player character.

Then in BeginPlay I bound the delegate, converted the raw binary data to a WAV file, and finally converted the WAV to a USoundWave and played it. This lets me play back the sound that was just recorded.
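The raw-bytes-to-WAV step isn't shown in the thread. One way to do it is to prepend a canonical 44-byte RIFF/WAVE header to the captured bytes. This is a minimal sketch that assumes the capture yields interleaved little-endian PCM and that you know its sample rate, channel count, and bit depth. `WrapPcmAsWav` and its helpers are hypothetical names. The sketch uses plain `std::vector` instead of `TArray` so it stands alone, and the plugin may already provide its own conversion.

```cpp
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian byte order (RIFF is little-endian).
static void AppendLE32(std::vector<uint8_t>& Out, uint32_t Value)
{
    for (int i = 0; i < 4; ++i)
        Out.push_back(static_cast<uint8_t>((Value >> (8 * i)) & 0xFF));
}

// Append a 16-bit value in little-endian byte order.
static void AppendLE16(std::vector<uint8_t>& Out, uint16_t Value)
{
    Out.push_back(static_cast<uint8_t>(Value & 0xFF));
    Out.push_back(static_cast<uint8_t>((Value >> 8) & 0xFF));
}

// Append a four-character chunk ID such as "RIFF" or "data".
static void AppendTag(std::vector<uint8_t>& Out, const char* Tag)
{
    Out.insert(Out.end(), Tag, Tag + 4);
}

// Hypothetical helper: wraps raw interleaved PCM bytes in a 44-byte RIFF/WAVE
// header so a WAV parser (such as FWaveModInfo::ReadWaveInfo) can read them.
std::vector<uint8_t> WrapPcmAsWav(const std::vector<uint8_t>& Pcm,
                                  uint32_t SampleRate,
                                  uint16_t NumChannels,
                                  uint16_t BitsPerSample)
{
    const uint16_t BlockAlign = static_cast<uint16_t>(NumChannels * BitsPerSample / 8);
    const uint32_t ByteRate = SampleRate * BlockAlign;
    const uint32_t DataSize = static_cast<uint32_t>(Pcm.size());

    std::vector<uint8_t> Wav;
    Wav.reserve(44 + Pcm.size());

    AppendTag(Wav, "RIFF");
    AppendLE32(Wav, 36 + DataSize);   // size of everything after this field
    AppendTag(Wav, "WAVE");

    AppendTag(Wav, "fmt ");
    AppendLE32(Wav, 16);              // fmt chunk size for plain PCM
    AppendLE16(Wav, 1);               // audio format 1 = integer PCM
    AppendLE16(Wav, NumChannels);
    AppendLE32(Wav, SampleRate);
    AppendLE32(Wav, ByteRate);
    AppendLE16(Wav, BlockAlign);
    AppendLE16(Wav, BitsPerSample);

    AppendTag(Wav, "data");
    AppendLE32(Wav, DataSize);
    Wav.insert(Wav.end(), Pcm.begin(), Pcm.end());
    return Wav;
}
```

For example, `WrapPcmAsWav(Pcm, 16000, 1, 16)` would wrap one second of 16 kHz mono 16-bit audio (32000 bytes) into a 32044-byte WAV.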

Here is the .cpp code for the WAV-to-USoundWave conversion, in case anybody needs it:

```cpp
USoundWave* AAudioCaptureSimpleCharacter::GetSoundWaveFromRawWav(TArray<uint8> Bytes)
{
    // The first argument of NewObject is the Outer, not the class; create the
    // sound wave in the transient package.
    USoundWave* sw = NewObject<USoundWave>(GetTransientPackage());
    if (!sw)
    {
        return nullptr;
    }

    // Parse the RIFF/WAVE header; bail out if the bytes are not a valid WAV file.
    FWaveModInfo WaveInfo;
    if (!WaveInfo.ReadWaveInfo(Bytes.GetData(), Bytes.Num()))
    {
        return nullptr;
    }

    sw->InvalidateCompressedData();

    // Copy the complete WAV file (header included) into the sound wave's bulk data.
    sw->RawData.Lock(LOCK_READ_WRITE);
    void* LockedData = sw->RawData.Realloc(Bytes.Num());
    FMemory::Memcpy(LockedData, Bytes.GetData(), Bytes.Num());
    sw->RawData.Unlock();

    // Duration in seconds = data bytes * 8 / (channels * bits per sample * sample rate).
    const int32 DurationDiv = *WaveInfo.pChannels * *WaveInfo.pBitsPerSample * *WaveInfo.pSamplesPerSec;
    sw->Duration = DurationDiv ? *WaveInfo.pWaveDataSize * 8.0f / DurationDiv : 0.0f;

    sw->SampleRate = *WaveInfo.pSamplesPerSec;
    sw->NumChannels = *WaveInfo.pChannels;
    sw->RawPCMDataSize = WaveInfo.SampleDataSize;
    sw->SoundGroup = ESoundGroup::SOUNDGROUP_Default;

    return sw;
}
```
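This standalone helper mirrors the `Duration` formula in the function above so it can be checked in isolation. The formula is seconds = data bytes × 8 / (channels × bits per sample × sample rate). `WavDurationSeconds` is a hypothetical name, and it returns 0 when any format field is zero, just like the original `DurationDiv` check.

```cpp
#include <cstdint>

// Same arithmetic as the Duration computation in GetSoundWaveFromRawWav.
// A 64-bit divisor avoids overflow for unusually high channel counts or rates.
float WavDurationSeconds(uint32_t DataSizeBytes, uint16_t NumChannels,
                         uint16_t BitsPerSample, uint32_t SampleRate)
{
    const uint64_t BitsPerSecond =
        static_cast<uint64_t>(NumChannels) * BitsPerSample * SampleRate;
    if (BitsPerSecond == 0)
        return 0.0f;  // malformed header: report zero length
    return static_cast<float>(DataSizeBytes * 8.0 / BitsPerSecond);
}
```

For instance, 32000 bytes of 16 kHz mono 16-bit audio gives 32000 × 8 / (1 × 16 × 16000) = 1 second.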

Now I tried to look at the byte values coming from the OnAudioData array, but whenever I try to do something with them, the whole game freezes. I guess it's because the data is coming through constantly, and if I do any work on it, the processing never catches up and the computer chokes.

Could you outline the procedure for setting up an always-listening microphone? You gave some advice on the subject on the Unreal forum, but I didn't quite grasp what you meant. Thank you for your time.

getnamo commented 6 years ago

It should just be a matter of listening to OnAudioData. It calls back on the game thread with a copy of the buffer, but the buffer might get overwritten again before you finish working with it on the game thread, which would be a bug. An easy fix may be to uncomment these three lines: https://github.com/getnamo/tensorflow-ue4/blob/master/Source/TFAudioCapture/Private/FTFAudioCapture.cpp#L45, https://github.com/getnamo/tensorflow-ue4/blob/master/Source/TFAudioCapture/Private/FTFAudioCapture.cpp#L46, and https://github.com/getnamo/tensorflow-ue4/blob/master/Source/TFAudioCapture/Private/FTFAudioCapture.cpp#L57. That would make you receive those blueprint calls on the sound thread instead. As long as you don't create or destroy UObjects in those callbacks, it will work fine.
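A generic way to avoid the overwrite problem, whichever thread the callback runs on, is to copy each chunk into a lock-protected queue inside the callback. The game thread then drains the queue once per tick and does the heavier work, including any UObject creation. This is a plain standard-C++ sketch, not code from the plugin. `FAudioChunkQueue` is a hypothetical name, and inside Unreal you might use `FCriticalSection`/`FScopeLock` or `TQueue` instead of `std::mutex`.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical hand-off buffer between the audio thread (producer) and the
// game thread (consumer).
class FAudioChunkQueue
{
public:
    // Called from the audio callback. Stores its own copy of the data, so the
    // capture code is free to reuse or overwrite its buffer afterwards.
    void Push(const uint8_t* Data, std::size_t Size)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        Pending.emplace_back(Data, Data + Size);
    }

    // Called from the game thread, e.g. once per Tick. Swapping under the lock
    // keeps the critical section short; processing happens after the lock is released.
    std::vector<std::vector<uint8_t>> DrainAll()
    {
        std::vector<std::vector<uint8_t>> Out;
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            Out.swap(Pending);
        }
        return Out;
    }

private:
    std::mutex Mutex;
    std::vector<std::vector<uint8_t>> Pending;
};
```

Because the callback only copies bytes and returns, it stays cheap even though it fires constantly. All per-chunk analysis then happens on the consumer side.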

Regarding your earlier question about only triggering when you hear a voice: a simple way to detect that is to monitor the average volume of the audio and trigger when it's high enough. For example, average the absolute values of each byte over the whole array you receive in OnAudioData. If that average is above a certain threshold, send the bytes and start a timeout, which resets each time the threshold is breached again. If the timeout runs out, stop streaming the bytes until it gets re-triggered.
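The threshold-plus-timeout scheme described above can be sketched as a small gate object. The sketch assumes the capture delivers little-endian signed 16-bit PCM, so it averages the absolute values of decoded 16-bit samples rather than raw bytes, because raw bytes would mix the high and low halves of each sample. It also assumes the caller passes in how much time each chunk covers. `FVoiceGate`, `MeanAbsAmplitude`, and the threshold/timeout values are hypothetical and would need tuning for your microphone.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mean absolute amplitude of a chunk of little-endian signed 16-bit PCM.
// A trailing odd byte, if any, is ignored.
double MeanAbsAmplitude(const std::vector<uint8_t>& Bytes)
{
    const std::size_t NumSamples = Bytes.size() / 2;
    if (NumSamples == 0)
        return 0.0;
    double Sum = 0.0;
    for (std::size_t i = 0; i < NumSamples; ++i)
    {
        const uint16_t Raw = static_cast<uint16_t>(Bytes[2 * i] | (Bytes[2 * i + 1] << 8));
        Sum += std::abs(static_cast<int>(static_cast<int16_t>(Raw)));
    }
    return Sum / static_cast<double>(NumSamples);
}

// Hypothetical voice-activity gate: opens when a chunk is loud enough and
// stays open until TimeoutSeconds pass without another loud chunk.
class FVoiceGate
{
public:
    FVoiceGate(double InThreshold, double InTimeoutSeconds)
        : Threshold(InThreshold), TimeoutSeconds(InTimeoutSeconds) {}

    // Returns true if this chunk should be streamed onward.
    bool ShouldStream(const std::vector<uint8_t>& Chunk, double ChunkSeconds)
    {
        if (MeanAbsAmplitude(Chunk) > Threshold)
        {
            TimeLeft = TimeoutSeconds;  // breaching the threshold (re)starts the timeout
            return true;
        }
        if (TimeLeft > 0.0)
        {
            TimeLeft -= ChunkSeconds;   // quiet chunk inside the window: keep streaming
            return true;
        }
        return false;                   // timeout expired: wait to be re-triggered
    }

private:
    double Threshold;
    double TimeoutSeconds;
    double TimeLeft = 0.0;
};
```

Streaming the quieter chunks while the timeout is running keeps short pauses between words from cutting the stream. The threshold mostly depends on microphone gain and background noise, so measuring `MeanAbsAmplitude` during silence is a reasonable starting point.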