bkacjios / lua-mumble

A lua module to connect to a mumble server and interact with it
MIT License

Question: How to capture incoming audio stream? #12

Closed hbeni closed 4 years ago

hbeni commented 4 years ago

Hello, I'm not sure I understand the API correctly, but it seems there is currently no way to capture audio streams from other users?

I want to build a recording bot that stores incoming audio in a file on disk, suitable for later playback through lua-mumble (the background is fgcom-mumble; I need this for the ATIS recording feature). How can I do that?

It is important to be able to distinguish between sending users, because several users might be recording ATIS messages in parallel.

bkacjios commented 4 years ago

Currently audio capture isn't supported, only playback is. I could potentially add the audio data to the OnUserSpeak hook as encoded Opus audio data for you to do with as you please.

hbeni commented 4 years ago

That would be great! Is it much work?

bkacjios commented 4 years ago

Not really. Already committed this for you in 27629a76a50a70e2cf9c83b6b58116cc3c87ef97

hbeni commented 4 years ago

Wow cool! I will try it out soon!

hbeni commented 4 years ago

Well, now I'm stuck again, sorry to bother you again. I have the samples in the file, but how do I play them back? mumble.client:play("test.opus", 1.0, 1) segfaults (probably because it's not Ogg data...)

The code is this; maybe I'm doing something wrong:

client:hook("OnUserStartSpeaking", function(user)
    print("OnUserStartSpeaking, user["..user:getID().."]="..user:getName())

    fgcom_voicebuffer.out = assert(io.open("test.opus", "wb"))
end)

client:hook("OnUserSpeak", function(event)
    print("OnUserSpeak, from=["..event.user:getID().."] '"..event.user:getName().."'")
    print("  codec="..event.codec)
    print("  target="..event.target)
    print("  sequence="..event.sequence)

    fgcom_voicebuffer.out:write(event.data)
end)

client:hook("OnUserStopSpeaking", function(user)
    print("OnUserStopSpeaking, user["..user:getID().."]="..user:getName())

    print("TEST close FH")
    assert(fgcom_voicebuffer.out:close())

    print("TEST PLAY")
    mumble.client:play("test.opus", 1.0, 1)
    print("TEST PLAY DONE")
end)
bkacjios commented 4 years ago

The play method is only for vorbis encoded ogg files which then get encoded into opus, so it's probably segfaulting since it's trying to encode already encoded data. You would have to somehow convert the audio data back into a playable ogg file for it to work. I could maybe add a method for you to play back the raw encoded data yourself. Something like mumble.client:transmit(String encoded data) but this would need to be done in a timer to continuously stream the audio.

Still, I guess I should fix the play method segfaulting and have it spit out an error.

hbeni commented 4 years ago

My goal is to store the audio data in a file so I can feed it to another bot that will play it continuously.

bkacjios commented 4 years ago

I'll take a look at how the mumble client records audio and see if there's anything I can learn from that. I don't think there's enough stuff implemented in this library for you to get this working at the moment.

hbeni commented 4 years ago

Thank you!

bkacjios commented 4 years ago

Okay, so I added a bunch of stuff that should be enough to get you something resembling what you requested. Warning though: this is a little complicated, and there may be some issues or bugs. I don't have time to debug and test to make sure it works, but it should be a good enough start to work with.

client:hook("OnUserStartSpeaking", function(user)
    print("OnUserStartSpeaking, user["..user:getID().."]="..user:getName())

    fgcom_voicebuffer.out = assert(io.open("test.rec", "wb"))
end)

local CODEC_OPUS = 4

local bit = require("bit")

local function writeShort(f, short)
    -- Convert our 16 bit number into two bytes
    local b1 = bit.band(bit.rshift(short, 8), 0xFF)
    local b2 = bit.band(short, 0xFF)
    f:write(string.char(b1, b2))
end

local function readShort(f)
    local short = f:read(2) -- Read two characters from the file
    if not short or short == "" then return end -- End of file
    local b1, b2 = string.byte(short, 1, 2) -- Convert the two characters to bytes
    return bit.bor(bit.lshift(b1, 8), bit.lshift(b2, 0)) -- Combine the two bytes into a number
end

local decoder = mumble.decoder()

client:hook("OnUserSpeak", function(event)
    if event.codec ~= CODEC_OPUS then return end -- Only supports OPUS voice data..

    print("OnUserSpeak, from=["..event.user:getID().."] '"..event.user:getName().."'")
    print("  codec="..event.codec)
    print("  target="..event.target)
    print("  sequence="..event.sequence)

    local pcm = decoder:decode_float(event.data) -- Decode the encoded voice data back to usable PCM

    writeShort(fgcom_voicebuffer.out, #pcm) -- Store the size of the audio frame so we know how much to read from the file later (writeShort is our local helper, not a file method)
    fgcom_voicebuffer.out:write(pcm) -- Save the entire PCM data
end)

client:hook("OnUserStopSpeaking", function(user)
    print("OnUserStopSpeaking, user["..user:getID().."]="..user:getName())

    print("TEST close FH")
    assert(fgcom_voicebuffer.out:close())

    client:playRecording("test.rec")
end)

local encoder = mumble.encoder()

function mumble.client:playRecording(file)
    local f = assert(io.open(file, "rb"))

    local timer = mumble.timer()

    timer:start(function(t)
        if f then
            local len = readShort(f)
            local pcm = f:read(len)

            if not pcm or pcm == "" then
                t:stop() -- Stop the audio timer
                f:close()
                return
            end

            local encoded = encoder:encode_float(1, pcm)
            client:transmit(encoded) -- Transmit the single frame as an audio packet
        end
    end, 0.01, 0.01) -- Create a timer that will loop every 10ms
end
hbeni commented 4 years ago

Thank you very much for your effort! Just to be sure: mumble.client:transmit does transmit to the channel, so everyone (remote) on that channel can hear it, right?

bkacjios commented 4 years ago

Yeah, that's how it works. It will also use the active voice target set by mumble.client:setVoiceTarget(Number id). Also, I think the timer needs to be 0.01 rather than 0.1

hbeni commented 4 years ago

Hello again,

I played with the bot code but still struggle with it.

The code follows, then a shortened output:

client:hook("OnUserStartSpeaking", function(user)
    print("OnUserStartSpeaking, user["..user:getID().."]="..user:getName())

    print("open file test.rec")
    fgcom_voicebuffer.out = assert(io.open("test.rec", "wb"))
end)

local CODEC_OPUS = 4

local bit = require("bit")

local function writeShort(f, short)
    -- Convert our 16 bit number into two bytes
    local b1 = bit.band(bit.rshift(short, 8), 0xFF)
    local b2 = bit.band(short, 0xFF)
    f:write(string.char(b1, b2))
end

local function readShort(f)
    local short = f:read(2) -- Read two characters from the file
    if not short or short == "" then return end -- End of file
    local b1, b2 = string.byte(short, 1, 2) -- Convert the two characters to bytes
    return bit.bor(bit.lshift(b1, 8), bit.lshift(b2, 0)) -- Combine the two bytes into a number
end

local decoder = mumble.decoder()

client:hook("OnUserSpeak", function(event)
    if event.codec ~= CODEC_OPUS then return end -- Only supports OPUS voice data..

    print("OnUserSpeak, from=["..event.user:getID().."] '"..event.user:getName().."'")
    print("  codec="..event.codec)
    print("  target="..event.target)
    print("  sequence="..event.sequence)

    bitrate = decoder:getBitRate()
    print(" decoder decoding at "..bitrate)
    local pcm = decoder:decode_float(event.data) -- Decode the encoded voice data back to usable PCM
    print("OK1")
    --fgcom_voicebuffer.out:writeShort(#pcm) -- Store the size of the audio frame so we know how much to read from the file later
    writeShort(fgcom_voicebuffer.out, #pcm) -- Store the size of the audio frame so we know how much to read
    --print("OK2")
    fgcom_voicebuffer.out:write(pcm) -- Save the entire PCM data
    print("wrote pcm to file ("..#pcm.."b)")
end)

client:hook("OnUserStopSpeaking", function(user)
    print("OnUserStopSpeaking, user["..user:getID().."]="..user:getName())

    print("TEST close FH")
    assert(fgcom_voicebuffer.out:close())
    print("TEST close FH OK")

    client:playRecording("test.rec")
end)

local encoder = mumble.encoder()
--encoder:setBitRate(decoder:getBitRate())
--encoder:setBitRate(48000)

function mumble.client:playRecording(file)
    local f = assert(io.open(file, "rb"))
    print("file "..file.." opened")

    local timer = mumble.timer()
    print("timer initialized")

    local seq = 0

    timer:start(function(t)
        if f then
            print("timer: read packet "..seq)
            seq = seq+1
            local len = readShort(f)
            print("timer:   header read ok, packet_len="..len)
            local pcm = f:read(len)

            print("timer:   data read ok")

            if not pcm or pcm == "" then
                print("timer: stop timer")
                t:stop() -- Stop the audio timer
                f:close()
                return
            end

            print("timer: encode and transmit")
            bitrate = encoder:getBitRate()
            print(" encoder encoding at "..bitrate)
            local encoded = encoder:encode_float(1, pcm) -- encode PCM packet to 1 opus frame
            print("timer:   encoded ok")
            client:transmit(encoded) -- Transmit the single frame as an audio packet
            print("timer:   transmit ok")
        end
    end, 0.01, 0.01) -- Create a timer that will loop every 10ms
end

Log:

OnUserStartSpeaking, user[0]=3
open file test.rec
OnUserSpeak, from=[0] '3'
  codec=4
  target=0
  sequence=0
 decoder decoding at 21996
OK1
wrote pcm to file (960b)
OnUserSpeak, from=[0] '3'
  codec=4
  target=0
  sequence=2
 decoder decoding at 21996
OK1
wrote pcm to file (960b)
OnUserSpeak, from=[0] '3'
  codec=4
  target=0
  sequence=4
 decoder decoding at 21996
OK1
wrote pcm to file (960b)

[... many more ...]

OnUserStopSpeaking, user[0]=3
TEST close FH
TEST close FH OK
file test.rec opened
timer initialized
timer: read packet 0
timer:   header read ok, packet_len=960
timer:   data read ok
timer: encode and transmit
 encoder encoding at 96000
timer:   encoded ok
timer:   transmit ok
timer: read packet 1
timer:   header read ok, packet_len=960
timer:   data read ok
timer: encode and transmit
 encoder encoding at 96000
timer:   encoded ok
timer:   transmit ok

[... more transmit lines ...]

timer: read packet 26
timer:   header read ok, packet_len=960
timer:   data read ok
timer: encode and transmit
 encoder encoding at 96000
timer:   encoded ok
timer:   transmit ok
timer: read packet 27
Speicherzugriffsfehler
hbeni commented 4 years ago

Note: Don't get me wrong, if the Opus packet works (like it does with my little fixes above) this is perfectly fine. My last problem is just the segfault, because it takes down the bot (which is supposed to run forever).

bkacjios commented 4 years ago

Could you get me a stack trace of the crash? First build mumble.so with debug symbols: make clean && make debug

Install gdb and run your bot via gdb --args lua yourscript.lua

When it crashes you can enter the command bt to see exactly where in the code it had crashed.

hbeni commented 4 years ago

Thanks for your help! I narrowed it down and found the cause: the readShort(f) call that fetches the header fails at the end of the file, but the following code still tries to use its result. The trick is to check whether that read succeeded as well, and to stop the timer if it did not (as is already done for the PCM read).
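
A minimal sketch of such a check, assuming the length-prefixed file layout from the recording code above. `readFrame` is a hypothetical helper name, and the short helpers here use plain arithmetic instead of the `bit` library so they also run on stock Lua:

```lua
-- Write a 16-bit number as two bytes, most significant byte first.
local function writeShort(f, n)
    f:write(string.char(math.floor(n / 256) % 256, n % 256))
end

-- Read a 16-bit number; returns nil at end of file or on a truncated header.
local function readShort(f)
    local s = f:read(2)
    if not s or #s < 2 then return nil end
    local b1, b2 = string.byte(s, 1, 2)
    return b1 * 256 + b2
end

-- Hypothetical helper: return the payload of the next frame, or nil when
-- the file is exhausted or the last frame is incomplete.
local function readFrame(f)
    local len = readShort(f)
    if not len then return nil end                    -- no header: nothing left to play
    local data = f:read(len)
    if not data or #data < len then return nil end    -- truncated payload
    return data
end

-- Inside the playback timer, one check then covers both failure cases:
--   local pcm = readFrame(f)
--   if not pcm then t:stop(); f:close(); return end
```

Returning nil for both a missing header and a short payload keeps the timer callback down to a single stop condition.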

By the way, a few hours ago I commented somewhere on the source code about getID() always reporting 0. That was my fault: I confused it with getSession(), which works as designed. getID() returns the registered ID, which is 0 for unregistered users!
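
Because the session differs per connected user, it could serve as the key for keeping parallel recordings apart. A minimal sketch, assuming frames are collected in memory; `Recorder`, `push` and `take` are hypothetical names and not part of lua-mumble:

```lua
-- Hypothetical recorder that keeps one list of frames per speaker.
local Recorder = {}
Recorder.__index = Recorder

function Recorder.new()
    return setmetatable({ buffers = {} }, Recorder)
end

-- session: the value of user:getSession(); data: one encoded frame (event.data)
function Recorder:push(session, data)
    local buf = self.buffers[session]
    if not buf then
        buf = {}
        self.buffers[session] = buf
    end
    buf[#buf + 1] = data
end

-- Remove and return all frames recorded for one speaker (empty table if none).
function Recorder:take(session)
    local buf = self.buffers[session] or {}
    self.buffers[session] = nil
    return buf
end

-- Possible wiring with the hooks used in this thread:
--   local rec = Recorder.new()
--   client:hook("OnUserSpeak", function(event)
--       rec:push(event.user:getSession(), event.data)
--   end)
--   client:hook("OnUserStopSpeaking", function(user)
--       local frames = rec:take(user:getSession())
--   end)
```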

This lib is really nice! Top! I think this ticket is done :)

bkacjios commented 4 years ago

Ah okay, makes sense, but it's weird that it crashed instead of raising an error.

But yeah, just an FYI: the timer interval that is needed may differ per person talking. It can be 0.01, 0.02, 0.04, or 0.06 seconds.

This is customizable by the client in the settings UI. You may be able to detect the timer size needed by the size of the decompressed PCM data, but I'm not entirely sure.
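
A rough sketch of that idea, under two assumptions that are not confirmed in this thread: the audio is sampled at 48 kHz, and the number of samples per channel in a decoded packet is known (how `#pcm` maps to a sample count depends on what `decode_float` returns). `packetInterval` is a hypothetical helper:

```lua
local SAMPLE_RATE = 48000                  -- assumption: 48 kHz audio
local ALLOWED = { 0.01, 0.02, 0.04, 0.06 } -- the intervals listed above, in seconds

-- Hypothetical helper: map the number of samples per channel in one decoded
-- packet to the closest of the allowed timer intervals.
local function packetInterval(sampleCount)
    local seconds = sampleCount / SAMPLE_RATE
    local best, bestDiff
    for _, candidate in ipairs(ALLOWED) do
        local diff = math.abs(candidate - seconds)
        if not bestDiff or diff < bestDiff then
            best, bestDiff = candidate, diff
        end
    end
    return best
end
```

Snapping to the nearest allowed value means a slightly unexpected sample count still yields a usable timer interval.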

[Image: Compression Delay]

hbeni commented 4 years ago

Hey @bkacjios I played a little more and here is a simple echo bot implementation with in-memory storage; with many comments. I made it so you can include it in the examples section :)

--[[  This is a simple echo bot, showing how to capture and replay some samples. ]]

-- Define a voicebuffer to store the recording samples.
-- We treat this table as fifo queue containing the recorded samples.
-- We can just use the (OO-adapted) code from https://www.lua.org/pil/11.4.html
FiFo = {}
function FiFo:new (o)
    o = o or {}   -- create object if user does not provide one
    o.first = 0
    o.last = -1
    setmetatable(o, self)
    self.__index = self
    return o
end
function FiFo:pushleft (value)
    local first = self.first - 1
    self.first = first
    self[first] = value
end
function FiFo:pushright (value)
    local last = self.last + 1
    self.last = last
    self[last] = value
end
function FiFo:popleft ()
    local first = self.first
    if first > self.last then return nil end
    local value = self[first]
    self[first] = nil        -- to allow garbage collection
    self.first = first + 1
    return value
end
function FiFo:popright ()
    local last = self.last
    if self.first > last then return nil end
    local value = self[last]
    self[last] = nil         -- to allow garbage collection
    self.last = last - 1
    return value
end

-- finally, initialize our voicebuffer using the FiFo prototype
local voiceBuffer = FiFo:new()

-- Protocol constant for the codec. Currently only OPUS
-- encoded packets are supported (experimentation could yield that
-- other packets might work too, as we are just replaying them)
local CODEC_OPUS = 4

mumble = require("mumble")  -- get the mumble API

--[[
   It is nice if the bot can be called with parameters from the outside:
   lua echobot.lua --host=someHost --cert=mycert.pem --key=mykey.key

   The cert and key can be generated with openssl like this:
     $> openssl genrsa -out bot.key 2048 2> /dev/null
     $> openssl req -new -sha256 -key bot.key -out bot.csr -subj "/"
     $> openssl x509 -req -in bot.csr -signkey bot.key -out bot.pem 2> /dev/null
]]

-- define defaults
local botname = "echobot"
local host    = "localhost"
local port    = 64738      -- standard mumble port
local cert    = "bot.pem"
local key     = "bot.key"

-- Parse cmdline args
if arg[1] then
    if arg[1]=="-h" or arg[1]=="--help" then
        print(botname)
        print("usage: "..arg[0].." [opt=val ...]")
        print("  opts:")
        print("    --host=    host to connect to")
        print("    --port=    port to connect to")
        print("    --cert=    path to PEM encoded cert")
        print("    --key=     path to the cert's key")
        os.exit(0)
    end

    for _, opt in ipairs(arg) do
        -- Match the current option (not always arg[1]); "-" is magic in Lua patterns, so escape it
        local k, v = string.match(opt, "^%-%-(%w+)=(.+)$")
        if k then -- skip arguments that are not of the form --opt=val
            print("KEY='"..k.."'; VAL='"..v.."'")
            if k=="host" then host=v end
            if k=="port" then port=tonumber(v) or port end
            if k=="cert" then cert=v end
            if k=="key" then  key=v end
        end
    end

end

-- Connect to server, so we get the API
print(botname..": connecting to "..host.." on port "..port.." (cert: "..cert.."; key: "..key..")")
local client = assert(mumble.connect(host, port, cert, key))
client:auth(botname)
print("connect and bind: OK")

--[[
  Playback loop: we use a mumble timer for this. The timer ticks at
  the playback rate and checks whether there are samples buffered. If so,
  it fetches the next one and plays it, one packet per timer tick.
]]
local playbackTimer_rate = 0.02 -- playback interval in seconds: 0.01, 0.02, 0.04 or 0.06, depending on the speaker's client settings ("Audio per packet")
local playbackTimer = mumble.timer()
playbackTimer:start(function(t)
    -- get the next sample from the buffer and play it
    local nextSample = voiceBuffer:popleft()
    if nextSample then
        print("transmit next sample")
        client:transmit(nextSample)  -- Transmit the single frame as an audio packet (the bot "speaks")
    end
end, 0.00, playbackTimer_rate)

--[[
  Define mumble hooks to collect the samples
]]

-- The hook is called whenever someone speaks.
-- We record the samples into the buffer.
client:hook("OnUserSpeak", function(event)
    print("OnUserSpeak, codec="..event.codec.." from=["..event.user:getSession().."] '"..event.user:getName().."'")

    if event.codec ~= CODEC_OPUS then 
        print("ERROR: Only CODEC_OPUS is supported for now!")
        return -- Only supports OPUS voice data... -> ignore other codecs
    end

    -- Now record the samples to the buffer
    local len = #event.data
    print("  recording sample, len="..len)
    voiceBuffer:pushright(event.data)

end)

-- Done with setup, let's enter the bot's main loop
mumble.loop()
bkacjios commented 4 years ago

Also, it's worth mentioning if you don't want to actually record the data and just straight up echo the voice data perfectly, you can just do this!

client:hook("OnUserSpeak", function(event)
    print("OnUserSpeak, codec="..event.codec.." from=["..event.user:getSession().."] '"..event.user:getName().."'")

    if event.codec ~= CODEC_OPUS then 
        print("ERROR: Only CODEC_OPUS is supported for now!")
        return -- Only supports OPUS voice data... -> ignore other codecs
    end

    -- Transmit the voice data we received immediately!
    client:transmit(event.data)
end)
bkacjios commented 4 years ago

I made one final change to allow it to echo any codec type. Transmit will now accept a codec. If you pass the speaking state as the final argument it will sound a little better when they stop talking as well, since it will properly set the continuation or endofstream bit. I was only able to test OPUS, but theoretically any codec should work now.

client:hook("OnUserSpeak", function(event)
    client:transmit(event.codec, event.data, event.speaking)
end)
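
For a delayed replay rather than an immediate echo, the same three values could be buffered together. A minimal sketch, assuming an in-memory queue; `record` and `replayNext` are hypothetical helpers, and `send` stands in for a function that calls `client:transmit(codec, data, speaking)`:

```lua
-- Simple FIFO of received packets, using explicit head/tail counters.
local queue, head, tail = {}, 1, 0

-- Store everything needed to retransmit the packet later.
local function record(event)
    tail = tail + 1
    queue[tail] = { codec = event.codec, data = event.data, speaking = event.speaking }
end

-- Send the oldest buffered packet through `send`; returns false when the queue is empty.
local function replayNext(send)
    if head > tail then return false end
    local packet = queue[head]
    queue[head] = nil -- allow garbage collection
    head = head + 1
    send(packet.codec, packet.data, packet.speaking)
    return true
end

-- Possible wiring:
--   client:hook("OnUserSpeak", record)
--   and, inside a timer callback:
--   replayNext(function(codec, data, speaking)
--       client:transmit(codec, data, speaking)
--   end)
```

Keeping the speaking state with each packet lets the replay pass it on unchanged, as in the immediate echo above.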