bnosac / audio.vadwebrtc

Voice Activity Detection in R using the "webrtc" toolkit
Mozilla Public License 2.0
1 stars 1 forks source link

audio.vadwebrtc

This repository contains an R package which is an Rcpp wrapper around the webrtc Voice Activity Detection module.

https://github.com/bnosac/audio.vadwebrtc/assets/1710810/6086999a-6348-460a-8fd6-2ab9fd2a5d17

The package was created with as main goal to remove non-speech audio segments before doing an automatic transcription using audio.whisper to avoid transcription hallucinations. It contains

Installation

Look to the documentation of the functions: help(package = "audio.vadwebrtc")

Example

Get a audio file in 16 bit with mono PCM samples (pcm_s16le codec) with a sampling rate of either 8Khz, 16KHz or 32Khz

library(audio.vadwebrtc)
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad  <- VAD(file, mode = "normal")
vad
Voice Activity Detection 
  - file: D:/Jan/R/win-library/4.1/audio.vadwebrtc/extdata/test_wav.wav 
  - sample rate: 16000 
  - VAD type: webrtc-gmm, VAD mode: normal, VAD by milliseconds: 10, VAD frame_length: 160
    - Percent of audio containing a voiced signal: 90.2% 
    - Seconds voiced: 6.3 
    - Seconds unvoiced: 0.7
vad$vad_segments
 vad_segment start  end has_voice
           1  0.00 0.08     FALSE
           2  0.09 3.30      TRUE
           3  3.31 3.71     FALSE
           4  3.72 6.78      TRUE
           5  6.79 6.99     FALSE

Example of a simple plot of these audio and voice segments

library(av)
x <- read_audio_bin(file)
plot(seq_along(x) / 16000, x, type = "l", xlab = "Seconds", ylab = "Signal")
abline(v = vad$vad_segments$start, col = "red", lwd = 2)
abline(v = vad$vad_segments$end, col = "blue", lwd = 2)

Or show it interactively alongside R package wavesurfer: wavesurfer

library(wavesurfer)
library(shiny)
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad  <- VAD(file, mode = "lowbitrate")
anno <- data.frame(audio_id = vad$file, 
                   region_id = vad$vad_segments$vad_segment, 
                   start = vad$vad_segments$start, 
                   end = vad$vad_segments$end, 
                   label = ifelse(vad$vad_segments$has_voice, "Voiced", "Silent"))
anno <- subset(anno, label %in% "Silent")

wavs_folder <- system.file(package = "audio.vadwebrtc", "extdata")
shiny::addResourcePath("wav", wavs_folder)
ui <- fluidPage(
  wavesurferOutput("my_ws", height = "128px"),
  tags$p("Press spacebar to toggle play/pause."),
)
server <- function(input, output, session) {
  output$my_ws <- renderWavesurfer({
    wavesurfer(audio = paste0("wav/", "test_wav.wav"), annotations = anno) %>%
      ws_set_wave_color('#5511aa') %>%
      ws_cursor()
  })
}
shinyApp(ui = ui, server = server)

Support in text mining

Need support in text mining? Contact BNOSAC: http://www.bnosac.be