bnosac / audio.whisper

Transcribe audio files using the "Whisper" Automatic Speech Recognition model from R

integrate with audio.vadwebrtc #50

Closed jwijffels closed 6 months ago

jwijffels commented 8 months ago

Allow transcribing a set of audio segments in one call by passing in multiple offsets / durations.

jwijffels commented 7 months ago

Test with

library(audio.whisper)
download.file("https://github.com/jwijffels/example/raw/main/example.wav", "example.wav")
path  <- system.file(package = "audio.whisper", "repo", "ggml-tiny.en-q5_1.bin")
model <- whisper(path)
trans <- predict(model, newdata = "example.wav", language = "en", 
                 offset = c(0, 50000), duration = c(5000, 2000), trace = FALSE)
trans
trans$data
## Second test: detect voiced sections with audio.vadwebrtc, then transcribe only those
library(audio.whisper)
library(audio.vadwebrtc)
audio <- system.file(package = "audio.whisper", "samples", "stereo.wav")
## Voice activity detection
vad    <- VAD(audio)
voiced <- is.voiced(vad, units = "milliseconds", silence_min = 1000, voiced_min = 1000)
voiced <- subset(voiced, has_voice == TRUE)
## Transcription of voiced segments
path  <- system.file(package = "audio.whisper", "repo", "ggml-tiny.en-q5_1.bin")
model <- whisper(path)
trans <- predict(model, newdata = audio, language = "auto", sections = voiced, trace = FALSE)
trans
trans$data
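
The two tests above describe segments in two shapes: parallel `offset` / `duration` vectors, and a `sections` data frame. As a rough sketch of how the two relate, the base R snippet below converts a table of segments into offset / duration vectors. It assumes the table has `start` and `end` columns in milliseconds; those column names and the helper `segments_to_offsets` are hypothetical and not confirmed by this thread.

```r
# Hypothetical helper: turn a table of segments into offset/duration vectors.
# Assumes columns `start` and `end` hold milliseconds; these names are an
# assumption for illustration, not taken from the package documentation.
segments_to_offsets <- function(segments) {
  stopifnot(all(c("start", "end") %in% names(segments)))
  stopifnot(all(segments$end >= segments$start))
  list(offset   = segments$start,
       duration = segments$end - segments$start)
}

# Toy segments matching the first test: 0 to 5 s and 50 to 52 s
segments <- data.frame(start = c(0, 50000), end = c(5000, 52000))
args <- segments_to_offsets(segments)
args$offset
args$duration
```

The resulting vectors have the same form as the `offset = c(0, 50000), duration = c(5000, 2000)` arguments used in the first test.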