Xav

Elixir wrapper over FFmpeg for reading audio and video files.

See an interview with an FFmpeg enthusiast: https://youtu.be/9kaIXkImCAM

Installation

Make sure you have the FFmpeg (ver. 4.x - 7.x) development packages installed on your system (see here for installation one-liners) and add Xav to your list of dependencies:

def deps do
  [
    {:xav, "~> 0.8.0"},
    # Add Nx if you want to have Xav.Frame.to_nx/1
    {:nx, ">= 0.0.0"}
  ]
end
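
For reference, on Debian/Ubuntu the FFmpeg development packages can be installed with something like the following (the package names and the exact set of libraries may differ between distributions and FFmpeg versions):

# Debian/Ubuntu example; adjust for your system.
sudo apt install libavcodec-dev libavformat-dev libavutil-dev \
  libswscale-dev libswresample-dev libavdevice-dev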

Usage

Decode

decoder = Xav.Decoder.new(:vp8)
{:ok, %Xav.Frame{} = frame} = Xav.Decoder.decode(decoder, <<"somebinary">>)

Decode with audio resampling

decoder = Xav.Decoder.new(:opus, out_format: :f32, out_sample_rate: 16_000)
{:ok, %Xav.Frame{} = frame} = Xav.Decoder.decode(decoder, <<"somebinary">>)
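
Like video frames, the resampled audio frame can be converted to an Nx tensor with Xav.Frame.to_nx/1 (requires the optional :nx dependency); the speech-to-text example below builds on exactly this:

# Convert the decoded, resampled f32 audio into an Nx tensor for further processing.
tensor = Xav.Frame.to_nx(frame)
Nx.shape(tensor)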

Read from a file:

r = Xav.Reader.new!("./some_mp4_file.mp4")
{:ok, %Xav.Frame{} = frame} = Xav.Reader.next_frame(r)
tensor = Xav.Frame.to_nx(frame)
Kino.Image.new(tensor)
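
To process a whole file lazily instead of pulling frames one by one, Xav.Reader.stream!/2 (used in the speech-to-text example below) returns a stream of frames. A minimal sketch, assuming the reader yields video frames by default (the audio case with read: :audio is shown later):

# Count every frame in the file (this reads the whole file).
"./some_mp4_file.mp4"
|> Xav.Reader.stream!()
|> Enum.count()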

Read from a camera:

r = Xav.Reader.new!("/dev/video0", device?: true)
{:ok, %Xav.Frame{} = frame} = Xav.Reader.next_frame(r)
tensor = Xav.Frame.to_nx(frame)
Kino.Image.new(tensor)

Speech to text:

{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text_whisper(whisper, featurizer, tokenizer, generation_config,
    defn_options: [compiler: EXLA]
  )

# Read a couple of frames.
# See https://hexdocs.pm/bumblebee/Bumblebee.Audio.WhisperFeaturizer.html for default sampling rate.
frames =
  "sample.mp3"
  |> Xav.Reader.stream!(read: :audio, out_format: :f32, out_channels: 1, out_sample_rate: 16_000)
  |> Stream.take(200)
  |> Enum.map(&Xav.Frame.to_nx/1)

batch = Nx.Batch.concatenate(frames)
batch = Nx.Defn.jit_apply(&Function.identity/1, [batch])
Nx.Serving.run(serving, batch) 

Development

To make clangd aware of the header files used in your project, you can create a compile_commands.json file. clangd uses this file to know the compiler flags, include paths, and other compilation options for each source file.

Install bear

The easiest way to generate compile_commands.json from a Makefile-based build is to use bear, a tool that records the compiler calls made during a build and writes them to compile_commands.json.

You can install bear with your package manager:
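
# For example, on Debian/Ubuntu:
sudo apt install bear
# or on macOS with Homebrew:
brew install bear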

Generate compile_commands.json

After installing bear, run your build through it so it can record the compiler invocations. Xav's native code is compiled as part of mix compile, so wrap that command:

bear -- mix compile
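
Keep in mind that bear only records compiler calls that actually execute, so if the native code has already been built, force a rebuild first so the calls are captured, for example:

mix clean
bear -- mix compile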