SteveLillis / .NET-Ogg-Vorbis-Encoder

Ogg Vorbis audio encoding library written in C#
MIT License

Part of the beginning is being dropped when encoding raw PCM audio #12

Open ict-ryahata opened 3 years ago

ict-ryahata commented 3 years ago

I'm currently trying to use this library to encode raw PCM audio.

When comparing the encoded ogg file to the uncompressed wav file I noticed that they were not the same duration (not size); the ogg file's duration is shorter. The sample rate of the ogg file and wav file are the same.

At first I was worried that I was not using the library correctly, even though I pretty much mirrored the example script linked on the GitHub main page for this repository. However, when I ran the code from that example, which generates a sound file containing a sine wave of a specified amplitude, I noticed that the encoded ogg file does not match the duration I passed in: it should be 3 seconds, but the encoded file is approximately 2.972 seconds long, missing 1228 samples.

Is this expected behavior?
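
For anyone who wants to check the numbers above, here is a minimal sketch of the arithmetic. The 44100 Hz sample rate is an assumption (the report does not state it), and `DurationCheck` is a hypothetical helper, not part of the library.

```csharp
using System;

// Hypothetical helper for sanity-checking durations; not part of OggVorbisEncoder.
public static class DurationCheck
{
    // Duration in seconds of `sampleCount` samples (per channel) at `sampleRate` Hz.
    public static double Seconds(long sampleCount, int sampleRate)
        => (double)sampleCount / sampleRate;

    // Samples (per channel) missing from a file that should last `expectedSeconds`.
    public static long MissingSamples(double expectedSeconds, long actualSamples, int sampleRate)
        => (long)Math.Round(expectedSeconds * sampleRate) - actualSamples;
}
```

Under the 44100 Hz assumption, 3 seconds is 132300 samples, so a file that is 1228 samples short holds 131072 samples, and `DurationCheck.Seconds(131072, 44100)` comes out at roughly 2.972 seconds, which matches the reported duration.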

165plo commented 2 years ago

I know this issue is on the older side, but I am seeing the same behavior. I am decoding an ogg file using NVorbis, and when I re-encode it, some of the data from the beginning of the audio is missing.

cabauman commented 2 years ago

Sorry for the late reply (just started a new job).

I help maintain this project, but unfortunately low-level encoding logic isn't my specialty. When I get some free time I'll at least try to reproduce it, and then we'll see from there.

mysteryjeans commented 1 year ago

To make things easy, here is a simple wrapper. It doesn't change the sample rate.

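using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using OggVorbisEncoder;

public class VorbisEncoder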
{
    private readonly int _sampleRate;
    private readonly int _bits;
    private readonly int _channels;
    private readonly int _sampleSize;
    private readonly int _sampleFrame;
    private readonly Stream _outputStream;
    private readonly OggStream _oggStream;
    private readonly ProcessingState _processingState;

    public VorbisEncoder(int sampleRate, int bits, int channels, Stream outputStream)
    {
        if (outputStream == null)
            throw new ArgumentNullException(nameof(outputStream));

        if (sampleRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sampleRate), "Sample rate should be greater than zero");

        if (bits != 8 && bits != 16)
            throw new ArgumentOutOfRangeException(nameof(bits), "Expected bits value is 8 or 16");

        if (channels <= 0)
            throw new ArgumentOutOfRangeException(nameof(channels), "Channels should be 1 or more");

        // Stores all the static vorbis bitstream settings
        var info = VorbisInfo.InitVariableBitRate(channels, sampleRate, 0.5f);

        // set up our packet->stream encoder
        var serial = new Random().Next();
        _oggStream = new OggStream(serial);

        // =========================================================
        // HEADER
        // =========================================================
        // Vorbis streams begin with three headers; the initial header (with
        // most of the codec setup parameters) which is mandated by the Ogg
        // bitstream spec.  The second header holds any comment fields.  The
        // third header holds the bitstream codebook.
        var comments = new Comments();
        comments.AddTag("ARTIST", "TTS");

        var infoPacket = HeaderPacketBuilder.BuildInfoPacket(info);
        var commentsPacket = HeaderPacketBuilder.BuildCommentsPacket(comments);
        var booksPacket = HeaderPacketBuilder.BuildBooksPacket(info);

        _oggStream.PacketIn(infoPacket);
        _oggStream.PacketIn(commentsPacket);
        _oggStream.PacketIn(booksPacket);

        // =========================================================
        // BODY (Audio Data)
        // =========================================================
        _processingState = ProcessingState.Create(info);

        _sampleRate = sampleRate;
        _bits = bits;
        _channels = channels;
        _sampleSize = _bits / 8;
        _sampleFrame = _sampleSize * _channels;
        _outputStream = outputStream;
    }

    public void Encode(byte[] buffer, int index, int length)
    {
        int samples = length / _sampleFrame;

        float[][] outSamples = new float[_channels][];
        for (int ch = 0; ch < _channels; ch++)
            outSamples[ch] = new float[samples];

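        if (_bits == 8)
        {
            for (int sampleNumber = 0; sampleNumber < samples; sampleNumber++)
            {
                int frameStart = index + sampleNumber * _sampleFrame;
                for (int ch = 0; ch < _channels; ch++)
                {
                    // Each channel's sample sits at a fixed offset from the start of the frame.
                    int readIndex = frameStart + ch * _sampleSize;
                    // Assumes unsigned 8-bit PCM (the WAV convention), centred on 128.
                    outSamples[ch][sampleNumber] = (buffer[readIndex] - 128) / 128f;
                }
            }
        }
        else
        {
            for (int sampleNumber = 0; sampleNumber < samples; sampleNumber++)
            {
                int frameStart = index + sampleNumber * _sampleFrame;
                for (int ch = 0; ch < _channels; ch++)
                {
                    // Each channel's sample sits at a fixed offset from the start of the frame.
                    int readIndex = frameStart + ch * _sampleSize;
                    // Signed 16-bit little-endian PCM scaled to [-1, 1).
                    outSamples[ch][sampleNumber] = (short)(buffer[readIndex + 1] << 8 | buffer[readIndex]) / 32768f;
                }
            }
        }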

        _processingState.WriteData(outSamples, samples, 0);
        while (!_oggStream.Finished && _processingState.PacketOut(out OggPacket packet))
            _oggStream.PacketIn(packet);
    }

    public void Flush(bool force = true)
    {
        while (_oggStream.PageOut(out OggPage page, force))
        {
            _outputStream.Write(page.Header, 0, page.Header.Length);
            _outputStream.Write(page.Body, 0, page.Body.Length);
        }
    }

    public async Task FlushAsync(bool force = true, CancellationToken cancellationToken = default)
    {
        while (_oggStream.PageOut(out OggPage page, force))
        {
            await _outputStream.WriteAsync(page.Header, 0, page.Header.Length, cancellationToken);
            await _outputStream.WriteAsync(page.Body, 0, page.Body.Length, cancellationToken);
        }
    }
}

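
The byte-to-float conversion that Encode performs can be checked on its own, without the encoder, which may help rule it out as the source of the missing samples. The following is only a sketch under stated assumptions: input is signed 16-bit little-endian interleaved PCM, and `PcmConverter` and `Deinterleave16` are hypothetical names, not part of the library or of the wrapper above.

```csharp
using System;

// Hypothetical standalone helper; mirrors the 16-bit branch of the wrapper's conversion.
public static class PcmConverter
{
    // Splits interleaved signed 16-bit little-endian PCM into one float array per channel.
    // Trailing bytes that do not fill a whole frame are ignored.
    public static float[][] Deinterleave16(byte[] buffer, int index, int length, int channels)
    {
        const int bytesPerSample = 2;
        int frameSize = bytesPerSample * channels;
        int frames = length / frameSize;

        var result = new float[channels][];
        for (int ch = 0; ch < channels; ch++)
            result[ch] = new float[frames];

        for (int frame = 0; frame < frames; frame++)
        {
            int frameStart = index + frame * frameSize;
            for (int ch = 0; ch < channels; ch++)
            {
                // Channel ch sits at a fixed offset from the start of the frame.
                int i = frameStart + ch * bytesPerSample;
                short sample = (short)(buffer[i] | (buffer[i + 1] << 8));
                result[ch][frame] = sample / 32768f;
            }
        }
        return result;
    }
}
```

Feeding it a few hand-written frames with three or more channels is a quick way to confirm that every channel reads from the expected byte offset and that the number of output frames equals `length / (2 * channels)`.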
rootux commented 2 months ago

Usually in those cases you are missing a flush somewhere, probably before the end.

Are you sure those missing bytes are beginning bytes and not ending bytes?
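
One way to answer that question is to decode the ogg back to samples and find the offset at which it best lines up with the original. Below is a rough sketch of that idea, assuming both signals are already available as mono float arrays (decoding, for example with NVorbis as mentioned earlier in the thread, is not shown). `OffsetFinder` and `FindLeadingLoss` are hypothetical names, and the brute-force search is chosen for clarity, not speed.

```csharp
using System;

// Hypothetical diagnostic helper; not part of OggVorbisEncoder or NVorbis.
public static class OffsetFinder
{
    // Returns the lag in [0, maxLag] that minimises the mean squared difference
    // between original[i + lag] and decoded[i]. A result greater than zero suggests
    // that many samples are missing from the beginning of the decoded signal.
    public static int FindLeadingLoss(float[] original, float[] decoded, int maxLag)
    {
        int bestLag = 0;
        double bestError = double.MaxValue;

        for (int lag = 0; lag <= maxLag; lag++)
        {
            int count = Math.Min(decoded.Length, original.Length - lag);
            if (count <= 0)
                break;

            double error = 0;
            for (int i = 0; i < count; i++)
            {
                double diff = original[i + lag] - decoded[i];
                error += diff * diff;
            }
            error /= count;

            // Strict comparison keeps the smallest lag when several fit equally well.
            if (error < bestError)
            {
                bestError = error;
                bestLag = lag;
            }
        }
        return bestLag;
    }
}
```

A pure sine wave is a poor test signal here because it matches itself at every whole period, so noise or real audio gives a clearer answer. Since Vorbis is lossy, the minimum error will not be zero, but the lag at which it occurs should still show whether the loss is at the start (lag greater than zero) or at the end (lag zero with a shorter decoded array).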

nyoro-wrl commented 2 months ago

The top shows the source file (FLAC), the middle shows the conversion by libvorbis, and the bottom shows the conversion result by .NET-Ogg-Vorbis-Encoder. Only the .NET-Ogg-Vorbis-Encoder output was shifted forward by 23 milliseconds.

image

At the end, it finishes 23 milliseconds early. I believe that data at the beginning is being lost.

image