OpusFileStream issue - Githubissues

mattgwwalker commented 4 years ago

Hi,

I wrote a small example of the use of OpusFileStream. It loads an OggOpus file using both OpusFile and OpusFileStream. It then does a sample-by-sample comparison to ensure that the two techniques produce the same output.

Unfortunately, it appears that the two techniques don't always produce the same result.

Running on the demo file examples/left-right-demo-5s.opus, everything appears to be well:

Read 240000 samples from OpusFile (in 43.1 milliseconds).
Read 240000 samples from the OpusFileStream
(in 126 reads averaging 0.41 milliseconds each).
OpusFileStream data was identical to OpusFile data.

However when I tried it on a much larger Opus file (2.2MB), the two techniques did not produce the same result:

Read 8375296 samples from OpusFile (in 1467.7 milliseconds).
OpusFileStream data was NOT identical to OpusFile data.
Read 449928 samples from the OpusFileStream
(in 234 reads averaging 0.42 milliseconds each).

I have not yet committed the example. I'll have a look into the implementation of OpusFileStream to see if I can resolve the issue. If you're interested, below is the example/test code I was using:

"""Reads an Ogg-Opus file using OpusFile and OpusFileStream.

Reads an Ogg-Opus file using OpusFileStream and compares it to the
results of reading it with OpusFile.  Gives timing information for the
two approaches.

A typical output:

Read 240000 samples from OpusFile (in 43.1 milliseconds).
Read 240000 samples from the OpusFileStream
(in 126 reads averaging 0.41 milliseconds each).
OpusFileStream data was identical to OpusFile data.

"""

import time

import numpy
import pyogg

# Specify a file to process
opus_file_filename = "/Users/matthew/Desktop/VirtualChoir/Sounds/psallite.opus"
opus_file_stream_filename = "/Users/matthew/Desktop/VirtualChoir/Sounds/psallite.opus"
#opus_file_filename = "left-right-demo-5s.opus"
#opus_file_stream_filename = "left-right-demo-5s.opus"

# Open the file using OpusFile, which reads the entire file
# immediately and places it into an internal buffer.
start_time = time.time()
opus_file = pyogg.OpusFile(opus_file_filename)
end_time = time.time()
duration = (end_time-start_time)*1000
array = opus_file.as_array()
array_index = 0
print("Read {:d} samples from OpusFile (in {:.1f} milliseconds).".format(
    len(array),
    duration
))

# Open the file using OpusFileStream, which does not read the entire
# file immediately.
stream = pyogg.OpusFileStream(opus_file_stream_filename)

# Loop through the OpusFileStream until we've read all the data
samples_read = 0
identical = True
times = []
while True:
    # Read the next part of the stream
    start_time = time.time()
    buf = stream.get_buffer_as_array()
    end_time = time.time()
    duration = (end_time-start_time)*1000
    times.append(duration)

    # Check if we've reached the end of the stream
    if buf is None:
        break

    # Increment the number of samples read
    samples_read += len(buf)

    # Check we've not read too much data from the stream
    if array_index+len(buf) > len(array):
        print("OpusFileStream data was identical to OpusFile data,\n"+
              "however there was more data in the OpusFileStream than\n"+
              "in the OpusFile.")
        identical = False
        break

    # Compare the stream with the OpusFile data.  (They should be
    # identical.)
    comparison = array[array_index:array_index+len(buf)] == buf
    if not numpy.all(comparison):
        print("OpusFileStream data was NOT identical to OpusFile data.")
        identical = False
        break

    # Move the OpusFile index along
    array_index += len(buf)

avg_time = numpy.mean(times)
print(
    ("Read {:d} samples from the OpusFileStream\n"+
     "(in {:d} reads averaging {:.2f} milliseconds each).").format(
         samples_read,
         len(times),
         avg_time
     )
)

if identical == False:
    # We've finished our work here
    pass
elif array_index == len(array):
    # We completed the comparison successfully.
    print("OpusFileStream data was identical to OpusFile data.")
else:
    # There was remaining data
    print("OpusFileStream data was identical to OpusFile data,\n"+
          "however there was more data in the OpusFile than\n"+
          "in the OpusFileStream.")

mattgwwalker commented 4 years ago

I've re-worked OpusFileStream, it now passes the above test for both the sample file and the larger 2.2MB file.

The number of reads has gone up, as previously the get_buffer() method was calling op_read() a number of times. In my modified version get_buffer() now calls op_read() once per call. This has the benefit of reducing the time per call to get_buffer().

The results on the demo file:

Read 240000 samples from OpusFile (in 48.1 milliseconds).
Read 240000 samples from the OpusFileStream
(in 252 reads averaging 0.23 milliseconds each).
OpusFileStream data was identical to OpusFile data.

The results on the larger (2.2MB) file:

Read 8375296 samples from OpusFile (in 1605.3 milliseconds).
Read 8375296 samples from the OpusFileStream
(in 8726 reads averaging 0.21 milliseconds each).
OpusFileStream data was identical to OpusFile data.

My next question, however, is how to commit all this... I've done the work on the same branch that I used for Pull Request #31. Is there an easy way to select just these changes? Or do I just commit it and make the Pull Request even larger? ;o)

Cheers,

Matthew

mattgwwalker commented 4 years ago

I discovered git's cherry-pick command. There's a first time for everything :o)

If you have any problem with the pull request please let me know.

Cheers,

Matthew

TeamPyOgg / PyOgg

OpusFileStream issue #33