beetbox / audioread

cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python
MIT License

The results are different when opening the same file multiple times #52

Closed wkcn closed 7 years ago

wkcn commented 7 years ago

Hi there, I found that the results differ when I open the same file (.ogg, .3gpp) multiple times. The reason is that audioread uses gstdec when _gst_available() is True, and the output of gstdec is not consistent between decodes. I wrote a test to check it.

#coding=utf-8
from audioread import gstdec

filename = "X23_01.ogg"

# Decode the same file three times and keep the first buffer of each pass.
w = []
for _ in range(3):
    with gstdec.GstAudioFile(filename) as input_file:
        for frame in input_file:
            w.append(frame)
            break

# Print the first 20 bytes of each buffer side by side.
for i in range(20):
    print (w[0][i], w[1][i], w[2][i])

The output is

('\xfe', '\x00', '\x00')
('\xff', '\x00', '\x00')
('\xfe', '\xfe', '\xfe')
('\xff', '\xff', '\xff')
('\xfe', '\xfe', '\xff')
('\xff', '\xff', '\xff')
('\xfe', '\xfe', '\xfe')
('\xff', '\xff', '\xff')
('\xff', '\xff', '\xff')
('\xff', '\xff', '\xff')
('\xff', '\xff', '\xff')
('\xff', '\xff', '\xff')
('\x00', '\x00', '\x00')
('\x00', '\x00', '\x00')
('\x00', '\x00', '\x00')
('\x00', '\x00', '\x00')
('\x01', '\x02', '\x02')
('\x00', '\x00', '\x00')
('\x00', '\x00', '\x00')
('\x00', '\x00', '\x00')

I tried several .ogg and .3gpp files, and in every case the results differ when the same file is opened multiple times.

sampsyo commented 7 years ago

Huh! That's interesting. The first two samples differ, and then things mostly converge (but not quite). I suppose the most likely explanation is that GStreamer itself is nondeterministic. Have you checked whether the result is the same for different executions "from scratch" of the Python script?
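
One way to run that check is to print a digest of the whole decoded stream and compare it across separate invocations of the script. The following is a minimal sketch, assuming only that the decoder object is an iterable of byte buffers (as audioread's file objects are); `stream_digest` is a hypothetical helper name, not part of audioread.

```python
import hashlib


def stream_digest(frames):
    """Return a SHA-256 hex digest over an iterable of byte buffers.

    The digest covers the concatenated bytes, so it does not depend on
    how the decoder happens to split the stream into buffers.
    """
    h = hashlib.sha256()
    for buf in frames:
        h.update(bytes(buf))
    return h.hexdigest()


# Possible usage (requires audioread with GStreamer support):
#
#     from audioread import gstdec
#     with gstdec.GstAudioFile("X23_01.ogg") as f:
#         print(stream_digest(f))
```

If two fresh runs print different digests, the decoded output varies between processes as well as within one.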

wkcn commented 7 years ago

I ran two tests.

  1. Run the script from scratch several times on the same file.
  2. Make a copy of the file, then open both the original and the copy. The test code is:
    
    #coding=utf-8
    from audioread import gstdec, ffdec

    filename = "X23_01.ogg"
    filename2 = "X23_02.ogg" # the copy of X23_01.ogg

    def get_frame(filename, frame_id):
        # Return buffer number frame_id of the decoded stream,
        # or [] if the file has fewer buffers than that.
        g = gstdec.GstAudioFile(filename)
        #g = ffdec.FFmpegAudioFile(filename)
        with g as input_file:
            for k, frame in enumerate(input_file):
                if k == frame_id:
                    return frame
        return []

    w = []
    k = 100
    w.append(get_frame(filename, k))
    w.append(get_frame(filename2, k)) # the copy with different name
    w.append(get_frame(filename, k)) # the same file with the same name

    # Print the first 20 bytes of each buffer side by side.
    for i in range(20):
        print (w[0][i], w[1][i], w[2][i])

The output is:

➜ test python3 test2.py
117 117 117
10 10 10
194 194 193
0 0 0
99 99 99
7 7 7
33 32 32
3 3 3
129 129 129
5 5 5
65 64 64
5 5 5
139 138 139
6 6 6
45 46 46
5 5 5
236 237 236
8 8 8
171 171 170
3 3 3

➜ test python3 test2.py
117 117 118
10 10 10
194 194 194
0 0 0
99 99 99
7 7 7
33 32 32
3 3 3
129 129 130
5 5 5
65 64 64
5 5 5
139 138 139
6 6 6
45 46 46
5 5 5
236 237 236
8 8 8
171 171 171
3 3 3

The values differ by at most 1.
However, the results are identical when FFmpeg decodes the sample.

ffmpeg

➜ test python3 test2.py
108 108 108
2 2 2
52 52 52
244 244 244
183 183 183
2 2 2
168 168 168
243 243 243
221 221 221
2 2 2
253 253 253
242 242 242
85 85 85
3 3 3
105 105 105
243 243 243
9 9 9
3 3 3
211 211 211
244 244 244
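
The outputs above compare individual bytes. Since audioread yields raw 16-bit little-endian signed PCM, comparing whole samples may be more meaningful, because a one-step sample difference can show up as a change in both bytes. Here is a rough sketch of such a comparison; `max_sample_diff` is a hypothetical helper, and it assumes both buffers start at the same position in the stream.

```python
import struct


def max_sample_diff(a, b):
    """Largest absolute difference between two PCM buffers.

    Both buffers are read as 16-bit little-endian signed samples.
    Only the part that both buffers have in common is compared.
    """
    a, b = bytes(a), bytes(b)
    n = min(len(a), len(b)) // 2  # whole samples present in both buffers
    sa = struct.unpack("<%dh" % n, a[:2 * n])
    sb = struct.unpack("<%dh" % n, b[:2 * n])
    return max((abs(x - y) for x, y in zip(sa, sb)), default=0)
```

With the `w` list from the test script, `max_sample_diff(w[0], w[2])` would give the largest sample-level deviation between two decodes of the same file.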



GStreamer's nondeterminism affects audio analysis with the librosa library.
librosa calls audioread to open *.ogg or *.3gpp files, and audioread may use GStreamer to decode them.
When I apply CQT (constant-Q transform) and DTW (dynamic time warping) to two decodings of the same sample, the optimal alignment path between the two sequences is not a straight line.

sampsyo commented 7 years ago

Thanks for investigating! It still seems like we can chalk this up to GStreamer itself. It would be worthwhile to investigate whether we can blame that or if the problem is coming from our library.

If you need to avoid the nondeterminism, I suggest forcing the ffmpeg backend (or some other suitable backend) by using the underlying class directly.
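
A minimal sketch of that workaround, assuming the `ffmpeg` command-line tool is installed: `audioread.ffdec.FFmpegAudioFile` is the FFmpeg backend class, while `decode_all` and `ffmpeg_opener` are hypothetical helper names used only for illustration.

```python
def decode_all(path, opener):
    """Decode a whole file with the given backend.

    `opener` is any callable that returns an audioread-style file object.
    Returns (samplerate, channels, raw PCM bytes).
    """
    with opener(path) as f:
        data = b"".join(bytes(buf) for buf in f)
        return f.samplerate, f.channels, data


def ffmpeg_opener(path):
    """Open `path` with the FFmpeg backend, bypassing backend selection."""
    from audioread import ffdec  # imported lazily; needs the ffmpeg CLI
    return ffdec.FFmpegAudioFile(path)


# Possible usage:
#     rate, channels, pcm = decode_all("X23_01.ogg", ffmpeg_opener)
```

Passing the backend as a parameter keeps the choice explicit, so `audioread.audio_open` never gets the chance to pick GStreamer.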

wkcn commented 7 years ago

Thank you :-)