PyWavelets / pywt

PyWavelets - Wavelet Transforms in Python
http://pywavelets.readthedocs.org
MIT License
2.06k stars 482 forks source link

wavelet test - python result differs from R #186

Closed geraldstanje closed 8 years ago

geraldstanje commented 8 years ago

Hi,

I want to port some R code which includes a wavelet transform to Python:

r code: https://gist.github.com/geraldstanje/a0ad53b7cb7828f1999ee9693df3a75e

python code: https://gist.github.com/geraldstanje/7bcd560af208a86352adfe4ed5a38202

can you explain why the python version returns a different result?

here is the input file: https://gist.github.com/geraldstanje/52a4fb141e124da3da2249654eb6b91d

$ python main.py
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236  0.09582752
  -1.69595575 -2.26903105]]
$ Rscript main.R
Loading required package: methods
          [,1]      [,2]      [,3]    [,4]     [,5]      [,6]      [,7]
[1,] -1.173323 -1.651935 -2.566267 0.68943 1.976365 -2.668054 -1.332537
          [,8]
[1,] -2.423402

info regarding the dwt function in R: Slot "V": A list with element i comprised of a matrix containing the ith level scaling coefficients

kwohlfahrt commented 8 years ago

I suspect this is due to the boundary condition. 'zpd' specifies you want to treat the array as though it was zero-padded, while according to the wavelet package reference R only supports periodic and reflection boundary conditions. You can find more info on boundary conditions supported by pywt here.

geraldstanje commented 8 years ago

@kwohlfahrt is seems that the mode does not effect the result in Python. R seems to use: Boundary Method: Periodic. I extract the scaling coefficients in R. Is it the right way I extract them in Python as well? any idea what goes wrong by looking at the dwt object output in R?

Python output:

result with mode='symmetric':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236  0.09582752
  -1.69595575 -2.26903105]]

result with mode='periodic':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236  0.09582752
  -1.69595575 -2.26903105]]

result with mode='periodization':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236  0.09582752
  -1.69595575 -2.26903105]]

result with mode='smooth':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236  0.09582752
  -1.69595575 -2.26903105]]

result with mode='constant':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236  0.09582752
  -1.69595575 -2.26903105]]

some more infos about the dwt object in R (print(Wave_results)):

$ Rscript main.R
Loading required package: methods
DWT Wavelet Coefficients:
Level 1
Series 1: 1.0458e-02 -2.0916e-02 1.0458e-02 ... 0.0000e+00 2.0916e-02 1.0458e-02
Level 2
Series 1: 7.3950e-03 -4.4370e-02 -1.0353e-01 ... -6.8773e-01 5.1765e-01 2.2185e-02
Level 3
Series 1: -1.1504e-01 3.1374e-01 4.0787e-01 ... -7.0069e-01 -6.2749e-02 4.2355e-01
Level 4
Series 1: 1.6269e-01 -3.5866e-01 -1.4938e+00 ... 1.6935e+00 -1.6417e+00 1.1203e+00
Level 5
Series 1: -2.6119e+00 2.6694e+00 -1.4275e+00 ... -2.3322e+00 2.6537e+00 -2.3975e+00
Level 6
Series 1: -1.1000e+00 -1.5973e+00 -1.9319e+00 -2.0003e+00 3.6975e-03
Level 7
Series 1: 2.3661e-01 2.0655e-01
Level 8
Series 1: -5.3891e-01

DWT Scaling Coefficients:
Level 1
Series 1: -1.1733e+00 -1.1629e+00 -1.1942e+00 ... -2.8438e-01 -2.4255e-01 -2.1118e-01
Level 2
Series 1: -1.6519e+00 -1.7333e+00 -1.8960e+00 ... -1.7850e+00 -9.1983e-01 -3.2084e-01
Level 3
Series 1: -2.5663e+00 3.7246e-01 6.0254e-01 ... -4.3281e-01 -2.4617e+00 -8.7728e-01
Level 4
Series 1: 6.8943e-01 3.2444e+00 -4.4940e-01 ... 1.2071e+00 1.0296e+00 -2.3610e+00
Level 5
Series 1: 1.9764e+00 -1.1088e+00 -2.6644e+00 ... -2.5651e+00 -9.4668e-01 -9.4145e-01
Level 6
Series 1: -2.6681e+00 -1.1096e+00 -7.7493e-01 -1.6272e+00 -1.3351e+00
Level 7
Series 1: -1.3325e+00 -2.0947e+00
Level 8
Series 1: -2.4234e+00

Length of Original Series: 380
Wavelet Coefficients Aligned? FALSE
Center of Energy Method Used? FALSE

Boundary Method: Periodic
Number of Boundaries Coefficients per Level:
Level 1: 0
Level 2: 0
Level 3: 0
Level 4: 0
Level 5: 0
Level 6: 0
Level 7: 0
Level 8: 0

Filter Class: Daubechies
Filter Name: HAAR
Filter Length: 2
geraldstanje commented 8 years ago

@kwohlfahrt did you already have time to look at it?

kwohlfahrt commented 8 years ago

No, sorry. I'm very busy writing up at the moment, so can't commit time to look into this deeply in the immediate future.

geraldstanje commented 8 years ago

anybody who could give feedback?

grlee77 commented 8 years ago

Hi @geraldstanje,

I am not sure exactly what your goal is here with keeping the first coefficient of the approximation coefficient array at each level, but the following code does what you want. It is not computationally efficient, but depending on your data size that may or may not matter. It uses wavedec to do a 1-level decomposition, restarts with a 2-level decomposition, etc...

If you look at the internals of pywt.wavedec you could modify it to avoid the redundant computations. This does match the output from R above as is, though

acoeffs = [pywt.wavedec(df['2'], 'haar', 'periodic', level=lev)[0][0] for lev in range(1, 9)]
print(acoeffs)
 [-1.1733234, -1.6519349, -2.3937082, -2.6198323, -0.85778236, 0.09582752, -1.6959558, -2.269031]

edit: changed variable name from l to lev to avoid confusion with 1

grlee77 commented 8 years ago

As an aside, it is good to see that the pywt functions work properly for the pandas.series.Series inputs used in this example. That is not something we explicitly test against as far as I can recall, but the call to np.asarray seems to take care of the conversion.

geraldstanje commented 8 years ago

@grlee77 my goal is to port the R code of the wavelet transform to python in order to extract the 8 DWT Scaling Coefficients called V in R. See line 8 - 22 in the following link: https://gist.github.com/geraldstanje/a0ad53b7cb7828f1999ee9693df3a75e

Im not sure if my Python code does that in the right way?

Do you see that the Python output and the R output differs if you compare those 8 values? Python output:

[-1.1733234, -1.6519349, -2.3937082, -2.6198323, -0.85778236, 0.09582752, -1.6959558, -2.269031]

R output:

          [,1]      [,2]      [,3]    [,4]     [,5]      [,6]      [,7]      [,8] 
[1,] -1.173323 -1.651935 -2.566267 0.68943 1.976365 -2.668054 -1.332537 -2.423402

edit: I recognized that the pywt.wavedec returns the same result for different levels (i set lev to 2, 4, 6, 8). Is that expected?

geraldstanje commented 8 years ago

@grlee77 do you see the diff?

grlee77 commented 8 years ago

sorry, I had not noticed only the first couple of values were the same.

At level 1 the input is size 380. At level 2 it is 190. Both of these are even length and the results match. At level 3, the input is size 95 which is odd and this is where the differences start.

This appears to be a difference in how R is handling the convolution for odd length inputs. The implementation in PyWavelets is tested vs. outputs from Matlab's DWT and is not guaranteed to exactly match the implementation in R.

If you do the transform as follows, you can reproduce the numbers from R. This requires manual periodic padding up to even size followed by removal of the padded element whenever the input array has odd length:

acoeffs = []
a = np.asarray(df['2'])  # the data from your example
for i in range(1, 9):
    if len(a) % 2:
        a = np.concatenate(([a[-1]], a))  # prepend periodic sample to give an even length
        a, d = pywt.dwt(a, 'haar', 'periodic', 0)
        a = a[1:]  # remove the first sample of the output
        d = d[1:]
    else:
        a, d = pywt.dwt(a, 'haar', 'periodic', 0)
    acoeffs.append(a[0])
print(acoeffs)

To figure out why this is necessary would require comparing the source code in R to see exactly what the padding conventions are. If you are so inclined, the relevant convolution routine used in PyWavelets is here: https://github.com/PyWavelets/pywt/blob/5fc0f769dfdd6d21c2d0b2fa4fa8cdb7e887bb9f/pywt/_extensions/c/convolution.template.c#L107

grlee77 commented 8 years ago

as this does not appear to be a bug in PyWavelets I am going to close the issue.

Arnold1 commented 8 years ago

@grlee77 thanks for the info.

i wonder how the following version can be improved?

def next_pow(x):
  return int(2**math.ceil(math.log(x, 2)))

def wavedec2(data):
  acoeffs = []
  a = np.asarray(data)

  pow_two = next_pow(len(a))

  diff = pow_two - len(a)
  if diff > 0:
    a = np.concatenate((a, [0] * diff))

  for i in range(1, 5):
    if len(a) % 2:
      a = np.concatenate(([a[-1]], a))  # prepend periodic sample to give an even length
      a, d = pywt.dwt(a, 'haar', 'periodic', 0)
      a = a[1:]  # remove the first sample of the output
      d = d[1:]
    else:
      a, d = pywt.dwt(a, 'haar', 'periodic', 0)

    acoeffs.append(a[0])

  return acoeffs
guigaoliveira commented 6 years ago

"a = a[1:] # remove the first sample of the output" Why this is necessary?