Closed geraldstanje closed 8 years ago
I suspect this is due to the boundary condition. 'zpd'
specifies you want to treat the array as though it was zero-padded, while according to the wavelet package reference R only supports periodic and reflection boundary conditions. You can find more info on boundary conditions supported by pywt here.
@kwohlfahrt is seems that the mode does not effect the result in Python. R seems to use: Boundary Method: Periodic. I extract the scaling coefficients in R. Is it the right way I extract them in Python as well? any idea what goes wrong by looking at the dwt object output in R?
Python output:
result with mode='symmetric':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236 0.09582752
-1.69595575 -2.26903105]]
result with mode='periodic':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236 0.09582752
-1.69595575 -2.26903105]]
result with mode='periodization':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236 0.09582752
-1.69595575 -2.26903105]]
result with mode='smooth':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236 0.09582752
-1.69595575 -2.26903105]]
result with mode='constant':
[[-1.17332339 -1.65193486 -2.39370823 -2.61983228 -0.85778236 0.09582752
-1.69595575 -2.26903105]]
some more infos about the dwt object in R (print(Wave_results)):
$ Rscript main.R
Loading required package: methods
DWT Wavelet Coefficients:
Level 1
Series 1: 1.0458e-02 -2.0916e-02 1.0458e-02 ... 0.0000e+00 2.0916e-02 1.0458e-02
Level 2
Series 1: 7.3950e-03 -4.4370e-02 -1.0353e-01 ... -6.8773e-01 5.1765e-01 2.2185e-02
Level 3
Series 1: -1.1504e-01 3.1374e-01 4.0787e-01 ... -7.0069e-01 -6.2749e-02 4.2355e-01
Level 4
Series 1: 1.6269e-01 -3.5866e-01 -1.4938e+00 ... 1.6935e+00 -1.6417e+00 1.1203e+00
Level 5
Series 1: -2.6119e+00 2.6694e+00 -1.4275e+00 ... -2.3322e+00 2.6537e+00 -2.3975e+00
Level 6
Series 1: -1.1000e+00 -1.5973e+00 -1.9319e+00 -2.0003e+00 3.6975e-03
Level 7
Series 1: 2.3661e-01 2.0655e-01
Level 8
Series 1: -5.3891e-01
DWT Scaling Coefficients:
Level 1
Series 1: -1.1733e+00 -1.1629e+00 -1.1942e+00 ... -2.8438e-01 -2.4255e-01 -2.1118e-01
Level 2
Series 1: -1.6519e+00 -1.7333e+00 -1.8960e+00 ... -1.7850e+00 -9.1983e-01 -3.2084e-01
Level 3
Series 1: -2.5663e+00 3.7246e-01 6.0254e-01 ... -4.3281e-01 -2.4617e+00 -8.7728e-01
Level 4
Series 1: 6.8943e-01 3.2444e+00 -4.4940e-01 ... 1.2071e+00 1.0296e+00 -2.3610e+00
Level 5
Series 1: 1.9764e+00 -1.1088e+00 -2.6644e+00 ... -2.5651e+00 -9.4668e-01 -9.4145e-01
Level 6
Series 1: -2.6681e+00 -1.1096e+00 -7.7493e-01 -1.6272e+00 -1.3351e+00
Level 7
Series 1: -1.3325e+00 -2.0947e+00
Level 8
Series 1: -2.4234e+00
Length of Original Series: 380
Wavelet Coefficients Aligned? FALSE
Center of Energy Method Used? FALSE
Boundary Method: Periodic
Number of Boundaries Coefficients per Level:
Level 1: 0
Level 2: 0
Level 3: 0
Level 4: 0
Level 5: 0
Level 6: 0
Level 7: 0
Level 8: 0
Filter Class: Daubechies
Filter Name: HAAR
Filter Length: 2
@kwohlfahrt did you already have time to look at it?
No, sorry. I'm very busy writing up at the moment, so can't commit time to look into this deeply in the immediate future.
anybody who could give feedback?
Hi @geraldstanje,
I am not sure exactly what your goal is here with keeping the first coefficient of the approximation coefficient array at each level, but the following code does what you want. It is not computationally efficient, but depending on your data size that may or may not matter. It uses wavedec
to do a 1-level decomposition, restarts with a 2-level decomposition, etc...
If you look at the internals of pywt.wavedec
you could modify it to avoid the redundant computations. This does match the output from R above as is, though
acoeffs = [pywt.wavedec(df['2'], 'haar', 'periodic', level=lev)[0][0] for lev in range(1, 9)]
print(acoeffs)
[-1.1733234, -1.6519349, -2.3937082, -2.6198323, -0.85778236, 0.09582752, -1.6959558, -2.269031]
edit: changed variable name from l
to lev
to avoid confusion with 1
As an aside, it is good to see that the pywt
functions work properly for the pandas.series.Series
inputs used in this example. That is not something we explicitly test against as far as I can recall, but the call to np.asarray
seems to take care of the conversion.
@grlee77 my goal is to port the R code of the wavelet transform to python in order to extract the 8 DWT Scaling Coefficients called V in R. See line 8 - 22 in the following link: https://gist.github.com/geraldstanje/a0ad53b7cb7828f1999ee9693df3a75e
Im not sure if my Python code does that in the right way?
Do you see that the Python output and the R output differs if you compare those 8 values? Python output:
[-1.1733234, -1.6519349, -2.3937082, -2.6198323, -0.85778236, 0.09582752, -1.6959558, -2.269031]
R output:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] -1.173323 -1.651935 -2.566267 0.68943 1.976365 -2.668054 -1.332537 -2.423402
edit: I recognized that the pywt.wavedec returns the same result for different levels (i set lev to 2, 4, 6, 8). Is that expected?
@grlee77 do you see the diff?
sorry, I had not noticed only the first couple of values were the same.
At level 1 the input is size 380. At level 2 it is 190. Both of these are even length and the results match. At level 3, the input is size 95 which is odd and this is where the differences start.
This appears to be a difference in how R is handling the convolution for odd length inputs. The implementation in PyWavelets is tested vs. outputs from Matlab's DWT and is not guaranteed to exactly match the implementation in R.
If you do the transform as follows, you can reproduce the numbers from R. This requires manual periodic padding up to even size followed by removal of the padded element whenever the input array has odd length:
acoeffs = []
a = np.asarray(df['2']) # the data from your example
for i in range(1, 9):
if len(a) % 2:
a = np.concatenate(([a[-1]], a)) # prepend periodic sample to give an even length
a, d = pywt.dwt(a, 'haar', 'periodic', 0)
a = a[1:] # remove the first sample of the output
d = d[1:]
else:
a, d = pywt.dwt(a, 'haar', 'periodic', 0)
acoeffs.append(a[0])
print(acoeffs)
To figure out why this is necessary would require comparing the source code in R to see exactly what the padding conventions are. If you are so inclined, the relevant convolution routine used in PyWavelets is here: https://github.com/PyWavelets/pywt/blob/5fc0f769dfdd6d21c2d0b2fa4fa8cdb7e887bb9f/pywt/_extensions/c/convolution.template.c#L107
as this does not appear to be a bug in PyWavelets
I am going to close the issue.
@grlee77 thanks for the info.
i wonder how the following version can be improved?
def next_pow(x):
return int(2**math.ceil(math.log(x, 2)))
def wavedec2(data):
acoeffs = []
a = np.asarray(data)
pow_two = next_pow(len(a))
diff = pow_two - len(a)
if diff > 0:
a = np.concatenate((a, [0] * diff))
for i in range(1, 5):
if len(a) % 2:
a = np.concatenate(([a[-1]], a)) # prepend periodic sample to give an even length
a, d = pywt.dwt(a, 'haar', 'periodic', 0)
a = a[1:] # remove the first sample of the output
d = d[1:]
else:
a, d = pywt.dwt(a, 'haar', 'periodic', 0)
acoeffs.append(a[0])
return acoeffs
"a = a[1:] # remove the first sample of the output" Why this is necessary?
Hi,
I want to port some R code which includes a wavelet transform to Python:
r code: https://gist.github.com/geraldstanje/a0ad53b7cb7828f1999ee9693df3a75e
python code: https://gist.github.com/geraldstanje/7bcd560af208a86352adfe4ed5a38202
can you explain why the python version returns a different result?
here is the input file: https://gist.github.com/geraldstanje/52a4fb141e124da3da2249654eb6b91d
info regarding the dwt function in R: Slot "V": A list with element i comprised of a matrix containing the ith level scaling coefficients