felixpatzelt / colorednoise

Python package to generate Gaussian (1/f)**beta noise (e.g. pink noise)
MIT License

Using different sampling frequencies #6

Closed vbschettino closed 4 years ago

vbschettino commented 4 years ago

Hi, and thanks for making this available.

I'm not sure if this is more of a feature request or just a question/lack of understanding on my part about how to use the script. I need to generate pink noise but with a sampling frequency of 20 Hz instead of 1 Hz. If I just generate a long signal and sample it at different frequencies, the behaviour I get is quite different.

To illustrate, see the two attached figures, where I used the same random seed and generated both pink and brown noise for 30s, but sampled at 1Hz vs 100Hz. I would have expected a similar version of the signals in both cases, just better "interpolated". Instead, I see much more variance in the 100Hz case.

To fix(?) this, I thought it would be enough to simply adapt the frequency generation in line 68 of the script to f = rfftfreq(samples, 1/fs), but this seemed to make no difference. Could you point me to how to get the expected behaviour? I'm happy to make changes and submit a pull request if appropriate.

[Attached figures: Figure_1Hz, Figure_100Hz]

felixpatzelt commented 4 years ago

Everything is relative to samples in unit time, so frequency has units of sample^-1, not second^-1. If you assign a physical duration to one sample, you only need to keep that in mind when labelling your axes, and scale the output to the desired variance. No changes to the package are required.

If you want to generate the same time-series at different sampling frequencies, you can generate it at the highest required frequency and then downsample the other versions.

I will think about how some convenience functionality could be added. Just changing rfftfreq doesn't do the trick because the output signal is currently always scaled to unit variance by normalising the power spectrum.

vbschettino commented 4 years ago

If you assume the length of one sample, then you only need to keep that in mind and scale the output to the desired variance.

Thanks, but I do not follow. If I apply a scaling factor to the output series wouldn't it just change the variance of the whole signal? Instead, I would like to keep unit variance, but make the time correlation between samples insensitive to the sampling frequency.

Could you elaborate a bit more? Perhaps give a small example of how you would do this?

If you want to generate the same time-series at different sampling frequencies, you can generate it at the highest required frequency and then downsample the other versions.

felixpatzelt commented 4 years ago

I'll try to elaborate with example code.

You'll need to make these changes to your local development copy of colorednoise.py for one of the time series generated below (y3):

-def powerlaw_psd_gaussian(exponent, size, fmin=0):
+def powerlaw_psd_gaussian(exponent, size, fmin=0, d=1):
-    f = rfftfreq(samples)
+    f = rfftfreq(samples, d)

Let's see what happens in the time- and frequency domains when we change the sampling frequency using different methods.

from matplotlib import mlab
from matplotlib import pylab as plt
import numpy as np
import colorednoise as cn

exponent        = .5 
samples         = 2**18
downsample_step = 2**3 # downsample to multiple of default sample length
fmin            = .5/100 # low frequency cutoff for power scaling

# generate some noise using default settings
np.random.seed(1)
y1 = cn.powerlaw_psd_gaussian(exponent, samples, fmin=fmin)

# downsample while using the mean as a primitive lowpass filter
t1 = np.arange(len(y1))
t2 = t1[::downsample_step]
y2 = y1.reshape((len(y1)//downsample_step, downsample_step)).mean(axis=1)

# note: the line below (commented out) would NOT give the downsampled result because of aliasing.
# energy in high frequencies above the new Nyquist freq. would be folded back
# and change the power spectrum
#y2 = y1[::downsample_step]

# generate with lower sampling rate (requires changes to colorednoise.py)
np.random.seed(1)
y3 = cn.powerlaw_psd_gaussian(exponent, samples, fmin=fmin, d=downsample_step)

# generate and reinterpret default sampling rate to be effectively lower
np.random.seed(1)
y4 = cn.powerlaw_psd_gaussian(exponent, samples, fmin=fmin*downsample_step, d=1)
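As an aside, the aliasing that the comment above warns about is easy to see with a pure tone (a pure-NumPy sketch; the frequencies are arbitrary):

```python
import numpy as np

# A tone above the new Nyquist frequency (1/(2*step) cycles/sample) survives
# naive striding at full amplitude, merely folded to a new frequency, while
# the crude block-mean lowpass strongly attenuates it before decimation.
n, step = 1024, 8
t = np.arange(n)
tone = np.sin(2 * np.pi * 0.3 * t)               # 0.3 cycles/sample > 1/16
naive = tone[::step]                             # aliased, std still ~0.7
lowpassed = tone.reshape(-1, step).mean(axis=1)  # std ~0.1
```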

The sample variance of y1 and y3 is approximately one; for y2 it is reduced:

print(np.var(y1), np.var(y2), np.var(y3))

0.9980746219234694 0.29941670978221036 0.9984411956946726
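The reduced variance of y2 is the expected effect of block averaging: for uncorrelated samples, averaging k values divides the variance by k, while positively correlated pink noise retains more (here roughly 0.30 rather than 1/8 = 0.125). A white-noise baseline in pure NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 8                              # block size, matching downsample_step
x = rng.standard_normal(2**16)     # white noise, unit variance
m = x.reshape(-1, k).mean(axis=1)  # block means, as used for y2
# For iid samples, var(mean of k) = 1/k; correlated noise loses less.
```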

Plotting the beginning of the time series shows more clearly what happens:

plot_t = 2**9 # samples to show from t=0 on
plt.close(1)
plt.figure(1)

plt.plot(t1[:plot_t], y1[:plot_t], '#ccccee', label='y1')
plt.plot(t2[:plot_t//downsample_step], y2[:plot_t//downsample_step], 'C1', label='y2')
plt.plot(t2[:plot_t//downsample_step], y3[:plot_t//downsample_step], 'C2', label='y3')
plt.plot(t2[:plot_t//downsample_step], y4[:plot_t//downsample_step], 'C3--', label='y4')
plt.legend(ncol=4)

y2 is a smoothed version of y1 that tracks the slow movements but not the fastest ones, and its maximum amplitudes are smaller. y3 and y4 are identical. They look similar to y2, but their largest amplitudes are bigger, comparable to y1. Because we didn't use exactly the same random amplitudes and phases, y2 and y3/y4 are not identical sample by sample.

Figure 1-2

The power spectrum for the downsampled series y2 has the same low-frequency amplitudes as the original one y1 - and some small high-frequency artefacts from the imperfect lowpass we applied in the time domain. y3, which was generated with a lower sampling frequency and unit variance, has a higher energy at each frequency. This compensates for the missing high frequency energy and shifts the PSD upwards. y4 is identical to y3, highlighting that the code change we did for testing is not required to generate these time series.
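That y3 and y4 coincide is no accident: rfftfreq(samples, d) is just rfftfreq(samples)/d, so passing d rescales every frequency by a constant. That constant multiplies all power-law amplitudes (proportional to f**(-exponent/2)) uniformly, which the unit-variance normalisation cancels again, and the cutoff f < fmin with rescaled frequencies selects exactly the same bins as fmin*d with the default ones. A quick check of the rfftfreq identity:

```python
import numpy as np

samples, d = 64, 8
# rfftfreq(n, d) = arange(n//2 + 1) / (n * d) = rfftfreq(n) / d
assert np.allclose(np.fft.rfftfreq(samples, d), np.fft.rfftfreq(samples) / d)
```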

plt.close(2)
plt.figure(2)

NFFT = 2**13 # segment length for Welch's method

s1, f1 = mlab.psd(y1, Fs=2,  NFFT=NFFT)
s2, f2 = mlab.psd(y2, Fs=2/downsample_step, NFFT=NFFT//downsample_step)
s3, f3 = mlab.psd(y3, Fs=2/downsample_step, NFFT=NFFT//downsample_step)
s4, f4 = mlab.psd(y4, Fs=2/downsample_step, NFFT=NFFT//downsample_step)

plt.loglog(f1, s1, label='S(y1)')
plt.loglog(f2, s2, '--', label='S(y2)')
plt.loglog(f3, s3, label='S(y3)')
plt.loglog(f4, s4, '--', label='S(y4)')

plt.grid(True)
plt.legend(ncol=4)

Figure 2

felixpatzelt commented 4 years ago

Another experiment: we could write some code such that changing sampling rate and frequency together in the right way gives us exactly the downsampled time series.

T      = 2**18
dt     = 2
np.random.seed(1)
t      = np.arange(T)      
sr     = np.random.normal(size=T//2+1)
si     = np.random.normal(size=T//2+1)
si[0]  = 0
s = sr + 1J*si
x1     = np.fft.irfft(s) * np.sqrt(T/2)
x2     = np.fft.irfft(s[:T//dt//2+1]) * np.sqrt(T/2/dt)

print(len(t), len(x1), len(x2), len(t[::dt]))
print(np.var(x1), np.var(x2))

(262144, 262144, 131072, 131072) (0.999047664194959, 0.9975979364549064)
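The sqrt(T/2) factor can be read off from Parseval's theorem: the full symmetric spectrum has roughly T bins with E|S_k|^2 = 2 (unit-variance real and imaginary parts), so sum x^2 = (1/T) sum |S_k|^2 implies var(irfft(s)) is about 2/T, and scaling by sqrt(T/2) restores unit variance. A pure-NumPy check of just this factor:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2**14
s = rng.standard_normal(T // 2 + 1) + 1j * rng.standard_normal(T // 2 + 1)
s[0] = s[0].real                      # DC component must be real
x = np.fft.irfft(s) * np.sqrt(T / 2)  # same scaling as x1 above
```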

plt.close(3)
plt.figure(3)
plot_t = 2**6
plt.plot(t[:plot_t], x1[:plot_t])
plt.plot(t[:plot_t:dt], x2[:plot_t//dt])

x2 has a lower sampling rate but tracks x1 as far as possible. Both series have unit variance.

Figure 3

Analogous to the earlier examples y1-y4, we observe that the signal with the lower sampling rate and unit variance has its PSD shifted upwards.

plt.close(4)
plt.figure(4)

NFFT = 2**10

s1, f1 = mlab.psd(x1, Fs=2,  NFFT=NFFT)
s2, f2 = mlab.psd(x2, Fs=2/dt, NFFT=NFFT//dt)

plt.loglog(f1, s1)
plt.loglog(f2, s2)

plt.grid(True)

Figure 4

In conclusion, we see that changing the sampling rate changes either the variance of the signal or shifts the PSD. This is a consequence of Parseval's theorem:

"Summation or integration of the spectral components yields the total power (for a physical process) or variance (in a statistical process), identical to what would be obtained by integrating x^2 over the time domain." (Wikipedia)
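With NumPy's unnormalised FFT convention the theorem reads sum |X_k|^2 = N sum x_n^2, which is easy to verify numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
X = np.fft.fft(x)
# Parseval with NumPy's convention: sum|X|^2 == N * sum x^2
assert np.allclose(np.sum(np.abs(X)**2), len(x) * np.sum(x**2))
```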

vbschettino commented 4 years ago

Thank you very much for such a detailed answer, it's much clearer now. I just have two more questions:

  1. For y1, why did you have to change fmin from its default and how did you pick the value? Is it related to the exponent used?
  2. For my use case, pink noise at 20 Hz, I understood that I should normally generate the signal (as y4) with the appropriate sample_step (0.05) and then apply a scaling factor so that the PSD matches the original one. Is that correct?
felixpatzelt commented 4 years ago
  1. For the examples, I wanted to have a cutoff that is clearly visible in the power spectrum for different sample rates, so I moved fmin to a sufficiently high frequency. You can omit fmin if you want to have the same power-law over all frequencies that exist in your time series. For convenience, setting fmin to zero actually sets it to the lowest possible value, which is 1 / samples. This is also the default behaviour.
  2. If you are not interested in fmin, you can just call cn.powerlaw_psd_gaussian(exponent, samples) with the unchanged package and then multiply the result by a factor to get the desired variance or PSD, whichever your use case needs. I don't know what you mean by "original one". You do need to use the appropriate sample_step to have correctly labelled axes when you plot the time series or PSD, and possibly to calculate your scaling factor.
vbschettino commented 4 years ago

Perfect, thank you.