tvdbogert closed this issue 9 years ago.
I just sent you trial 079.
Currently the spectra don't really give a "nice" picture of the frequency content. I don't think smoothing them is a bad idea, and I don't think it would invalidate what we want to show.
Using a shorter duration of the signal would probably be fine.
Thanks for sending the file. I will try a few things on this.
This is not as easy as I thought. It's not completely clear to me yet, but here's what I found so far.
Here is a plot and the code that produced it (the code shown omits the step where I selected only the 8 minutes of perturbed walking).
% resample to 1000 Hz
Fs = 1000;
T = 1/Fs;
output_slow = interp1(output_slow_time, output_slow, ...
    output_slow_time(1):T:output_slow_time(end));
% signal length and FFT length (use the full record)
L_output_slow = length(output_slow);
NFFT_output_slow = L_output_slow;
% one-sided frequency axis
fi_output_slow = Fs/2*linspace(0, 1, NFFT_output_slow/2 + 1);
% amplitude spectrum
Y_output_slow = abs(fft(output_slow, NFFT_output_slow))/L_output_slow;
% smooth the spectrum with a moving average
window = 0.1;  % smoothing window in Hz
Nwindow = round(window/(Fs/NFFT_output_slow));  % window width in FFT bins
smY_output_slow = conv(Y_output_slow, ones(Nwindow,1)/Nwindow, 'same');
figure(2)
plot(fi_output_slow, 2*Y_output_slow(1:NFFT_output_slow/2+1), ...
    'Color', [0.2157 0.4706 0.7490])
hold on
plot(fi_output_slow, 2*smY_output_slow(1:NFFT_output_slow/2+1), 'r')
legend('Output, not smoothed', 'Smoothed')
title('Slow Walking (0.8 m/s)', 'FontWeight', 'bold')
ylabel('Amplitude', 'FontWeight', 'bold')
ylim([0 0.003]); xlim([0 8])
I'm downloading the data now (should have taken you up on the USB offer). A couple of comments, since I noticed the spectrum in the paper too (you may already know all of this, or know it in more detail, but I thought I would add my 2 cents):
1) "Standard" practice, as known to me, is to average individual shorter-time spectra together (referred to as spectral averaging; think ensemble averaging). The duration of each segment is the temporal resolution (frequency resolution is then the df inside each FFT or PSD). You can also min/max envelope to show the variation. I don't think I have seen smoothing done by a moving average across the spectrum itself. Spectral averaging is good for beating down transient noise, since it gets "averaged out". But for the same reason, spectral averaging should only be used on data that is fully periodic and "stationary" (non-changing mean).
2) I have not looked through the code yet, but my guess is that you are not demeaning the data and/or not applying a Hanning/Hamming window? The 0 Hz spike seems to indicate that.
3) For amplitude - Why not scale the fft to get to PSD units?
4) We almost never worry about a power-of-2 NFFT any more. Power-of-2 NFFT was dictated by speed, not accuracy, and faster algorithms and computers have alleviated those limitations. No need to worry about zero-padding or clipping data lengths to size anymore.
5) If you do need to use the full time history in one spectrum, there are a couple techniques such as the cumulative power density that can give you a nice smooth plot of RMS vs. Frequency.
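Point 1 above (spectral averaging) can be sketched in a few lines. This is my own Python illustration on a synthetic signal (the 5 Hz tone, noise level, and 10 s segment length are assumptions for demonstration, not the paper's data), using SciPy, which the thread points to later:

```python
import numpy as np
from scipy import signal

# Synthetic stand-in signal: a 5 Hz tone plus noise, sampled at 100 Hz.
fs = 100.0
t = np.arange(0, 480.0, 1.0 / fs)              # 8 minutes of samples
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 5.0 * t) + 0.5 * rng.standard_normal(t.size)

# Spectral averaging: cut the record into 10 s segments (0.1 Hz resolution),
# take a periodogram of each, and average them (ensemble averaging).
nper = int(10 * fs)
segments = x[: (x.size // nper) * nper].reshape(-1, nper)
f = np.fft.rfftfreq(nper, 1.0 / fs)
psds = np.array([signal.periodogram(s, fs=fs)[1] for s in segments])
avg_psd = psds.mean(axis=0)

# A min/max envelope across segments shows the variation between pieces.
env_lo, env_hi = psds.min(axis=0), psds.max(axis=0)

print(f[np.argmax(avg_psd)])   # the averaged spectrum peaks at the 5 Hz tone
```

The transient noise averages out across segments while the periodic 5 Hz component survives, which is the "beating down" effect described above.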
I've got a bunch of code that does all of this and will run the data through it. If you want to make a bunch of PSDs to average together quickly, take a look at spectrogram.m and pwelch.m.
@humphreysb Thanks for these tips. I didn't know much of it. How do you do #1? Do you have code or a reference that shows this?
Here is a short script in my issue-77 branch that has the hamming window and Ton's moving average:
https://github.com/csu-hmc/perturbed-data-paper/blob/issue-77/src/treadmill_motion.py
It produces these two figures:
The first shows what a 20-point moving average does to the un-windowed FFT. The second shows the difference between Hamming windowing and not (moving average applied in both). The Hamming window doesn't seem to clean up the spectrum much. This is only the 8 min of perturbation data sampled at 100 Hz. This is trial 20; I didn't use trial 79 (as Ton did) for some reason.
@humphreysb
3) For amplitude - Why not scale the fft to get to PSD units?
I think we'd like amplitude in the units of the signal. What are PSD units and why are they useful?
4) We almost never worry about power of 2 nfft any more. Power of 2 Nfft was dictated by speed not accuracy. Faster algorithms and computers have alleviated those limitations. No need to worry about zero padding or clipping data lengths to size anymore.
Good to know. All the examples I've ever come across include all of this. Do you have a nice example of doing it without all that?
5) If you do need to use the full time history in one spectrum, there are a couple techniques such as the cumulative power density that can give you a nice smooth plot of RMS vs. Frequency.
Example?
I've got a bunch of code that does all of this and will run the data through it. If you want to make a bunch of PSDs to average together quickly, take a look at spectrogram.m and pwelch.m.
Where are these files?
This may have turned into more than you want to know...
I have MATLAB code, but no Python. In MATLAB, spectrogram.m (in the Signal Processing Toolbox) can make the series of FFTs (actually PSDs) that you can then average across. I think there should also be a function in base MATLAB still around called specgram.m. pwelch and specgram return pretty good results; we tweaked them for scaling issues we found and wrapped a bunch of analysis around them.
So the window does two things, both related to the "start-up" and "stop" transients at the beginning and end of the signal: 1) If the first sample (or the last) is a non-zero value, due to where you "grabbed" the data, it has an artificial offset, and so too will the spectrum. 2) The offset also "rings" the spectrum like an impulse transient; in an unaveraged spectrum, it will show up as high-frequency ringing. Keep in mind, both of these are a function of which sample you chose to start with in any FFT. Windows such as Hanning or Hamming "pinch" the ends of the signal to zero. Ahhh, consistency! There are then scaling factors that you multiply the FFT by to adjust for the pinching.
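That pinching effect can be shown in a small Python sketch (my own illustration; the 10.3 Hz tone and the 15-25 Hz band are made-up choices): a tone that is not periodic in the record leaks badly with a raw FFT, while a Hamming window, with its scaling factor applied, tames the ends.

```python
import numpy as np

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10.3 * t)   # 10.3 Hz: the record's ends don't meet at zero

w = np.hamming(x.size)
xw = x * w / w.mean()              # scaling factor to adjust for the pinching

# One-sided amplitude spectra of the raw and windowed signals.
amp_raw = 2 * np.abs(np.fft.rfft(x)) / x.size
amp_win = 2 * np.abs(np.fft.rfft(xw)) / x.size
f = np.fft.rfftfreq(x.size, 1.0 / fs)

# Leakage a little away from the tone (15-25 Hz) drops with the window.
band = (f > 15) & (f < 25)
print(amp_raw[band].max(), amp_win[band].max())
```

The raw spectrum's sidelobes in that band sit well above the Hamming-windowed ones, which is the "ringing" described above.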
Power-of-2 was needed when FFT algorithms used a scheme called radix-2, and even then it was only really a speed consideration. Here is a good review of the algorithms, but I might also be able to find something in a textbook that you can reference: http://www.mathworks.com/company/newsletters/articles/faster-finite-fourier-transforms-matlab.html Functions such as specgram, where speed is paramount and signal length (zero-padding) is not a concern, still use a power-of-2 NFFT. (You can just not use the last PSD returned from specgram when you spectral average.)
FFT units in the raw are notoriously mis-scaled. Often, they don't take into account the 1/Nfft or 1/fs that is embedded in the FFT algorithm. We almost never report them. While the units look convenient, they're just problematic. You are interested in where the power resides in the spectrum. Some people say "who cares about the units", but then why not just do it correctly and use PSD units? If you want to have some fun, run a signal with a couple of pure sinusoids and some random Gaussian noise through an FFT algorithm with raw-units output; it probably won't match your expectations. That's why you mostly see PSD units used when there is random content.
Units:
FFT: vel (e.g., m/s)
PSD: vel_rms^2/Hz <---- not intuitive units, but that's power, and it tells you where the signal's power resides.
So what if you want more intuitive units? If you take the square root of the area under a section of the PSD, you get units of vel_rms (this is Parseval's theorem). These are signals with random components, so RMS values are correct. What if you start at 0 Hz and open up more and more of that area? You get what is called the Cumulative Power Spectral Density.
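Here's a minimal Python sketch of that cumulative idea on a made-up sine (not the tap-test data): integrate the PSD from 0 Hz outward and take the square root, recovering the time history's RMS at the far end.

```python
import numpy as np
from scipy import signal

# A 30 Hz sine with amplitude 2 has RMS = 2/sqrt(2) = 1.414...
fs = 1000.0
t = np.arange(0, 10.0, 1.0 / fs)
x = 2.0 * np.sin(2 * np.pi * 30.0 * t)

f, psd = signal.periodogram(x, fs=fs)      # units: (signal units)^2 / Hz

# Cumulative trapezoid of the PSD, then square root -> cumulative RMS curve.
area = np.cumsum((psd[1:] + psd[:-1]) * np.diff(f) / 2.0)
cum_rms = np.sqrt(np.concatenate(([0.0], area)))

print(cum_rms[-1])   # close to 1.414, the RMS of the time history (Parseval)
```

The curve jumps vertically wherever the spectrum has energy, which is the "nice vertical lines" effect described below.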
I'll look at the data from the report. For illustration, I grabbed a quick example from data I took tap-testing some hardware to determine its first flexible-body mode (this breaks some of my own comments above, as I was measuring a transient). See how you get nice vertical lines in the cumulative accel plot where there is energy? (If the signal were periodic and not a transient, the value in the octave-band plot at the bottom would exactly match the RMS value of the time history, but alas, I am using a transient.)
I can see if I can get together some MATLAB code that does this and get it to you. Is MATLAB OK? Do you have signal processing toolbox? I may punt though on the rest of the review to get it done.
I'll also see if I can get the in-house code that we developed here to you. We (another engineer and I) rolled our own functions and validated them. Years of work in that. I have always wanted to check into whether I could release into the public domain, but have never done it. I might be able to.
Thanks Brad that is helpful and not more than I wanted to know. I always like knowing more :)
But, I'm not sure if we are trying to show what a periodogram shows. We simply want to give an idea of the frequency content of the signal and how far away from white noise it is.
And yes, if you have any Matlab code that is fine. But before you spend any time writing the code to process this data, some clarity is needed on what we are trying to show.
BTW, this is Python's equivalent to spectrogram: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.signal.periodogram.html
Can you clarify the fundamental differences between a periodogram/spectrogram and the Fourier transform plot?
Also to note, we are not worried about speed of calculation here. All we are doing is making this one figure.
Here is a suggestion on smoothing:
The variance problem can be reduced by smoothing the periodogram. Various techniques to reduce spectral bias and variance are the subject of spectral estimation.
One such technique to address the variance problem is known as the method of averaged periodograms, or Bartlett's method. The idea is to divide the set of N samples into L sets of M samples, compute the discrete Fourier transform (DFT) of each set, square it to get the power spectral density, and average all of them. This decreases the standard deviation by a factor of 1/sqrt(L).
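A quick numerical check of that 1/sqrt(L) claim (my own NumPy sketch on made-up white noise, not the treadmill data): compare the bin-to-bin scatter of one long periodogram against the Bartlett average of 48 pieces.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100.0
x = rng.standard_normal(48000)    # 480 s of white noise at 100 Hz

def psd(seg):
    # One-sided power spectral density of one segment (boxcar window).
    p = np.abs(np.fft.rfft(seg)) ** 2 / (fs * seg.size)
    p[1:-1] *= 2
    return p

p_full = psd(x)                   # one periodogram over the whole record
p_avg = np.mean([psd(s) for s in x.reshape(48, -1)], axis=0)  # Bartlett, L = 48

# The estimate's standard deviation drops roughly as 1/sqrt(48) ~ 1/6.9.
print(np.std(p_full[1:-1]) / np.std(p_avg[1:-1]))
```

For white noise the true spectrum is flat, so the scatter of the bin values is pure estimation variance; the ratio printed lands near sqrt(48).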
Doesn't Matlab have a tool that does the periodogram this way? That would be best but we may lose the Octave compatibility.
This suggestion was also in the article I linked to initially. But, as I said before, I am very confident that this is mathematically equivalent to doing what I did: take the noisy, high-resolution spectrum on the entire data record and then do a moving average to reduce the resolution in the frequency domain.
I don't think we need to worry about the vertical axis, but PSD in proper units would be great if we can. We would need to take the square of the abs(fft) I think, and then divide by NFFT. Or something like that.
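The "or something like that" can be pinned down. Assuming SciPy's conventions (the thread links its docs), the one-sided density scaling is |FFT|^2 / (fs * N) with the interior bins doubled; this sketch checks that against scipy.signal.periodogram:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
fs = 100.0
x = rng.standard_normal(1000)

# Manual PSD scaling: demean, |FFT|^2 / (fs * N), double the interior bins
# (DC and Nyquist are not doubled in a one-sided spectrum).
X = np.fft.rfft(x - x.mean())
psd_manual = np.abs(X) ** 2 / (fs * x.size)
psd_manual[1:-1] *= 2

# scipy's periodogram (boxcar window, density scaling, mean removed) agrees.
f, psd_scipy = signal.periodogram(x, fs=fs)
print(np.allclose(psd_manual, psd_scipy))   # -> True
```

So dividing by NFFT alone gives amplitude-like units; the extra 1/fs is what turns it into a density in (signal units)^2/Hz.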
I made up some example code:
fs = 1000;       % sample rate (samples/sec)
t = 0:1/fs:10;   % time vector
% Make 3 sine waves with random phase
y1 = sin(10*t*2*pi + pi*rand(1));    % 10 Hz
y2 = sin(50*t*2*pi + pi*rand(1));    % 50 Hz
y3 = sin(100*t*2*pi + pi*rand(1));   % 100 Hz
r = rand(1, length(t))*0.1;  % random noise (set it to 0 to see no noise)
x = y1 + y2 + y3 + r;  % add the three sinusoids together
% To check scaling, use one sinusoid at a time. More than one sinusoid
% may cause constructive/destructive interference. I think this is also
% where we found some scaling issues with MATLAB's algorithm.
x = x - mean(x);  % comment out this line to get the spike at 0 Hz
nfft = 1000;       % each FFT will have a second of data in it
w = nfft;          % window length; defaults to a Hamming window
noverlap = nfft/2; % 50% overlap, so the temporal resolution is 0.5 s
                   % (with noverlap = 0 it would be 1 s because of the nfft)
[s, f, tpsd, p] = spectrogram(x, w, noverlap, nfft, fs);
pmean = mean(p, 2);           % spectral averaging
grms = cumtrapz(f, pmean);    % Parseval's theorem
grms = grms.^0.5;
%% Plotting
subplot(3,1,1)
plot(t, x)
ylabel('Accel (g)')
subplot(3,1,2)
plot(f, pmean)
set(gca, 'xscale', 'log')
set(gca, 'yscale', 'log')
ylabel('PSD (g_r_m_s^2/Hz)')
subplot(3,1,3)
plot(f, grms)
set(gca, 'xscale', 'log')
xlabel('Frequency (Hz)')
ylabel('Cumulative RMS (g_r_m_s)')
%% Make a spectrogram
figure
spectrogram(x, w, noverlap, nfft, fs);
The averaging:
The two graphs Sandy created use Matlab not Octave. Right now this paper depends on Matlab. That's fine.
I'm fine with the moving average too. Just as long as we say that is what we did. But after reading more, I don't think de-noising this is trivial. There are lots of methods and lots of things you sacrifice with each method.
I was wrong about the periodogram. It is different from a spectrogram (which does it over intervals like you've shown). Here is a Python version I found: http://matplotlib.org/api/mlab_api.html#matplotlib.mlab.specgram
This is what I got running a variation of Brad's code on our data:
Code is in the issue-77 branch.
It sure does smooth the hell out of it. Maybe I'm doing something wrong.
Looks like you did the spectrum on 480 pieces of 1 second each. A spectrum on a 1-second long signal has a resolution of 1 Hz. That's too coarse. We want something like 0.1 Hz resolution. So make 48 pieces of 10 seconds, do the spectrum on each, and average those.
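The arithmetic here is just that the frequency resolution equals one over the segment duration; a quick Python check (the 100 Hz sampling rate is taken from earlier in the thread):

```python
import numpy as np

fs = 100.0  # sampling rate of the treadmill data (Hz)
resolution = {}
for seconds in (1, 10, 20, 40):
    n = int(seconds * fs)
    # Spacing of the FFT frequency bins for an n-sample segment.
    resolution[seconds] = np.fft.rfftfreq(n, 1.0 / fs)[1]
    print(seconds, 's ->', resolution[seconds], 'Hz')
```

So 1 s segments give 1.0 Hz bins, 10 s segments give 0.1 Hz, and so on, independent of the sampling rate.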
Here is 48 sections of 1000 samples each.
Looks better and quite smooth.
That's very nice, and also consistent with the moving-averaged full-resolution spectrum. I would suggest not using the logarithmic vertical axis, to show more clearly which frequencies were in the signal and which weren't. That's how Sandy had it originally. The high frequencies get too much attention on the log scale.
@spinningplates @moorepants you guys can decide who does the code for the Figure that goes in the paper.
Here is the plot with no log scales.
Now the peak is narrow, which is good, but the frequency resolution should be better to show the peak well (right now it looks like 1 data point defining the peak).
So, maybe 20 seconds, or 40 seconds pieces?
20-second sections:
40-second sections:
20 seconds has my vote.
@spinningplates Do you want to use Brad's code to change your plots? You can use the code I have in branch issue-77 for reference on the numbers I used or you can modify it: https://github.com/csu-hmc/perturbed-data-paper/blob/issue-77/src/treadmill_motion.py
Yes, I will try to use these suggestions to update the plots.
PR #98 closes this.
@spinningplates @moorepants I'm having trouble downloading the data (stalls before it finishes, on my 4th attempt now) so I could not play with this. I was able to look at the code and the graphs that were there.
The spectra give some idea of the frequency content but I think they should be more smooth.
Because of the long duration of the signal, you get an extremely high resolution in the frequency domain. A signal of 10000 seconds has a resolution of 1/10000 Hz. You don't need such a high resolution. We don't need to know if the signal strength of the 1 Hz component of the signal is different from the 1.0001 Hz component.
A few things I would try:
You could use the current method and then smooth the spectrum before plotting it; I'm not sure if that's considered valid, though.