Open Michael-EV opened 8 years ago
@polyfractal any thoughts?
Could you paste the numerical output of both? How different are the values? It could just be due to floating-point rounding error, and the rounding you're doing at the end. Or maybe a bug on our end :)
Related, I'm looking at the `movingstd()` function, it looks to be calculating the unbiased sample variance (e.g. denominator is `n-1`)... I'm not sure why I did it that way. This is the complete population, so it should really just be dividing by `n`.
Also also, we should probably just use mathjs for all the math operations, would be simpler and provides more features.
Sure!
Here are the numbers I am pulling from:
FB_data <- c(108.48, 111.56, 112.13, 113.75, 114.25, 110.79, 111.21, 116.82, 117.16, 120.38, 116.96, 119.56, 118.97, 117.54, 114.42, 111.01, 114.20, 116.43, 117.74, 119.90, 124.65, 124.98, 124.70, 123.60, 124.5, 126.85)
Here is the .movingstd() output from Timelion:
FB_Timelion_SD <- c(5.58, 5.40, 5.46, 5.51, 5.87, 6.13, 6.31, 5.63, 5.61, 5.57, 5.60, 5.76, 6.32, 6, 5.8, 5.93, 5.79, 6.29, 6.31, 5.94, 5.58, 5.61)
And here is the output from my mov_sd function:
mov_sd <- c(2.28, 1.46, 1.53, 2.46, 3.00, 4.14, 3.31, 1.67, 1.50, 1.41, 2.01, 3.56, 3.12, 2.50, 2.56, 3.41, 3.97, 3.92, 3.35, 2.12, 0.56, 1.25)
REMINDER: I am using the formula detailed in my original submissions. Counting in R vectors starts at 1, not 0.
Timelion's Moving Sd Graphed
My Moving Sd Graphed
The difference seems a bit too large to be a rounding error...maybe the data displayed on Timelion's interface is not the actual data being used in .movingstd()?
Hm, not sure. Definitely too big of a difference to be rounding error given those numbers (e.g. not super large or super small, so no floating point trickery going on).
Busting out good ol' Excel, I can confirm your R findings that the Timelion values are definitely wrong:
I'm not sure what's going on here, and the map/combine/chain javascript shenanigans aren't my forte (would prefer if these were old fashioned loops).
I don't really have time to debug it (I'm not really involved with Timelion, this was just a one-off for a different project that I contributed). Perhaps someone else could pick it up? It's probably something silly with how I structured the slice/map/reduce stuff
It seems that the issue is, as @polyfractal hinted, in the reduce function, here: movingstd.js:L38. I don't understand why it isn't working.
I haven't set up the development environment with node, in order to fully debug the issue. At the moment I'm a bit busy with work, and I still have to read all the contribution guidelines++, to submit a PR to kibana's master branch. But I have a solution to the bug with the following code: movingstd2.txt.
What I did was to use movingaverage.js as a template, and add just a few lines to the `toPoint` function:
    var variance = _.chain(pairSlice).map(function (point) {
      return Math.pow(point[1] - average, 2);
    }).reduce(function (memo, num) {
      return memo + num;
    }).value() / (_window - 1);
That is to say, I moved the bit where one subtracts the average from each point and squares the result into the map function, and just sum the values in the reduce part. I also added the option to choose where to place the window slice (left, center, right), since this has to fit with how one takes the moving average too.
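The approach described above can be sketched without lodash as follows. This is only a minimal illustration, not the code from movingstd2.txt: the names `sampleStd` and `movingStd` are hypothetical, and the meaning given to `left`, `center` and `right` here (window ending at, centered on, or starting at the current point) is an assumption made for the sketch.

```javascript
// Sample standard deviation (divides by n - 1), written as map + reduce.
function sampleStd(values) {
  const n = values.length;
  if (n < 2) return null; // not defined for fewer than two values
  const mean = values.reduce((memo, v) => memo + v, 0) / n;
  const sumOfSquares = values
    .map((v) => Math.pow(v - mean, 2)) // subtract the average and square in the map step
    .reduce((memo, sq) => memo + sq, 0); // the reduce step only sums
  return Math.sqrt(sumOfSquares / (n - 1));
}

// Hypothetical helper: moving standard deviation over [time, value] pairs.
// Assumed window placement: 'left' ends at the current point, 'right' starts
// at it, 'center' is centered on it. Incomplete windows yield null.
function movingStd(points, windowSize, position = 'left') {
  return points.map(([time], i) => {
    let start;
    if (position === 'left') start = i - windowSize + 1;
    else if (position === 'right') start = i;
    else if (position === 'center') start = i - Math.floor(windowSize / 2);
    else throw new Error('Unknown position: ' + position);
    const end = start + windowSize; // exclusive
    if (start < 0 || end > points.length) return [time, null];
    const values = points.slice(start, end).map((p) => p[1]);
    return [time, sampleStd(values)];
  });
}

// First six values of FB_data from this thread, with dummy timestamps 0..5.
const points = [108.48, 111.56, 112.13, 113.75, 114.25, 110.79].map((v, t) => [t, v]);
const result = movingStd(points, 5, 'left');
console.log(result.map(([t, sd]) => [t, sd === null ? null : sd.toFixed(2)]));
```

With a window of 5, the first two complete windows give roughly 2.28 and 1.46, which agrees with the first two entries of the `mov_sd` vector posted earlier in the thread.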
To test the code, I just:
Then one can test with the following, using `1d` as the time interval:
.quandl(WIKI/FB),.quandl(WIKI/FB).movingstd(6).yaxis(2), .quandl(WIKI/FB).movingstd2(6,left).yaxis(2)
Then you should get the following figure:
If we calculate manually what the results should have been and compare to our new mvstd2 function:
| Vector of values | sample-std | mvstd2 |
| --- | --- | --- |
| {120.44 119.13 119.13 116.73 117.20 116.84} | 1.53269 | 1.53 |
| {119.13 119.13 116.73 117.20 116.84 118.39} | 1.11788 | 1.12 |
| {119.13 116.73 117.20 116.84 118.39 118.39} | 0.9888 | 0.99 |
where I used Wolfram Alpha's calculator to calculate the sample-std values, and mvstd2 are the results from Timelion, which are shown in green in the figure above. We see that our new function shows the correct values, and the actual `mvstd` from Timelion is way off (in red).
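The manual check above can also be reproduced with a few lines of JavaScript. This is a small sketch with plain loops; `sampleStd` is a hypothetical helper name, and the three windows are copied from the comparison above.

```javascript
// Sample standard deviation with Bessel's correction (divide by n - 1).
function sampleStd(values) {
  const n = values.length;
  let sum = 0;
  for (const v of values) sum += v;
  const mean = sum / n;
  let sumOfSquares = 0;
  for (const v of values) sumOfSquares += (v - mean) * (v - mean);
  return Math.sqrt(sumOfSquares / (n - 1));
}

// The three 6-point windows from the comparison above.
const windows = [
  [120.44, 119.13, 119.13, 116.73, 117.20, 116.84],
  [119.13, 119.13, 116.73, 117.20, 116.84, 118.39],
  [119.13, 116.73, 117.20, 116.84, 118.39, 118.39],
];

const results = windows.map((w) => sampleStd(w));
console.log(results.map((s) => s.toFixed(2))); // [ '1.53', '1.12', '0.99' ]
```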
I kept the unbiased standard deviation (aka sample standard deviation), which divides by `N-1` using Bessel's correction. It seems this is the correct way to calculate the moving standard deviation, since we cannot use the population mean (for most cases our data is changing all the time).
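For reference, the two definitions being discussed in this thread can be written side by side. The notation is introduced here only for illustration: $x_1, \dots, x_N$ are the values in one window and $\bar{x}$ is their mean.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```

Here $s$ is the sample standard deviation (dividing by $N-1$) and $\sigma$ is the population form (dividing by $N$). For the same window they differ only by a constant factor, $s = \sigma\sqrt{N/(N-1)}$, which is about 1.095 for $N = 6$, so this choice alone cannot account for the much larger gap between the two plotted series.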
If we now look at the last 3 months, we can see from the following figure that `mvstd2` (in green) really follows the change in the data, but `mvstd` (in red) looks very flat at the beginning and doesn't really reflect the changes in the data:
Let me know what you think. I'll try to submit a PR on the weekend, if time allows.
Hello!
After noticing that .movingstd() does not have a 'position' option like .movingaverage(), I decided to look at the source code to see what the default position is for .movingstd(). After determining that it is 'left', I decided to compare the values it spits out to the actual moving standard deviation (which I computed in R; the code is at the bottom). It is worth noting that my values for the moving average were the same as Timelion's .movingaverage(). I noticed that my sd values were not the same as Timelion's, whether I used a population or sample standard deviation. However, the graphs of my moving standard deviation and Timelion's .movingstd() had roughly the same shape.
Thanks.
My Timelion query: ".quandl(WIKI/FB), .quandl(WIKI/FB).movingstd(5), .quandl(WIKI/FB).movingaverage(5)", with time interval set to '1w' and time frame set to '6 months'
Here is my code for moving standard deviation in R:
    FB_data <- c(110.05, 108.48, 111.56, 112.13, 113.75, 114.25, 110.79, 111.21, 116.82, 117.16, 120.38, 116.96, 119.56, 118.97, 117.54, 114.42, 111.01, 114.20, 116.43, 117.74, 119.90, 124.65, 124.98, 124.70, 123.60, 124.05) ## this was taken from Timelion
    mov_sd <- c()
    for (i in 1:(length(FB_data) - 5)) {
      ## take sd of FB_data[1:5], then FB_data[2:6], etc...
      mov_sd <- c(mov_sd, sd(FB_data[i:(i+4)]))
    }
    mov_sd <- round(mov_sd, digits=2)
    plot(mov_sd, type = 'l') ## graph to compare to Timelion .movingstd() -- very similar shape