Closed sglyon closed 10 years ago
Cool. I suspect that John will not mind us using this code under a reasonable (e.g. MIT) license.
I already got John's ok when I translated his code a while back. Will send a link soon.
-- John
On Oct 11, 2014, at 11:18 AM, Stefan Karpinski notifications@github.com wrote:
Cool. I suspect that John will not mind us using this code under a reasonable (e.g. MIT) license.
— Reply to this email directly or view it on GitHub.
Here's the original translation that John approved: https://github.com/johnmyleswhite/RunningStats.jl
I believe John said his code was under the public domain. Trying to find his e-mail.
Haha nice, should have known somebody has already done this!
I think it'd be good to make a clean repository for streaming analytics. We could pull in John Cook's code, some reservoir sampling code I have, HyperLogLog and a few others.
Probably best to do this in a separate repo so we can keep StatsBase minimal.
Sounds good, closing.
Feel free to post a link to the streaming stats repo when it is settled.
StreamStats would be a good package name. Having a good single-pass quantile estimator would be nice too. There are some quite good algorithms out there.
On Saturday, October 11, 2014, Stefan Karpinski notifications@github.com wrote:
Having a good single-pass quantile estimator would be nice too. There are some quite good algorithms out there.
Why not even make if more general and allow the desired percentile to be a parameter?
I'm pretty sure that's possible with the probabilistic data structures Stefan has in mind.
I have implemented some Julia code to compute the first four moments of an array (right now just
Array{T, 1}
) with just one pass. This code is pretty much Julia code based on the C++ version here.This code also allows users to compute "running" versions of these moments. This means that if users need to track moments for data that is becoming available in smaller chunks, no computation will be repeated as new data is added to the sample. I did some testing against the code here and these are my results:
We can also see that adding data little by little results in the correct moments (this runs without any issues):
My question is if this code would belong in
src/moments.jl
in this package? Right now I have it sitting in a repo where I keep useful stuff that doesn't belong anywhere else. Some tests can be found here.Let me know if anyone things this belongs here. If it does, we will probably need to ask John Cook for permission before including it, as I am not sure what license is attached to the content of his blog (link from above).