edwindj / ffbase

Basic (statistical) functionality for R package ff
github.com/edwindj/ffbase/wiki

Wrong results when binned_sumsq is applied on ff object #46

Open tanruixin opened 9 years ago

tanruixin commented 9 years ago

Hi, I am not familiar with Rcpp, so please excuse me if this is a basic question. I was trying to calculate the variance of a variable using binned_sumsq. The variable x has 5000 observations. To keep it simple, I set the mean to 0 first. When the data is stored in a plain vector, it works:

binned_sumsq(x, mean = rep(0, 5000), nbins = 1, bin = rep(1, 5000))

Output: bin count sumsq 1 5000 14196053 0

But when I store it as an ff object, the results are different, and they change each time I run the same call.

y = as.ff(x)
binned_sumsq(y, mean = rep(0, 5000), nbins = 1, bin = rep(1, 5000))

Output (1st time): bin count sumsq 1 73 212576 0

Output (2nd time): bin count sumsq 1 80 211311 0

Eventually I need to apply this to a large dataset, so I need to store the data in an ff object. Am I calling the function incorrectly? Thank you.
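
For cross-checking, the count and sum of squares that the vector call is expected to return can be reproduced in base R, without ff or ffbase. This is only a minimal sketch: the data below is simulated (the original x is not available), and the names mu and ref are made up for illustration.

```r
# Reference computation in base R (no ff/ffbase required).
# `x` is simulated stand-in data, NOT the data from this report.
set.seed(1)
x   <- rnorm(5000, mean = 50, sd = 10)
bin <- rep(1L, length(x))   # every observation falls in bin 1
mu  <- rep(0, length(x))    # mean fixed at 0, as in the calls above

# Count and sum of squared deviations per bin.
ref <- cbind(
  count = tapply(x, bin, length),
  sumsq = tapply((x - mu)^2, bin, sum)
)
print(ref)

# With the true mean plugged in, sumsq / (count - 1) is the sample variance.
ss <- sum((x - mean(x))^2)
stopifnot(isTRUE(all.equal(ss / (length(x) - 1), var(x))))
```

If the ff call returns a count other than length(x) for a single bin, as in the outputs above, the mismatch can be seen from the count column alone, before comparing sumsq.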