flatironinstitute / NoRMCorre

Matlab routines for online non-rigid motion correction of calcium imaging data
GNU General Public License v2.0

Motion_metrics giving different results for same data set #18

Closed vd2309 closed 6 years ago

vd2309 commented 6 years ago

Hello,

We are using NoRMCorre and checking our results on different computers. Each time we run the exact same data set on a different computer, we get different results. We have traced the issue to the corr function. According to our debugging, corr(X,Y) gives very different results when it is called once per column, i.e. corr(X(:,i), Y) for each column i of X, than when it is called once on the entire matrix X and Y.

We believe this is what leads to the different results for the same data set. We also believe the problem is present in MATLAB 2017a and earlier; 2017b and later seem to be okay.

Have you experienced this problem, and are there any solutions? Or did you already know this about the corr function and specifically choose to call it once on the entire matrix instead of looping through the columns?

Thank you very much for your help!

Ginny, Richard Axel Laboratory

epnev commented 6 years ago

@vd2309 I'm not sure this is the case. I just tried the following code:

X = randn(1000,100);
y = randn(1000,1);
c1 = corr(X,y);
c2 = zeros(100,1);
for i = 1:100
    c2(i) = corr(X(:,i),y);
end

max(abs(c1-c2))

I ran it in Matlab 2017a and 2017b, and in both cases the results are identical up to machine precision. How big are the differences you're seeing?

vd2309 commented 6 years ago

Hi Eftychios,

We have done some debugging, and here is what we found. Try running the code below on MATLAB 2018a and MATLAB 2015a. On 64-bit 2018a, the difference between the two ways of calling corr (i.e. matrix vs. loop) is on the order of 10^-7, while on 64-bit 2015a it is -0.0114, a much larger difference.

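% Slow sinusoid plus noise on top of a large constant offset
temp = sin((1:1e6)/60);
X = repmat(.1*temp',1,100) + .01*randn(1e6,100) + 200;
X(1:1e5,:) = zeros(1e5,100);   % zero out the first 10% of the rows
X = single(X);                 % the discrepancy shows up in single precision
y = nanmean(X,2);              % reference trace: row-wise mean over columns
c1 = corr(X,y);                % one call on the whole matrix

% Same correlations, one column at a time
c2 = [];
for i = 1:100
    c2(i) = corr(X(:,i),y);
end

disp('diff')
mean(c1 - c2')                 % mean signed difference between the two approaches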

What we have concluded is that large, sparse, single-precision matrices yield different results from the two ways of calling corr (i.e. passing the whole matrix versus computing corr column by column) on older MATLAB releases. We wanted to let you know because this may affect NoRMCorre users on older versions of MATLAB. If you are able, please confirm that you get the same results as we did on the 2015a and 2018a releases, and please advise.

Thanks very much! Ginny, Richard Axel Laboratory

epnev commented 6 years ago

@vd2309 Thanks, I was able to reproduce that discrepancy between versions 2017a and 2017b. The problem is due to the single precision. I created an entry in the wiki pointing this out; thanks for noticing. It could be fixed by introducing an if statement that checks the version, but I'd rather not do that, to avoid memory overhead.
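For anyone who wants to sidestep the single-precision issue in their own scripts, here is a minimal sketch, not what NoRMCorre does internally: cast the data to double and compute the Pearson correlation explicitly. The sizes and the variable names (`Xd`, `yd`, `c_mat`, `c_loop`, `max_diff`) are illustrative assumptions, and the cast doubles the memory needed for the data, which is the overhead mentioned above.

```matlab
% Illustrative data (smaller than the reproduction above): single precision, large offset
rng(0);                                   % fixed seed so repeated runs match
n = 1e5; k = 10;                          % assumed sizes, chosen only to keep the example light
X = single(0.1*sin((1:n)'/60)*ones(1,k) + 0.01*randn(n,k) + 200);
y = mean(X,2);

% Cast to double before correlating (doubles the memory footprint)
Xd = double(X);
yd = double(y);

% Pearson correlation of every column of Xd with yd, written out explicitly
Xc = bsxfun(@minus, Xd, mean(Xd,1));      % center each column
yc = yd - mean(yd);                       % center the reference trace
c_mat = (Xc' * yc) ./ (sqrt(sum(Xc.^2,1))' * sqrt(sum(yc.^2)));

% Same quantity, one column at a time
c_loop = zeros(k,1);
for i = 1:k
    xi = Xd(:,i) - mean(Xd(:,i));
    c_loop(i) = (xi' * yc) / (sqrt(sum(xi.^2)) * sqrt(sum(yc.^2)));
end

% Largest disagreement between the matrix and the column-by-column computation
max_diff = max(abs(c_mat - c_loop));
```

`bsxfun` is used instead of implicit expansion so that the sketch also runs on older releases such as 2015a.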

In your case, I noticed that the data you created has 10% of its entries set to zero. This affects the numerical precision. In motion_metrics you can use the bnd variable to set the number of pixels to trim from the boundaries. I suggest using that, since it will give you more informative metrics. For example, if I remove these zeros from the calculation, the discrepancy between the two measures falls to 10^-5.
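A rough sketch of the "remove the zeros" comparison, under stated assumptions: it drops the all-zero rows from synthetic data shaped like the reproduction before calling corr both ways. It only imitates the effect of trimming with bnd and is not the motion_metrics implementation; the sizes and the names `keep`, `Xt`, `yt` and `max_diff` are hypothetical. The size of `max_diff` depends on the MATLAB release, so no expected value is given.

```matlab
rng(0);                                   % fixed seed so repeated runs match
n = 1e5; nz = 1e4; k = 10;                % assumed sizes, smaller than the reproduction
X = 0.1*sin((1:n)'/60)*ones(1,k) + 0.01*randn(n,k) + 200;
X(1:nz,:) = 0;                            % 10% of the rows zeroed, as in the reproduction
X = single(X);

keep = any(X ~= 0, 2);                    % rows with at least one nonzero entry
Xt = X(keep,:);                           % trimmed data
yt = mean(Xt,2);                          % reference trace from the trimmed data

c1 = corr(Xt, yt);                        % one call on the whole matrix
c2 = zeros(k,1,'single');
for i = 1:k
    c2(i) = corr(Xt(:,i), yt);            % one call per column
end

% Largest disagreement between the two ways of calling corr
max_diff = max(abs(c1 - c2));
```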