OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

ARM results in error #647

culurciello opened this issue 8 years ago

culurciello commented 8 years ago

Dear developers, thank you for your great work on OpenBLAS.

Using it on 32-bit ARM platforms with Ubuntu 14.04, we found some erroneous results when it is used with Torch7:

The Lua code below should always give 1 as the result. On ARM it gives random numbers if OpenBLAS is compiled with OpenMP (and 0.99999999993838 if compiled without).

require 'nn'

torch.setdefaulttensortype('torch.FloatTensor')

-- deterministic 4x58x58 input
data = torch.Tensor(4, 58, 58)
for i = 1, 4 do
  for j = 1, 58 do
    for k = 1, 58 do
      data[i][j][k] = i + j + k
    end
  end
end

-- first layer: 4 -> 64 planes, 5x5 kernel, stride 1
n = nn.Sequential()
n:add( nn.SpatialConvolutionMM(4, 64, 5, 5, 1, 1) )
n.modules[1].weight = torch.Tensor(64, 100)
for i = 1, 100 do
  n.modules[1].weight[1][i] = i   -- only the first output plane's weights are set
end
n.modules[1].bias = torch.Tensor(64)

-- second layer: 64 -> 64 planes, 5x5 kernel, stride 1
n2 = nn.Sequential()
n2:add( nn.SpatialConvolutionMM(64, 64, 5, 5, 1, 1) )
n2.modules[1].weight = torch.Tensor(64, 1600)
for i = 1, 1600 do
  n2.modules[1].weight[1][i] = i
end
n2.modules[1].bias = torch.Tensor(64)

-- run both layers and sum a 50x50 patch of the first output plane
data = n:forward(data)
data = n2:forward(data)
out = 0
for i = 1, 50 do
  for j = 1, 50 do
    out = out + data[1][i][j]
  end
end
print(out / 259643747536)
xianyi commented 8 years ago

@culurciello, which kernel do you use? Currently, OpenBLAS only supports the ARM hard FP ABI. Could this be an ABI issue?
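
For reference, one quick way to check which float ABI a toolchain targets is a sketch like the one below (assuming GCC or Clang on 32-bit ARM; __ARM_PCS_VFP is only predefined when the hard-float calling convention is in effect):

#include <stdio.h>

int main(void) {
    /* GCC and Clang predefine __ARM_PCS_VFP on 32-bit ARM when floating-point
       arguments are passed in VFP registers, i.e. the hard-float ABI. */
#ifdef __ARM_PCS_VFP
    printf("hard-float ABI\n");
#else
    printf("soft-float or softfp ABI\n");
#endif
    return 0;
}

Building and running this with the same compiler and flags used for Torch7 and OpenBLAS would show whether everything targets the same ABI.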

mvitez commented 8 years ago

I am working with @culurciello. We use the Odroid U3 and XU3, both of which use the hard FP ABI. This problem was already present a year ago and it is still present; we have switched between various kernels and OpenBLAS versions. I have tried to write a simple C program that shows this defect, but unfortunately I did not succeed. The problem only appears in complex environments, but by printing intermediate results I found that the calculation errors come from OpenBLAS. Thank you.

xianyi commented 8 years ago

@mvitez, could you try export OMP_NUM_THREADS=1? It looks like the application uses float (sgemm). Am I right?
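
For reference, a minimal standalone check of this kind might look like the sketch below (it assumes OpenBLAS's cblas.h and the OpenBLAS-specific openblas_set_num_threads() helper; with an OpenMP build, setting OMP_NUM_THREADS in the environment has the same effect):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cblas.h>

/* OpenBLAS-specific helper; with an OpenMP build, OMP_NUM_THREADS works as well. */
extern void openblas_set_num_threads(int num_threads);

int main(void) {
    const int m = 256, n = 256, k = 256;
    float *a   = malloc(sizeof(float) * m * k);
    float *b   = malloc(sizeof(float) * k * n);
    float *ref = malloc(sizeof(float) * m * n);
    float *c   = malloc(sizeof(float) * m * n);

    for (int i = 0; i < m * k; i++) a[i] = (float)(i % 100) / 7.0f;
    for (int i = 0; i < k * n; i++) b[i] = (float)(i % 50) / 3.0f;

    /* Reference result from the first multi-threaded run. */
    openblas_set_num_threads(4);
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                1.0f, a, k, b, n, 0.0f, ref, n);

    /* Repeated runs on identical inputs should reproduce it exactly;
       any run-to-run difference points at a threading problem. */
    float maxdiff = 0.0f;
    for (int run = 0; run < 20; run++) {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                    1.0f, a, k, b, n, 0.0f, c, n);
        for (int i = 0; i < m * n; i++) {
            float d = fabsf(c[i] - ref[i]);
            if (d > maxdiff) maxdiff = d;
        }
    }
    printf("max run-to-run difference: %g\n", maxdiff);

    free(a); free(b); free(ref); free(c);
    return 0;
}

With OMP_NUM_THREADS=1 the difference should stay at zero; as noted above, a simple test like this did not reproduce the defect, which so far only appears in more complex environments.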

mvitez commented 8 years ago

It works correctly with only one thread. We actually build OpenBLAS without the NO_AFFINITY=1 USE_OPENMP=1 flags that we should be using, and in that case it works with some limitations but without errors, apart from some segmentation faults, which fortunately are quite rare.

The application uses float (sgemm); you are right.

martin-frbg commented 6 years ago

This old issue will hopefully have been fixed by the several rounds of thread-safety improvements made after about December 2016.

martin-frbg commented 6 years ago

Unfortunately, still the same results (though the OpenMP build seems to give "correct" results of the 0.99999...38 type with OMP_NUM_THREADS=2 as well, on a quad-core Asus Tinker Board). The recently added NUM_PARALLEL option does not appear to have any effect either. Not sure how to debug this, as neither helgrind nor tsan works well with OpenMP.

martin-frbg commented 6 years ago

Switching to USE_SIMPLE_THREADED_LEVEL3 "solves" it, however.

martin-frbg commented 5 years ago

This appears to have been fixed in the meantime (to the extent that it now returns 0.99999...38 in every case), probably by the correction for #1851 that already went into 0.3.4.