OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Unstable run time of cblas_sgemm #1200

Closed kindloaf closed 7 years ago

kindloaf commented 7 years ago

Hi, I am testing the run time of cblas_sgemm on a 6-core ARM CPU. To measure it, I ran cblas_sgemm 1000 times with the same arguments and took the average. Surprisingly, the average varied greatly between runs: for a matrix multiplication with m*n*k=~30M, it ranged from 1.5 ms to 9.6 ms. Is this much fluctuation reasonable?
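For a sense of scale, the two timings can be converted to throughput. This is a rough sketch, assuming the usual count of 2*m*n*k floating-point operations for a GEMM (one multiply and one add per term) and taking m*n*k as exactly 30M, which the post only gives approximately. The helper name `gemm_gflops` is made up for illustration.

```python
def gemm_gflops(mnk: float, seconds: float) -> float:
    """Throughput in GFLOP/s, assuming 2*m*n*k floating-point operations."""
    return 2.0 * mnk / seconds / 1e9


# m*n*k is taken as exactly 30e6 here; the post only says "~30M".
fast = gemm_gflops(30e6, 1.5e-3)  # fastest reported average
slow = gemm_gflops(30e6, 9.6e-3)  # slowest reported average
print(f"{fast:.2f} vs {slow:.2f} GFLOP/s, ratio {fast / slow:.1f}x")
```

Under those assumptions the same call runs at about 40 GFLOP/s in the best case and about 6.25 GFLOP/s in the worst, a 6.4x spread.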

martin-frbg commented 7 years ago

Some of the more obvious things to check: Is the CPU clock speed constant, or subject to thermal throttling? Are all the cores of the same type, or is this some kind of big-little system where a thread may end up on one of the less powerful cores once in a while? How many cores/threads are you using? What else is going on on the system (that may, e.g., flush a CPU cache occasionally)? Is the timer granularity sufficient to measure run times in the millisecond range? Do you see similar variations with the reference BLAS from netlib?
BTW, if you can tell what type of ARM CPU (and which version of OpenBLAS) you use for this, perhaps someone can follow up with specific suggestions - not all CPU TARGETs in OpenBLAS are equally well optimized.

kindloaf commented 7 years ago

Here is the environment:
(1) The CPU clock speed (frequency) is fixed, by setting minfrequency and maxfrequency of each core to the maximum value.
(2) 2 cores are slightly better than the other 4 cores, but they have the same frequency. Tested one core at a time, they perform similarly on cblas_sgemm.
(3) I am using 6 threads, by setting OPENBLAS_NUM_THREADS=6. In htop, I saw all 6 cores at 100%. I observed unstable run times when I tried 3, 4, 5 and 6 threads. Single-thread and 2-thread runs have been fine so far.
(4) The system is doing nothing else but calculating cblas_sgemm.
(5) I used gettimeofday before and after each sgemm. Each call to gettimeofday takes ~0.0001 millisecond.
(6) I haven't tried netlib.

brada4 commented 7 years ago

(1) Please change the governor to powersave to rule out thermal issues. (2) Please measure with taskset how much "slightly" is. (3) That is the big.LITTLE stuff - are 3 cores actually any faster than 2? (5) That's 0.1 µs of call overhead, but gettimeofday() only resolves microseconds; clock_gettime() can reach higher-resolution clocks. (7) Does /proc/cpuinfo (attach if unsure) reflect the variation in cores?
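The timing advice can be sketched as follows: time every call separately with a monotonic high-resolution clock and look at the spread instead of a single average. This is only an illustration, not the reporter's benchmark. It uses Python's `time.perf_counter_ns()`, which on Linux should be backed by `clock_gettime(CLOCK_MONOTONIC)`; the names `time_calls` and `summarize` and the placeholder workload are made up for this sketch.

```python
import statistics
import time


def time_calls(fn, repeats=1000, warmup=10):
    """Time each call to fn separately; return per-call durations in ms."""
    for _ in range(warmup):
        fn()  # untimed calls so caches and worker threads can settle
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn()
        samples_ms.append((time.perf_counter_ns() - start) / 1e6)
    return samples_ms


def summarize(samples_ms):
    """Report the spread, not just the mean, so outliers stay visible."""
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # Placeholder workload; replace it with the sgemm call under test.
    print(summarize(time_calls(lambda: sum(range(10_000)), repeats=200)))
```

A stable median with a large gap between min and max points at occasional slow calls, for example a thread landing on a slower core, rather than a uniformly slow run.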

kindloaf commented 7 years ago

@brada4 I just figured out the issue - it is indeed due to the big-little cores. When using only big cores or only little cores, the run times are much more stable. Thanks. By the way, for big-little CPUs, is it common practice for OpenBLAS to use only the big cores or only the little cores?
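One way to restrict a benchmark to a single core type is to set the CPU affinity before the BLAS library starts its worker threads. This is a minimal sketch, assuming Linux and Python's `os.sched_setaffinity`; the thread does not say how the reporter actually restricted the cores. The helper name `pin_to_cores` is made up, and which core indices are the big ones differs from board to board.

```python
import os


def pin_to_cores(cores):
    """Restrict the calling thread to the given CPU indices (Linux only).

    Returns the affinity mask in effect afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, set(cores))  # 0 means the caller itself
    return os.sched_getaffinity(0)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # Harmless demonstration: re-apply the current mask unchanged.
    # For a real test, pass the indices of the big (or little) cores.
    print(pin_to_cores(os.sched_getaffinity(0)))
```

Threads inherit the mask of the thread that creates them, so the call has to happen before OpenBLAS is loaded. Launching the whole process under `taskset -c` with the chosen core list has the same effect from the shell.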

brada4 commented 7 years ago

Can you detect big cores and measure/compare with cpuset to answer your question? I am quite weak at remote sensing.

kindloaf commented 7 years ago

Will do. Thanks.

brada4 commented 7 years ago

If you think there is a stable way to detect big cores, feel free to share the best way (it was not possible a year ago).

kindloaf commented 7 years ago

@brada4 Here is how I do it: I'm working on a specific CPU, so I read the spec, which says there are 2 big cores at a certain frequency and 4 little cores at a certain frequency. Then I checked /proc/cpuinfo: 2 cores have exactly the same description, and the other 4 cores have exactly the same description. So I assume the former 2 are the big cores.
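The /proc/cpuinfo comparison described above can be automated along these lines. This is a sketch that assumes ARM-style output, where each core has a blank-line-separated block starting with a `processor` line. The sample text is made up and much shorter than real output, and `group_cores` is a hypothetical helper name.

```python
from collections import defaultdict


def group_cores(cpuinfo_text):
    """Group processor indices by their description in /proc/cpuinfo.

    Every field except "processor" counts as part of the description,
    so cores with identical entries end up in the same group.
    """
    groups = defaultdict(list)
    for block in cpuinfo_text.strip().split("\n\n"):
        index = None
        description = []
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue  # not a "key : value" line
            key, value = key.strip(), value.strip()
            if key == "processor":
                index = int(value)
            else:
                description.append((key, value))
        if index is not None:  # skip trailing blocks such as "Hardware"
            groups[tuple(description)].append(index)
    return sorted(groups.values())


# Made-up sample: four cores with one part ID, two with another.
SAMPLE = "\n\n".join(
    f"processor\t: {i}\nCPU part\t: {part}"
    for i, part in enumerate(["0xd03"] * 4 + ["0xd08"] * 2)
)

if __name__ == "__main__":
    print(group_cores(SAMPLE))
    # On a real board: group_cores(open("/proc/cpuinfo").read())
```

The grouping only shows which cores are alike. Deciding which group is the big one still needs the CPU's spec, as in the comment above.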

kindloaf commented 7 years ago

Also, when I ran OpenBLAS with 2 threads, it always chose the two presumably "big" cores. When I used more than 2 threads, these 2 cores were chosen every time, and the other cores were chosen randomly. This more or less confirmed my theory about which core is which.

brada4 commented 7 years ago

I am afraid this theorizing is not helpful. At least cpuinfo or something substantial could help.

ctgushiwei commented 7 years ago

Can you share your Makefile? I have some issues.

kindloaf commented 7 years ago

@ctgushiwei I just used the default Makefile. What error did you see?

ctgushiwei commented 7 years ago

@kindloaf My cpuinfo is similar to yours. Which OpenBLAS version do you use: the develop version or the arm_soft_fp_abi version?

kindloaf commented 7 years ago

@ctgushiwei I used the develop branch.

ctgushiwei commented 7 years ago

@kindloaf I can compile the 0.2.19 release version successfully on armv7, but when I test cblas_sgemm, I get a segmentation fault. I have found the reason: the code in openblas_0.2.19/kernel/ does not compile to .o files, and I do not know how to solve this problem. Can you help me solve it, or share your Makefile and the parameters you passed to the 'make' command?