OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

100% CPU usage upon `import numpy` #4365

Closed AyrtonRicardo closed 8 months ago

AyrtonRicardo commented 9 months ago

A plain `import numpy as np` drives CPU usage to 100% on 7 of the 8 available cores.

Environment: fresh install of Ubuntu 22.04 (arm64), Python 3.11.

Python and NumPy Versions:

>>> import sys, numpy; print(numpy.__version__); print(sys.version)
1.26.0
3.11.7 (main, Dec 8 2023, 18:56:58) [GCC 11.4.0]

Runtime environment

import numpy; print(numpy.show_runtime())
[{'numpy_version': '1.26.0',
'python': '3.11.7 (main, Dec 8 2023, 18:56:58) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='localhost', release='3.10.61', version='#1 SMP PREEMPT Tue Dec 5 22:24:50 UTC 2023', machine='aarch64')},
{'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
'found': [],
'not_found': ['ASIMDHP', 'ASIMDFHM']}},
{'architecture': 'armv8',
'filepath': '/home/android/hass/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-17488984.3.23.dev.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23.dev'}]
None

A video reproducing the problem can be found at: https://www.youtube.com/watch?v=cba5ntdMRkM

To work around the problem I installed the ATLAS library instead. I am still reporting this in case someone else runs into the same thing, and so that you are aware of it.

rgommers commented 9 months ago

@AyrtonRicardo what I did not get from your video is whether the CPU usage going to 100% is a transient startup effect, or whether the utilization stays that high. I didn't see it go down in the video, but may have missed it.

AyrtonRicardo commented 9 months ago

> @AyrtonRicardo what I did not get from your video is whether the CPU usage going to 100% is a transient startup effect, or whether the utilization stays that high. I didn't see it go down in the video, but may have missed it.

Thanks for the reply. It stays at 100% for as long as the script is running, and it dropped as soon as I closed the Python script I was using to simulate this. You can see here that I was barely using any CPU beforehand. I then changed the threads to use only 3 and ran the script (`import numpy`) again, and the CPU went to 100% on 2 of the 3 cores.

Here is an example covering about 1 hour of this running: an average of 86% CPU while my application was idle. For comparison, it also shows what happened when I applied `cpulimit` to the process with `--limit 200` (around 35% per core). [image] *The ups and downs are from me enabling and disabling cpulimit on the process.

This is the same application currently running with ATLAS: [image] *3-hour time window. Barely any spikes (expected behaviour, since it is idle almost all the time).

martin-frbg commented 9 months ago

What is your hardware, please? Just loading OpenBLAS by importing numpy should at most cause a short spike as threads get set up; it certainly shouldn't be busy-waiting(?) like that.
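
One way to tell a transient startup spike from sustained busy-waiting is to compare process CPU time against wall-clock time while the main thread does nothing. The following is only a minimal sketch: the helper name `idle_cpu_fraction` is hypothetical, and it relies on `time.process_time()` counting CPU time across all threads of the process, so spinning worker threads would show up even though the main thread is asleep.

```python
import time


def idle_cpu_fraction(wall_seconds: float = 2.0) -> float:
    """Return CPU seconds consumed by this process per wall-clock second
    while the main thread sleeps (hypothetical helper for illustration)."""
    cpu_start = time.process_time()      # user + system time, all threads
    wall_start = time.perf_counter()
    time.sleep(wall_seconds)             # the main thread is idle here
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


if __name__ == "__main__":
    try:
        import numpy  # noqa: F401  (loads the bundled OpenBLAS, if any)
    except ImportError:
        print("numpy is not installed; measuring the baseline only")
    print(f"{idle_cpu_fraction():.2f} cores busy while idle")
```

A value near 0 would mean the process is genuinely idle after the import; a value near 7 would match the behaviour described in the report.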

martin-frbg commented 9 months ago

also did you build openblas and/or numpy from source or use binary packages (from where?) ?

AyrtonRicardo commented 9 months ago

@martin-frbg I'm not sure what other information you need, so here is my full setup:

Rooted Galaxy Note 5 running Ubuntu 22.04 via Linux Deploy, without the Android GUI.

hwinfo:

hwinfo --short
cpu:
                       AArch64 Processor rev 0 (aarch64)
                       AArch64 Processor rev 0 (aarch64)
                       AArch64 Processor rev 0 (aarch64)
                       AArch64 Processor rev 0 (aarch64)
                       AArch64 Processor rev 0 (aarch64)
                       AArch64 Processor rev 0 (aarch64)
                       AArch64 Processor rev 0 (aarch64)
                       AArch64 Processor rev 0 (aarch64)
keyboard:
  /dev/ram             serial console
network:
                       Broadcom Network controller
network interface:
  ip6tnl0              Network Interface
  ip_vti0              Network Interface
  rmnet4               Network Interface
  rmnet2               Network Interface
  rmnet3               Network Interface
  rmnet7               Network Interface
  rmnet5               Network Interface
  rmnet6               Network Interface
  rmnet1               Network Interface
  rmnet0               Network Interface
  lo                   Loopback network interface
  sit0                 Network Interface
  p2p0                 Ethernet network interface
  wlan0                WLAN network interface
disk:
  /dev/sda             SAMSUNG KLUBG4G1BE-E0B1
  /dev/sdb             SAMSUNG KLUBG4G1BE-E0B1
  /dev/sdc             SAMSUNG KLUBG4G1BE-E0B1
  /dev/sdd             SAMSUNG KLUBG4G1BE-E0B1
  /dev/vnswap0         Disk
partition:
  /dev/sda1            Partition
  /dev/sda2            Partition
  /dev/sda3            Partition
  /dev/sda4            Partition
  /dev/sda5            Partition
  /dev/sda6            Partition
  /dev/sda7            Partition
  /dev/sda8            Partition
  /dev/sda9            Partition
  /dev/sda10           Partition
  /dev/sda11           Partition
  /dev/sda12           Partition
  /dev/sda13           Partition
  /dev/sda14           Partition
  /dev/sda15           Partition
  /dev/sda16           Partition
  /dev/sda17           Partition
  /dev/sdd1            Partition
usb controller:
                       ARM USB controller
                       ARM USB controller
                       ARM USB controller
bridge:
                       Samsung Electronics PCI bridge
memory:
                       Main Memory

Full cpu detail:

```bash
android@localhost:~/certbot$ hwinfo --cpu
01: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp,asimd,aes,pmull,sha1,sha2,crc32,
  Config Status: cfg=new, avail=yes, need=no, active=unknown

02: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp
  Config Status: cfg=new, avail=yes, need=no, active=unknown

03: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp
  Config Status: cfg=new, avail=yes, need=no, active=unknown

04: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp
  Config Status: cfg=new, avail=yes, need=no, active=unknown

05: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp
  Config Status: cfg=new, avail=yes, need=no, active=unknown

06: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp
  Config Status: cfg=new, avail=yes, need=no, active=unknown

07: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp
  Config Status: cfg=new, avail=yes, need=no, active=unknown

08: None 00.0: 10103 CPU
  [Created at cpu.343]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: AArch64
  Vendor: "ARM Limited"
  Model: 0.2.0 "AArch64 Processor rev 2 (aarch64)"
  Platform: "SAMSUNG Exynos7420"
  Features: fp
  Config Status: cfg=new, avail=yes, need=no, active=unknown
android@localhost:~/certbot$
```

> also did you build openblas and/or numpy from source or use binary packages (from where?) ?

I only ran `pip install numpy==1.26.0` (the version required by Home Assistant).

martin-frbg commented 9 months ago

Hmm. Not that special. I do not have a rooted phone for this, but I can see if I can test in Termux. I rather suspect this is a build error of some sort in the pip-packaged OpenBLAS binary. From one of the related tickets, it appears to be built with DYNAMIC_ARCH (which certainly makes sense) but with the build target for the common code parts left at NeoverseN1 (probably its build host), which might have led the compiler to choose some unfortunate optimizations.
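
To see which core type a DYNAMIC_ARCH build selects at load time, OpenBLAS can be asked to report it by setting `OPENBLAS_VERBOSE=2`, which makes it print the detected core to stderr. The sketch below runs the import in a child process with that variable set; the helper name `import_with_openblas_verbose` is hypothetical, and it assumes the OpenBLAS bundled in the numpy wheel honours the variable.

```python
import os
import subprocess
import sys


def import_with_openblas_verbose(module: str = "numpy") -> subprocess.CompletedProcess:
    """Import `module` in a fresh interpreter with OPENBLAS_VERBOSE=2 set
    (hypothetical helper; the variable must be set before the library loads)."""
    env = dict(os.environ, OPENBLAS_VERBOSE="2")
    return subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        env=env,
        capture_output=True,
        text=True,
        timeout=60,
    )


if __name__ == "__main__":
    result = import_with_openblas_verbose()
    # With a DYNAMIC_ARCH build, a "Core: ..." line is expected on stderr.
    print(result.stderr.strip() or "(nothing on stderr)")
```

Running the import in a child process keeps the environment variable out of the current session and makes sure it is in place before the shared library is loaded.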

martin-frbg commented 8 months ago

Not reproducible on Odroid C4 (4x CortexA55) running Ubuntu 22.04LTS (Python 3.10.12 which is the latest offered by apt)

martin-frbg commented 8 months ago

also not reproducible on same system with python 3.11 (after figuring out deadsnakes repo & getting a working pip 3.11)

martin-frbg commented 8 months ago

Unable to test under Android/termux as this ends up being a numpy build from source, which fails (apparently due to various idiosyncrasies of the platform). I see no reason to expect any different results from that A53/A57 combo compared to the A55 in any case, as none of the cpu-specific BLAS kernel code would get invoked just from loading the library.

martin-frbg commented 8 months ago

A CI run on NeoverseN1 also shows no signs of excessive cpu usage upon loading numpy 1.26 into python 3.11.7 (using the official Python Docker image with 8 cpus enabled in the CI configuration, actual image downloaded by pip was numpy-1.26.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl ). Also checked that an OpenBLAS built with TARGET set to NEOVERSEN1 in the DYNAMIC_ARCH configuration would still load normally on CortexA57 (even if not from python/numpy, but dynamically linked against a trivial C program).

I intend to close this, as there is nothing to go by, and certainly nothing to suggest that OpenBLAS, or the way it was built by the PyPI packagers, is responsible for the phenomenon reported.

brokeDude2901 commented 2 months ago

`OPENBLAS_NUM_THREADS=1` seems to fix the issue.
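
For anyone who cannot set the variable in the shell or service definition, a minimal sketch of applying the same workaround from inside Python follows. The key assumption is ordering: OpenBLAS reads `OPENBLAS_NUM_THREADS` when the library is loaded, so the assignment has to happen before the first `import numpy` anywhere in the process.

```python
import os

# Must be set before numpy (and therefore OpenBLAS) is loaded;
# changing it after the import has no effect on the existing thread pool.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

try:
    import numpy as np
except ImportError:
    np = None  # numpy is not installed in this environment

if np is not None:
    # Small sanity check that numpy still works with a single BLAS thread.
    print(np.dot(np.ones(3), np.ones(3)))
```

Note that this limits OpenBLAS to one thread for all later BLAS calls in the process, so it trades multi-threaded linear algebra performance for the lower idle load.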

rgommers commented 2 months ago

@brokeDude2901 if you have the same issue, can you please add the exact CPU details of your machine, as well as how you installed numpy and the output of `import numpy; print(numpy.show_runtime())`?
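
As a rough sketch of how the CPU details requested above could be collected in one go, the snippet below uses only the standard library. The helper name `collect_cpu_report` and the choice of `/proc/cpuinfo` fields are assumptions for illustration; the file only exists on Linux, and the field names differ between architectures.

```python
import os
import platform


def collect_cpu_report() -> dict:
    """Gather basic machine details for a bug report (hypothetical helper)."""
    report = {
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
        "python": platform.python_version(),
        "logical_cpus": os.cpu_count(),
        "cpuinfo": [],
    }
    # Fields that commonly identify the core type on arm64 and x86 Linux.
    wanted = ("CPU implementer", "CPU part", "Features", "model name", "Hardware")
    try:
        with open("/proc/cpuinfo", encoding="utf-8", errors="replace") as fh:
            seen = set()
            for line in fh:
                line = line.strip()
                if line.startswith(wanted) and line not in seen:
                    seen.add(line)
                    report["cpuinfo"].append(line)
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return report


if __name__ == "__main__":
    for key, value in collect_cpu_report().items():
        print(f"{key}: {value}")
```

This complements, and does not replace, the `numpy.show_runtime()` output, which reports the OpenBLAS build that was actually loaded.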