giampaolo / psutil

Cross-platform lib for process and system monitoring in Python
BSD 3-Clause "New" or "Revised" License

[CentOS] cpu_freq OSError: [Errno 16] Device or resource busy #2261

Open IgorPelevanyuk opened 1 year ago

IgorPelevanyuk commented 1 year ago

Summary

Description

Hello! First of all, thank you for the great tool you are developing. I have a very strange issue with the psutil library: it breaks when the server is under high load, always at max_ = int(bcat(pjoin(path, "scaling_max_freq"))) / 1000. The error is the following:

Traceback (most recent call last):
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/job/Wrapper/Wrapper_3029060", line 179, in execute
    result = job.execute()
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/diracos/lib/python3.9/site-packages/DIRAC/WorkloadManagementSystem/JobWrapper/JobWrapper.py", line 435, in execute
    watchdog.calibrate()
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/diracos/lib/python3.9/site-packages/DIRAC/WorkloadManagementSystem/JobWrapper/Watchdog.py", line 791, in calibrate
    result = self.getNodeInformation()
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/diracos/lib/python3.9/site-packages/DIRAC/WorkloadManagementSystem/JobWrapper/Watchdog.py", line 973, in getNodeInformation
    result["CPU(MHz)"] = psutil.cpu_freq()[0]
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/diracos/lib/python3.9/site-packages/psutil/__init__.py", line 1864, in cpu_freq
    ret = _psplatform.cpu_freq()
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/diracos/lib/python3.9/site-packages/psutil/_pslinux.py", line 745, in cpu_freq
    max_ = int(bcat(pjoin(path, "scaling_max_freq"))) / 1000
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/diracos/lib/python3.9/site-packages/psutil/_common.py", line 776, in bcat
    return cat(fname, fallback=fallback, _open=open_binary)
  File "/zfs/tmp/dirac/DIRAC_q06kKSpilot/diracos/lib/python3.9/site-packages/psutil/_common.py", line 765, in cat
    return f.read()
OSError: [Errno 16] Device or resource busy

CPU frequency information exists at the following path: /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

Caching the value does not help, because different processes call cpu_freq(). The error does not happen every time. Do you know what could be causing it?

giampaolo commented 1 year ago

Mmm.. are there some network partitions / folders involved?

IgorPelevanyuk commented 1 year ago

Yes, there are some network folders on the server. But the code reads data from /sys/..., which is 100% local. The ZFS folder from which the psutil code runs is also on local disks.

giampaolo commented 1 year ago

I guess you have no way to reliably reproduce this, correct? This is similar to https://github.com/giampaolo/psutil/issues/2250#issuecomment-1529071619: we may retry read() on EBUSY for a certain number of times (say 10), then give up, even though it's not really a proper solution.
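A minimal sketch of the retry idea described above, not the actual psutil implementation. The helper name `read_with_retry`, its parameters, and the 10 ms delay are assumptions made for illustration; only the "retry on EBUSY about 10 times, then give up" behaviour comes from the comment.

```python
import errno
import time


def read_with_retry(path, retries=10, delay=0.01, _open=open):
    """Read a file in binary mode, retrying when it fails with EBUSY.

    Hypothetical helper: name, defaults and the `_open` hook are
    illustrative and not part of psutil.
    """
    attempts = max(1, retries)
    for attempt in range(attempts):
        try:
            with _open(path, "rb") as f:
                return f.read()
        except OSError as err:
            # Retry only on EBUSY; any other error, or EBUSY on the
            # last attempt, is propagated to the caller.
            if err.errno != errno.EBUSY or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

As noted in the comment, this only papers over the problem: if the file stays busy for longer than `retries * delay`, the caller still gets the `OSError`.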

IgorPelevanyuk commented 1 year ago

Well, we spent some more time playing with tests and different hypotheses. So far, it looks like the problem is not related to the number of open files. The issue could be the following: when the load changes from idle to high, the Linux kernel starts updating /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq more frequently, and while it is being updated the file is not available for psutil to read. Right now we are looking for a way to make scaling_max_freq static to test this hypothesis.

If I understand correctly, your proposal to retry on EBUSY could be included in some future version of psutil, and right now we cannot "activate" it?

If you know a hack to pin scaling_max_freq, we would gladly use it, because the CPU frequency value is not critical for the system that uses psutil.
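Since the frequency value is described as non-critical, one possible caller-side workaround, pending any change in psutil itself, is to tolerate the failure where `cpu_freq()` is called. The sketch below is an assumption about how that could look; the helper name `current_cpu_mhz` is hypothetical. It takes the function as an argument so it can be exercised without psutil installed.

```python
def current_cpu_mhz(cpu_freq_func, default=None):
    """Return the current CPU frequency in MHz, or `default` on failure.

    Hypothetical wrapper: `cpu_freq_func` is expected to behave like
    psutil.cpu_freq, i.e. return an object with a `current` attribute,
    or None when the frequency cannot be determined.
    """
    try:
        freq = cpu_freq_func()
    except OSError:
        # Covers "[Errno 16] Device or resource busy" from the traceback.
        return default
    if freq is None:
        return default
    return freq.current


# Possible use in the calling code from the traceback:
#   import psutil
#   result["CPU(MHz)"] = current_cpu_mhz(psutil.cpu_freq)
```

This does not pin `scaling_max_freq` or address the underlying cause; it only keeps an occasional EBUSY from aborting the job.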