Granulate / gprofiler

gProfiler is a system-wide profiler, combining multiple sampling profilers to produce unified visualization of what your CPU is spending time on.
https://profiler.granulate.io
Apache License 2.0

Gracefully handle various errors as NoSuchProcess #797

Closed (Jongy closed 1 year ago)

Jongy commented 1 year ago
Traceback (most recent call last):
File gprofiler/profilers/java.py, line 321, in make_application_metadata
File gprofiler/profilers/java.py, line 336, in get_jvm_flags_serialized
File gprofiler/profilers/java.py, line 342, in get_jvm_flags
File gprofiler/profilers/java.py, line 391, in get_supported_jvm_flags
File gprofiler/profilers/java.py, line 239, in run
File gprofiler/utils/__init__.py, line 300, in run_process
gprofiler.exceptions.CalledProcessError: Command ['/tmp/_MEIagB051/gprofiler/resources/java/asprof', 'jcmd', '--jattach-cmd', 'VM.flags -all', '12156'] returned non-zero exit status 1.
stdout:
stderr: Process 12156 not found

If we run jcmd and it exits with the message Process {pid} not found, and the PID is indeed dead, raise NoSuchProcess instead of this generic exception (so it's handled gracefully).
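
A minimal sketch of that translation, not the actual gProfiler change. The two exception classes are simplified stand-ins (the real ones live in gProfiler and its dependencies), and `reraise_if_process_gone` and `is_process_alive` are hypothetical helper names. The liveness check is deliberately simplistic and Linux-only:

```python
import os


class CalledProcessError(Exception):
    """Simplified stand-in for gprofiler.exceptions.CalledProcessError."""

    def __init__(self, returncode: int, cmd: list, stderr: str = "") -> None:
        super().__init__(f"Command {cmd} returned non-zero exit status {returncode}.")
        self.returncode = returncode
        self.cmd = cmd
        self.stderr = stderr


class NoSuchProcess(Exception):
    """Simplified stand-in for a 'process is gone' exception."""

    def __init__(self, pid: int) -> None:
        super().__init__(f"process {pid} no longer exists")
        self.pid = pid


def is_process_alive(pid: int) -> bool:
    # Assumption: on Linux, /proc/<pid> exists while the process entry does
    # (this includes zombies, so it is only a rough check).
    return os.path.exists(f"/proc/{pid}")


def reraise_if_process_gone(exc: CalledProcessError, pid: int, is_alive=is_process_alive) -> None:
    """Raise NoSuchProcess if jcmd reported the PID as not found and it is dead.

    Otherwise re-raise the original, generic exception unchanged.
    """
    reported_missing = f"Process {pid} not found" in exc.stderr
    if reported_missing and not is_alive(pid):
        raise NoSuchProcess(pid) from exc
    raise exc
```

A caller would wrap the jcmd invocation in `try/except CalledProcessError as e` and pass `e` along with the target PID. Checking both the message and the liveness means a misleading "not found" for a live process still surfaces as the original error.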

Jongy commented 1 year ago

Similar:

Traceback (most recent call last):
File gprofiler/metadata/application_metadata.py, line 52, in get_metadata
File gprofiler/profilers/ruby.py, line 45, in make_application_metadata
File gprofiler/profilers/ruby.py, line 40, in _get_ruby_version
File gprofiler/metadata/application_metadata.py, line 40, in get_exe_version
File gprofiler/metadata/versions.py, line 29, in get_exe_version
File granulate_utils/linux/ns.py, line 259, in run_in_ns
File granulate_utils/linux/ns.py, line 247, in _switch_and_run
File gprofiler/metadata/versions.py, line 27, in _run_get_version
File gprofiler/utils/__init__.py, line 250, in run_process
File gprofiler/utils/__init__.py, line 149, in start_process
File subprocess.py, line 971, in __init__
File subprocess.py, line 1847, in _execute_child
FileNotFoundError: [Errno 2] No such file or directory: '/proc/8091/exe'

This one is in Ruby's _get_ruby_version (or rather, it can be handled in get_exe_version).
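
As a rough illustration of handling this at the `get_exe_version` level: the function below is a simplified stand-in that skips the namespace switching the real code does. The `NoSuchProcess` class, the `proc_root` parameter and the default `--version` argument are assumptions made for this sketch:

```python
import os
import subprocess


class NoSuchProcess(Exception):
    """Simplified stand-in for a 'process is gone' exception."""

    def __init__(self, pid: int) -> None:
        super().__init__(f"process {pid} no longer exists")
        self.pid = pid


def get_exe_version(pid: int, version_arg: str = "--version", proc_root: str = "/proc") -> str:
    """Run /proc/<pid>/exe with a version flag and return its output.

    If the executable cannot be found because the process has exited,
    raise NoSuchProcess instead of a bare FileNotFoundError.
    """
    exe = f"{proc_root}/{pid}/exe"
    try:
        completed = subprocess.run(
            [exe, version_arg],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            timeout=5,
        )
    except FileNotFoundError as e:
        # The /proc/<pid> directory disappears together with the process.
        if not os.path.exists(f"{proc_root}/{pid}"):
            raise NoSuchProcess(pid) from e
        # The process is still there, so this is a genuinely unexpected error.
        raise
    # Some interpreters print their version to stderr rather than stdout.
    return completed.stdout or completed.stderr
```

Doing the translation here rather than in each profiler means every caller of the version helper gets the same behaviour.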

Jongy commented 1 year ago
Traceback (most recent call last):
File gprofiler/metadata/application_metadata.py, line 52, in get_metadata
File gprofiler/profilers/java.py, line 316, in make_application_metadata
File granulate_utils/linux/process.py, line 75, in get_mapped_dso_elf_id
File granulate_utils/linux/elf.py, line 42, in inner
File granulate_utils/linux/elf.py, line 28, in inner
File granulate_utils/linux/elf.py, line 98, in get_elf_id
File contextlib.py, line 135, in __enter__
File granulate_utils/linux/elf.py, line 62, in open_elf
FileNotFoundError: [Errno 2] No such file or directory: '/proc/12460/root//usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/amd64/server/libjvm.so'
Jongy commented 1 year ago

The idea is: if the process truly just went down, gProfiler should handle it gracefully and not log a nasty error.

mpozniak95 commented 1 year ago

Btw. this error:

Traceback (most recent call last):
File gprofiler/metadata/application_metadata.py, line 52, in get_metadata
File gprofiler/profilers/java.py, line 316, in make_application_metadata
File granulate_utils/linux/process.py, line 75, in get_mapped_dso_elf_id
File granulate_utils/linux/elf.py, line 42, in inner
File granulate_utils/linux/elf.py, line 28, in inner
File granulate_utils/linux/elf.py, line 98, in get_elf_id
File contextlib.py, line 135, in __enter__
File granulate_utils/linux/elf.py, line 62, in open_elf
FileNotFoundError: [Errno 2] No such file or directory: '/proc/12460/root//usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64/jre/lib/amd64/server/libjvm.so'

should already be handled by: https://github.com/Granulate/granulate-utils/commit/724d3e82446b172386fb397c446b250677dd8112. I checked manually and it worked in my env. I guess your PID 12460 was alive, but I am not sure what happened then. Maybe PID reuse?

Jongy commented 1 year ago

I checked manually and it worked in my env. I guess your PID 12460 was alive, but I am not sure what happened then. Maybe PID reuse?

Perhaps PID reuse, or perhaps the file was deleted while still mapped by the process, for example due to a JDK upgrade.
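
One way to check the deleted-file theory: on Linux, /proc/<pid>/maps appends " (deleted)" to the pathname of a mapping whose file has been unlinked. The helper below (`deleted_mappings` is a hypothetical name, not part of gProfiler or granulate-utils) parses the text of a maps file and lists such paths:

```python
from typing import List

DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text: str) -> List[str]:
    """Return the paths of file mappings marked as deleted, without duplicates."""
    deleted: List[str] = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        parts = line.rstrip().split(maxsplit=5)
        if len(parts) < 6:
            continue  # anonymous mapping, no pathname
        pathname = parts[5]
        if pathname.endswith(DELETED_SUFFIX):
            path = pathname[: -len(DELETED_SUFFIX)]
            if path not in deleted:
                deleted.append(path)
    return deleted
```

For a live process this could be called as `deleted_mappings(open(f"/proc/{pid}/maps").read())`. If libjvm.so shows up in the result, the process is alive but the file it mapped no longer exists on disk, which would explain the FileNotFoundError.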

I am closing the issue now, as two cases were solved in #803 and we're no longer sure whether the last one is still relevant. Will reopen if it is.