ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License

KeyError: 'Grid_Size' when filtering for invalid device or kernel #294

Closed JoseSantosAMD closed 3 weeks ago

JoseSantosAMD commented 8 months ago

Describe the bug: While trying to break Omniperf, I found some commands that produce a Grid_Size error when filtering for an out-of-range or invalid device/kernel.

Development Environment:

  1. See error

(screenshot: Picture2)

Expected behavior: Graceful exit with helpful output.

skyreflectedinmirrors commented 8 months ago

ah, the dataframes are entirely empty by the time we read them here: https://github.com/AMDResearch/omniperf/blob/4a8917f8802dabe995294390730796fcddfe3017/src/omniperf_profile/profiler_base.py#L113

probably just need a check for that and helpful error message as Jose suggests
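
For illustration, a check like that could look roughly like the sketch below. This is not the actual Omniperf fix: `require_columns`, `EmptyProfileError`, and the `source` argument are hypothetical names made up here, and the column names are the two that this thread shows being used for grouping.

```python
import pandas as pd


class EmptyProfileError(RuntimeError):
    """Hypothetical error type for this sketch; not part of Omniperf."""


def require_columns(df, required=("Kernel_Name", "Grid_Size"), source="<unknown>"):
    """Raise a readable error if df has no rows or lacks the grouping columns."""
    # DataFrame.empty is True both for a frame with no columns at all and for
    # a frame that has a header but zero rows.
    if df.empty:
        raise EmptyProfileError(
            f"No dispatches found in {source}. "
            "Check that the kernel/dispatch filter matches the workload."
        )
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise EmptyProfileError(
            f"{source} is missing expected column(s): {', '.join(missing)}"
        )
    return df
```

Calling something like this before the groupby would turn the bare KeyError into a message that points at the filter or the input file. Whether to raise or to log and exit is a design choice left open here.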

efaulhaber commented 3 months ago

I get this error without --dispatch or --kernel specified:


$ omniperf profile -n wcsph -- ~/.juliaup/bin/julia --project=run ./benchmarks/gpu.jl

  ___                  _                  __ 
 / _ \ _ __ ___  _ __ (_)_ __   ___ _ __ / _|
| | | | '_ ` _ \| '_ \| | '_ \ / _ \ '__| |_ 
| |_| | | | | | | | | | | |_) |  __/ |  |  _|
 \___/|_| |_| |_|_| |_|_| .__/ \___|_|  |_|  
                        |_|                  

   INFO Omniperf version: 2.0.1
   INFO Profiler choice: rocprofv1
   INFO Path: /home/efaulha2/git/PointNeighbors.jl/workloads/wcsph/MI200
   INFO Target: MI200
   INFO Command: /home/efaulha2/.juliaup/bin/julia --project=run ./benchmarks/gpu.jl
   INFO Kernel Selection: None
   INFO Dispatch Selection: None
   INFO Hardware Blocks: All
   INFO 
   INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   INFO Collecting Performance Counters
   INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   [...]

   INFO    |-> [rocprof] File '/home/efaulha2/git/PointNeighbors.jl/workloads/wcsph/MI200/timestamps.csv' is generating
   INFO    |-> [rocprof] 
Traceback (most recent call last):
  File "/home/efaulha2/omniperf/2.0.1/bin/omniperf", line 138, in <module>
    main()
  File "/home/efaulha2/omniperf/2.0.1/bin/omniperf", line 126, in main
    omniperf.run_profiler()
  File "/home/efaulha2/omniperf/2.0.1/libexec/omniperf/utils/utils.py", line 45, in wrap_function
    result = function(*args, **kwargs)
  File "/home/efaulha2/omniperf/2.0.1/libexec/omniperf/omniperf_base.py", line 229, in run_profiler
    profiler.post_processing()
  File "/home/efaulha2/omniperf/2.0.1/libexec/omniperf/utils/utils.py", line 45, in wrap_function
    result = function(*args, **kwargs)
  File "/home/efaulha2/omniperf/2.0.1/libexec/omniperf/omniperf_profile/profiler_rocprof_v1.py", line 85, in post_processing
    self.join_prof()
  File "/home/efaulha2/omniperf/2.0.1/libexec/omniperf/utils/utils.py", line 45, in wrap_function
    result = function(*args, **kwargs)
  File "/home/efaulha2/omniperf/2.0.1/libexec/omniperf/omniperf_profile/profiler_base.py", line 124, in join_prof
    key = _df.groupby(["Kernel_Name", "Grid_Size"]).cumcount()
  File "/home/efaulha2/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 9183, in groupby
    return DataFrameGroupBy(
  File "/home/efaulha2/.local/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1329, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/home/efaulha2/.local/lib/python3.9/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'Grid_Size'

ppanchad-amd commented 3 weeks ago

@JoseSantosAMD Issue is fixed. Closing ticket. Thanks!