GPUOpen-Tools / radeon_developer_panel

The Radeon Developer Panel (RDP) is a software tool that allows users to capture RGP profiles, RMV traces, RRA scenes, and RGD crash analysis dumps on Radeon GPUs.

2.8.0.27 not capturing anything #25

Open MathiasMagnus opened 1 year ago

MathiasMagnus commented 1 year ago

This new version can finally connect to the service it needs. When launching, the application greets me with a warning that some features may not work and asks me to run the AddUserToGroup.bat script as admin. Running the script as admin hits a similar localization issue: it can't find the group Performance Log Users. Of course it can't; on a localized Windows install that is not the group's name, and at this point it starts to feel like we're running in circles. There are two very similarly named groups, one related to performance monitoring (S-1-5-32-558) and one related to performance event logging (S-1-5-32-559). Please take a look at Well known SIDs and start using SIDs across the entire codebase, not the localized display strings.

Because the translation isn't perfect, I couldn't tell which of the two Performance Log Users corresponds to; it very much seems like I want perf monitoring. Just to be on the safe side, I added myself to both groups.

Add-LocalGroupMember -Group 'Teljesítményfigyelő felhasználói' -Member mate
Add-LocalGroupMember -Group 'Teljesítménynapló felhasználói' -Member mate

I rebooted the machine. Now when I query the membership of these groups, it seems I'm a member of both:

PS C:\Users\mate> Get-LocalGroupMember -SID S-1-5-32-558

ObjectClass Name               PrincipalSource
----------- ----               ---------------
User        MATTY-GA402RK\mate MicrosoftAccount

PS C:\Users\mate> Get-LocalGroupMember -SID S-1-5-32-559

ObjectClass Name               PrincipalSource
----------- ----               ---------------
User        MATTY-GA402RK\mate MicrosoftAccount
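
The request above, addressing the groups by SID rather than by localized display name, can be sketched as follows. This is a minimal illustration, not the contents of RDP's AddUserToGroup.bat: the helper name `add_member_command` is hypothetical, and it only builds the PowerShell command text (relying on the `-SID` parameter of `Add-LocalGroupMember`) so that it can be inspected on any platform. The English names in the table are the standard display names of these two well-known SIDs and are never passed to the command.

```python
# Well-known SIDs of the two built-in groups discussed above.
# The English display names are for the reader only; the command never uses them.
WELL_KNOWN_GROUPS = {
    "S-1-5-32-558": "Performance Monitor Users",
    "S-1-5-32-559": "Performance Log Users",
}


def add_member_command(sid: str, member: str) -> str:
    """Build a locale-independent PowerShell command (hypothetical helper)."""
    if sid not in WELL_KNOWN_GROUPS:
        raise ValueError(f"unexpected SID: {sid}")
    return f"Add-LocalGroupMember -SID {sid} -Member {member}"


# Print one command per group for the user name used in this thread.
for sid in WELL_KNOWN_GROUPS:
    print(add_member_command(sid, "mate"))
```

Because the SID is the same on every Windows installation, the generated command does not depend on the display language.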

However, whether I run in Basic mode or add my application in Advanced mode, RDP can't profile the application. Profiling is set to Compute and the first 10 dispatches:

[image]

When I launch the application from the command line, a green progress bar appears in the applications pane indicating that it's collecting a capture, but once the application shuts down, there's no saved file of any kind. (I'd expect a .rgp file, but the folder C:\Users\mate\Documents\rgp_profiles\OpenCL-Cpp-Reduction-v2 is empty.)

[image]

How can I go about finding out what the issue is? @mguerret-amd

========================================== Host System Information

RDP Version: Radeon Developer Panel v2.8.0.27
RDP build Date: 12/12/2022
Operating System: Windows 11 Version 22H2
Qt Version: 5.15.2
Tool Version: 2.0.0
Router Version 0.13.0

========================================== Connected System Information

Driver Version: 22.20.29.10-221130a-386220E-AMD-Software-Adrenalin-Edition
GPUOpen Interface Major Version: 42
Operating System Name: Windows 11 Pro
Operating System Description: 22621.1.amd64fre.ni_release.220506-1250

GPUs:

Name: AMD Radeon RX 6800S
ASIC info
Device Id: 29679
Revision: 192
Family: 143
gfx_engine: 13

Name: AMD Radeon(TM) Graphics
ASIC info
Device Id: 5761
Revision: 200
Family: 146
gfx_engine: 13

MathiasMagnus commented 1 year ago

I installed the en-US localization to check whether the problem is related to that, and unfortunately it isn't. No profiling occurs even when the display language is set to en-US.

MathiasMagnus commented 1 year ago

The log file contains the following:

Log file created successfully: [C:/Users/mate/AppData/Roaming/RadeonDeveloperPanel/log.txt]
Initializing RDP v2.8.0.27
        System Info
                CPU Architecture    :                                   x86_64
                Kernel Type         :                                    winnt
                Kernel Version      :                               10.0.22621
                OS Name             :                  Windows 11 Version 22H2
                Host Name           :                            MATTY-GA402RK
        Qt Library Versions
                Compile-Time        :                                   5.15.2
                Run-Time            :                                   5.15.2
        DDTool Library Versions
                Compile-Time        :                                    2.0.0
                Link-Time           :                                    2.0.0
        DDRouter Library Versions
                Compile-Time        :                                   0.13.0
                Link-Time           :                                   0.13.0
Attempting to load config from default location: C:\Kellekek\AMD\RadeonDeveloperToolSuite-2023-01-05-1041\.devdriver\ddTool\
No config found, skipping creation and loading default config json
Loaded 1 file(s):
+ config.json - 40 bytes - C:\Kellekek\AMD\RadeonDeveloperToolSuite-2023-01-05-1041\.devdriver\ddTool\config.json
config.json version: 1
Found 0 module entries
Successfully parsed config json string
Successfully created tool context
Loading Modules
        Querying for Existing Modules
        Dynamic Modules: [0]
        Built-in Modules: [4]
                [0]: MemoryTrace
                        ddModuleLoader's Module API Version 1.18.0 | MemoryTrace's Module API Version 1.18.0
                        MemoryTrace's Module Version 0.8.0
                        Required MemoryTrace Module Extension API Version 0.13.0 | MemoryTrace Module Extension API Version 0.13.0 | Extension Id 0x7972756372656d
                        MercuryModuleExt GUI extension found! ((null):0, (null))
                [1]: Profiling
                        ddModuleLoader's Module API Version 1.18.0 | Profiling's Module API Version 1.18.0
                        Profiling's Module Version 0.4.0
                        Required Profiling Module Extension API Version 0.13.0 | Profiling Module Extension API Version 0.13.0 | Extension Id 0x7972756372656d
                        MercuryModuleExt GUI extension found! ((null):0, (null))
                [2]: DeviceClocks
                        ddModuleLoader's Module API Version 1.18.0 | DeviceClocks's Module API Version 1.18.0
                        DeviceClocks's Module Version 0.2.0
                        Required DeviceClocks Module Extension API Version 0.13.0 | DeviceClocks Module Extension API Version 0.13.0 | Extension Id 0x7972756372656d
                        MercuryModuleExt GUI extension found! ((null):0, (null))
                [3]: UberTrace
                        ddModuleLoader's Module API Version 1.18.0 | UberTrace's Module API Version 1.18.0
                        UberTrace's Module Version 0.2.0
                        Required UberTrace Module Extension API Version 0.13.0 | UberTrace Module Extension API Version 0.13.0 | Extension Id 0x7972756372656d
                        MercuryModuleExt GUI extension found! ((null):0, (null))
Loading default workflows
Loading settings group: [Workflows]
Loading settings group: [Managed Applications]
Saving settings group: [Managed Applications]
Loading Settings File: [C:/Users/mate/AppData/Roaming/RadeonDeveloperPanel/settings.ini]
        Loading settings group: [ConnectionWidget]
Starting new Router [Local]
        Successfully initialized a developer mode message bus with client: 6220
        ddModuleLoader's Module API Version 1.18.0 | DevToolsRouter's Module API Version 1.18.0
        DevToolsRouter's Module Version 0.1.0
        Connecting module DevToolsRouter
        Successfully created a connection context
        ddModuleLoader's Module API Version 1.18.0 | SystemInfoRouter's Module API Version 1.18.0
        SystemInfoRouter's Module Version 0.1.0
        Connecting module SystemInfoRouter
        Successfully created a connection context
Attempting to connect to ddTool [Local]
Connecting to Local
        ddNet local connection succeeded
ddEventCallback: EVENT_BUS_CONNECT
Loading settings group: [Blocklist]
Received initial halted message from client with id 2720! (OpenCL-Cpp-Reduction-v2.exe)
Successfully connected to client with id 2720 via driver control
ddEventCallback: EVENT_CLIENT_CONNECT
The file name for the process was C:\Users\mate\Source\Repos\GPGPU1\.vscode\build\msbuild-msvc-v143-computecpp\OpenCL\bin\Release\OpenCL-Cpp-Reduction-v2.exe
Application profile failed to bind module: DeviceClocks
Successfully bound profile instance with client [OpenCL-Cpp-Reduction-v2.exe]
Initializing client with id 2720
Advanced driver state (Platform Init -> Device Init) on client with id 2720
Advanced driver state (Device Init -> Post Device Init) on client with id 2720
ddEventCallback: EVENT_CLIENT_INITIALIZE
Successfully resumed driver on client with id 2720
[RGP] End capture profile
TRANSFER END  0 ((null):0, (null))
[RGP] Failed to capture profile.
[RGP] Failed to finish executing profile with code: 0
ddEventCallback: EVENT_CLIENT_INSTANCE_RELEASE
Saving settings group: [Managed Applications]
Client with id 2720 disconnected
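
A log of this length is easier to triage when only the failure lines are pulled out. The following is a minimal sketch, assuming a plain substring-style match on "fail", "failed", or "failure" is good enough; the function name `failure_lines` is hypothetical, and the sample text is a short excerpt copied from the log above.

```python
import re

# Match "fail", "failed" or "failure" as whole words, ignoring case.
FAILURE = re.compile(r"\bfail(ed|ure)?\b", re.IGNORECASE)


def failure_lines(log_text):
    """Return (line_number, stripped_line) pairs for lines that mention a failure."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), 1)
        if FAILURE.search(line)
    ]


# Excerpt copied from the log.txt content above.
SAMPLE = """\
Successfully connected to client with id 2720 via driver control
Application profile failed to bind module: DeviceClocks
Successfully bound profile instance with client [OpenCL-Cpp-Reduction-v2.exe]
[RGP] End capture profile
[RGP] Failed to capture profile.
[RGP] Failed to finish executing profile with code: 0
"""

for number, line in failure_lines(SAMPLE):
    print(f"{number}: {line}")
```

On the excerpt, this picks out the DeviceClocks bind failure and the two `[RGP]` failure lines; it does not explain their cause.
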

MattGuerrette commented 1 year ago

@MathiasMagnus If this issue is still reproducing, could you please share your settings.ini file?

It should be located at:

%AppData%\RadeonDeveloperPanel\settings.ini

MathiasMagnus commented 1 year ago

The issue still persists.

settings.ini

[General]
rgp_path=.\\RadeonGPUProfiler.exe
rmv_path=.\\RadeonMemoryVisualizer.exe
auto_open_traces=false
rra_path=.\\RadeonRaytracingAnalyzer.exe
selected_workflow=Profiling
filter_api=OpenCL

[Workflows]
Entries\1\name=Profiling
Entries\1\immutable=true
Entries\1\data="@ByteArray({\"DataVersion\":{\"Major\":0,\"Minor\":1,\"Patch\":0},\"Modules\":[{\"ModuleName\":\"Profiling\",\"IsEnabled\":true,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"Profiling\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{\\\"SerializedProfilingData\\\":[{\\\"name\\\":\\\"Enable Instruction Tracing\\\",\\\"id\\\":0,\\\"type\\\":\\\"Boolean\\\",\\\"value\\\":false},{\\\"name\\\":\\\"Instruction tracing API PSO Hash\\\",\\\"id\\\":1,\\\"type\\\":\\\"64-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Number of Preparation Frames\\\",\\\"id\\\":2,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":4},{\\\"name\\\":\\\"Shader Engine Instruction Trace Mask\\\",\\\"id\\\":3,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Enable Streaming Performance Counters\\\",\\\"id\\\":4,\\\"type\\\":\\\"Boolean\\\",\\\"value\\\":true},{\\\"name\\\":\\\"SPM Sample Frequency\\\",\\\"id\\\":5,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":4096},{\\\"name\\\":\\\"SPM Memory Limit (MB)\\\",\\\"id\\\":6,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":128},{\\\"name\\\":\\\"SQTT Memory Limit (MB)\\\",\\\"id\\\":7,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":75},{\\\"name\\\":\\\"Trigger Mode\\\",\\\"id\\\":8,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":5},{\\\"name\\\":\\\"Trigger Marker Begin\\\",\\\"id\\\":9,\\\"type\\\":\\\"String\\\",\\\"value\\\":\\\"\\\"},{\\\"name\\\":\\\"Trigger Marker End\\\",\\\"id\\\":10,\\\"type\\\":\\\"String\\\",\\\"value\\\":\\\"\\\"},{\\\"name\\\":\\\"Trigger Tag Begin\\\",\\\"id\\\":11,\\\"type\\\":\\\"64-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Trigger Tag End\\\",\\\"id\\\":12,\\\"type\\\":\\\"64-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Trigger Frame Index\\\",\\\"id\\\":13,\\\"type\\\":\\\"32-bit Unsigned 
Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Trigger Dispatch Start Index\\\",\\\"id\\\":14,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":1},{\\\"name\\\":\\\"Trigger Dispatch Stop Index\\\",\\\"id\\\":15,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":10}]},\\\"UserdataNodes\\\":[{\\\"NodeName\\\":\\\"ProfilingUserData\\\",\\\"UserdataStr\\\":\\\"{\\\\n    \\\\\\\"compute_auto_capture_time_ms\\\\\\\": null,\\\\n    \\\\\\\"dispatch_count\\\\\\\": 10,\\\\n    \\\\\\\"frame_trigger\\\\\\\": false,\\\\n    \\\\\\\"opencl_auto_trigger\\\\\\\": 1,\\\\n    \\\\\\\"output_path\\\\\\\": \\\\\\\"C:\\\\\\\\\\\\\\\\Users\\\\\\\\\\\\\\\\mate\\\\\\\\\\\\\\\\Documents\\\\\\\\\\\\\\\\rgp_profiles\\\\\\\\\\\\\\\\$(APP_NAME)\\\\\\\",\\\\n    \\\\\\\"sqtt_buffer_profile_index\\\\\\\": 2\\\\n}\\\\n\\\"}]}\"},{\"ModuleName\":\"DeviceClocks\",\"IsEnabled\":true,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"DeviceClocks\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{}}\"}]})"
Entries\2\name=Memory Trace
Entries\2\immutable=true
Entries\2\data="@ByteArray({\"DataVersion\":{\"Major\":0,\"Minor\":1,\"Patch\":0},\"Modules\":[{\"ModuleName\":\"MemoryTrace\",\"IsEnabled\":false,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"MemoryTrace\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{},\\\"UserdataNodes\\\":[{\\\"NodeName\\\":\\\"MemoryTraceUserData\\\",\\\"UserdataStr\\\":\\\"{\\\\\\\"output_path\\\\\\\":\\\\\\\"C:\\\\\\\\\\\\\\\\Users\\\\\\\\\\\\\\\\mate\\\\\\\\\\\\\\\\Documents\\\\\\\\\\\\\\\\rmv_traces\\\\\\\\\\\\\\\\$(APP_NAME)\\\\\\\"}\\\"}]}\"},{\"ModuleName\":\"DeviceClocks\",\"IsEnabled\":false,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"DeviceClocks\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{}}\"}]})"
Entries\3\name=Raytracing
Entries\3\immutable=true
Entries\3\data="@ByteArray({\"DataVersion\":{\"Major\":0,\"Minor\":1,\"Patch\":0},\"Modules\":[{\"ModuleName\":\"UberTrace\",\"IsEnabled\":false,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"UberTrace\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{},\\\"UserdataNodes\\\":[{\\\"NodeName\\\":\\\"RaytracingUserData\\\",\\\"UserdataStr\\\":\\\"{\\\\n    \\\\\\\"output_path\\\\\\\": \\\\\\\"C:\\\\\\\\\\\\\\\\Users\\\\\\\\\\\\\\\\mate\\\\\\\\\\\\\\\\Documents\\\\\\\\\\\\\\\\rra_scenes\\\\\\\\\\\\\\\\$(APP_NAME)\\\\\\\"\\\\n}\\\\n\\\\u0000\\\"}]}\"},{\"ModuleName\":\"DeviceClocks\",\"IsEnabled\":false,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"DeviceClocks\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{}}\"}]})"
Entries\size=4
Entries\4\name=All
Entries\4\immutable=true
Entries\4\data="@ByteArray({\"DataVersion\":{\"Major\":0,\"Minor\":1,\"Patch\":0},\"Modules\":[{\"ModuleName\":\"MemoryTrace\",\"IsEnabled\":true,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"MemoryTrace\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{},\\\"UserdataNodes\\\":[{\\\"NodeName\\\":\\\"MemoryTraceUserData\\\",\\\"UserdataStr\\\":\\\"{\\\\\\\"output_path\\\\\\\":\\\\\\\"C:\\\\\\\\\\\\\\\\Users\\\\\\\\\\\\\\\\mate\\\\\\\\\\\\\\\\Documents\\\\\\\\\\\\\\\\rmv_traces\\\\\\\\\\\\\\\\$(APP_NAME)\\\\\\\"}\\\"}]}\"},{\"ModuleName\":\"Profiling\",\"IsEnabled\":true,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"Profiling\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{\\\"SerializedProfilingData\\\":[{\\\"name\\\":\\\"Enable Instruction Tracing\\\",\\\"id\\\":0,\\\"type\\\":\\\"Boolean\\\",\\\"value\\\":false},{\\\"name\\\":\\\"Instruction tracing API PSO Hash\\\",\\\"id\\\":1,\\\"type\\\":\\\"64-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Number of Preparation Frames\\\",\\\"id\\\":2,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":4},{\\\"name\\\":\\\"Shader Engine Instruction Trace Mask\\\",\\\"id\\\":3,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Enable Streaming Performance Counters\\\",\\\"id\\\":4,\\\"type\\\":\\\"Boolean\\\",\\\"value\\\":false},{\\\"name\\\":\\\"SPM Sample Frequency\\\",\\\"id\\\":5,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":4096},{\\\"name\\\":\\\"SPM Memory Limit (MB)\\\",\\\"id\\\":6,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":128},{\\\"name\\\":\\\"SQTT Memory Limit (MB)\\\",\\\"id\\\":7,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":75},{\\\"name\\\":\\\"Trigger Mode\\\",\\\"id\\\":8,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":5},{\\\"name\\\":\\\"Trigger Marker 
Begin\\\",\\\"id\\\":9,\\\"type\\\":\\\"String\\\",\\\"value\\\":\\\"\\\"},{\\\"name\\\":\\\"Trigger Marker End\\\",\\\"id\\\":10,\\\"type\\\":\\\"String\\\",\\\"value\\\":\\\"\\\"},{\\\"name\\\":\\\"Trigger Tag Begin\\\",\\\"id\\\":11,\\\"type\\\":\\\"64-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Trigger Tag End\\\",\\\"id\\\":12,\\\"type\\\":\\\"64-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Trigger Frame Index\\\",\\\"id\\\":13,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Trigger Dispatch Start Index\\\",\\\"id\\\":14,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":0},{\\\"name\\\":\\\"Trigger Dispatch Stop Index\\\",\\\"id\\\":15,\\\"type\\\":\\\"32-bit Unsigned Integer\\\",\\\"value\\\":10}]},\\\"UserdataNodes\\\":[{\\\"NodeName\\\":\\\"ProfilingUserData\\\",\\\"UserdataStr\\\":\\\"{\\\\n    \\\\\\\"compute_auto_capture_time_ms\\\\\\\": null,\\\\n    \\\\\\\"dispatch_count\\\\\\\": 10,\\\\n    \\\\\\\"frame_trigger\\\\\\\": false,\\\\n    \\\\\\\"opencl_auto_trigger\\\\\\\": 0,\\\\n    \\\\\\\"output_path\\\\\\\": \\\\\\\"C:\\\\\\\\\\\\\\\\Users\\\\\\\\\\\\\\\\mate\\\\\\\\\\\\\\\\Documents\\\\\\\\\\\\\\\\rgp_profiles\\\\\\\\\\\\\\\\$(APP_NAME)\\\\\\\",\\\\n    \\\\\\\"sqtt_buffer_profile_index\\\\\\\": 2\\\\n}\\\\n\\\\u0000\\\"}]}\"},{\"ModuleName\":\"DeviceClocks\",\"IsEnabled\":true,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"DeviceClocks\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{}}\"},{\"ModuleName\":\"UberTrace\",\"IsEnabled\":false,\"Data\":\"{\\\"SerializedDataHeader\\\":{\\\"ModuleName\\\":\\\"UberTrace\\\",\\\"DataVersion\\\":{\\\"Major\\\":0,\\\"Minor\\\":1,\\\"Patch\\\":0}},\\\"ModuleData\\\":{},\\\"UserdataNodes\\\":[{\\\"NodeName\\\":\\\"RaytracingUserData\\\",\\\"UserdataStr\\\":\\\"{\\\\n    \\\\\\\"output_path\\\\\\\": 
\\\\\\\"C:\\\\\\\\\\\\\\\\Users\\\\\\\\\\\\\\\\mate\\\\\\\\\\\\\\\\Documents\\\\\\\\\\\\\\\\rra_scenes\\\\\\\\\\\\\\\\$(APP_NAME)\\\\\\\"\\\\n}\\\\n\\\\u0000\\\"}]}\"}]})"

[ConnectionWidget]
saved=true
historySize=1
lastUsedIndex=0
history\0\type=local
EnableAutoConnectToLastConnection=true
DisableClientTimeout=false

[mainwindow]
geometry=@ByteArray(\x1\xd9\xd0\xcb\0\x3\0\0\0\0\x2v\0\0\0\b\0\0\at\0\0\x3t\0\0\x2w\0\0\0.\0\0\as\0\0\x3s\0\0\0\0\0\0\0\0\a\x80\0\0\x2w\0\0\0.\0\0\as\0\0\x3s)
state=@ByteArray(\0\0\0\xff\0\0\0\0\xfd\0\0\0\0\0\0\x4\xfd\0\0\x3-\0\0\0\x4\0\0\0\x4\0\0\0\b\0\0\0\b\xfc\0\0\0\0)

[GlobalShortcuts]
Shortcuts\size=1
Shortcuts\1\id=1
Shortcuts\1\sequence=201326659
Shortcuts\1\nativeKey=67

[Managed%20Applications]
applications\size=3
applications\1\name=OpenCL-Cpp-Reduction-v2.exe
applications\1\api=4
applications\1\workflow=Profiling
applications\2\name=HIP-Reduction-v2.exe
applications\2\api=9
applications\2\workflow=Profiling
applications\3\name=OpenCL-Cpp-Reduction-v2.exe
applications\3\api=0
applications\3\workflow=Profiling
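
The `[Managed%20Applications]` group above can be read back with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the section name is assumed to be percent-encoded (hence `unquote`), and the numeric `api` values are kept as opaque strings because the thread does not say what they mean. The sample is copied from the settings.ini above.

```python
import configparser
from collections import Counter
from urllib.parse import unquote

# Copied from the [Managed%20Applications] group of the settings.ini above.
SAMPLE = r"""
[Managed%20Applications]
applications\size=3
applications\1\name=OpenCL-Cpp-Reduction-v2.exe
applications\1\api=4
applications\1\workflow=Profiling
applications\2\name=HIP-Reduction-v2.exe
applications\2\api=9
applications\2\workflow=Profiling
applications\3\name=OpenCL-Cpp-Reduction-v2.exe
applications\3\api=0
applications\3\workflow=Profiling
"""


def managed_applications(ini_text):
    """Return one dict per managed application entry (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(ini_text)
    for section in parser.sections():
        if unquote(section) == "Managed Applications":
            items = parser[section]
            break
    else:
        return []
    count = int(items.get(r"applications\size", "0"))
    apps = []
    for index in range(1, count + 1):
        prefix = f"applications\\{index}\\"
        apps.append({field: items.get(prefix + field)
                     for field in ("name", "api", "workflow")})
    return apps


def duplicate_names(apps):
    """Names that appear in more than one entry."""
    counts = Counter(app["name"] for app in apps)
    return sorted(name for name, seen in counts.items() if seen > 1)


apps = managed_applications(SAMPLE)
print(duplicate_names(apps))
```

Run on the sample, this shows that OpenCL-Cpp-Reduction-v2.exe is listed twice with different `api` values; whether that matters to RDP is not established in this thread.
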

MathiasMagnus commented 1 year ago

@MattGuerrette I'm open to screen-shared debugging if that helps (I'm based in Hungary).

mguerret-amd commented 1 year ago

@MathiasMagnus Same here, could you provide a sample application which reproduces this issue?

mguerret-amd commented 1 year ago

Also, does your compute application run long enough for the capture to finish? If there is no sleep, or there are too few dispatches, there may not be enough time for the capture to complete before your application exits. If this is the case, try adding a sleep of 5-10s before application exit.

MathiasMagnus commented 1 year ago

@MattGuerrette Apologies for my tone, bad day. I'm mostly writing examples and uni material (and occasionally performance-tune larger-scale applications). It's nothing fancy, bog-standard SAXPY and reduction kernels at the moment. Think SAXPY from the OpenCL-SDK. Indeed, the application takes 2 seconds from start to finish, with sometimes just one dispatch, sometimes 3, sometimes 2000 (OpenCL-OpenGL interop samples). Reducing the dispatch range to [0;1) and launching the same kernel 50 times captured the first dispatch just fine, and I can open the .rgp file to inspect it. 🎉 Even with a 5 second sleep after platform init and before leaving main, if I launch my kernel just once instead of 50 times, it's not captured. How am I supposed to profile singular kernel launches?
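
The workaround reported in this comment (repeat the launch, then keep the process alive for a while) can be sketched in a language-neutral way. The Python below is only a stand-in for the real OpenCL host code: `launch_kernel` is a hypothetical callable that enqueues one dispatch and waits for it, and the defaults of 50 repeats and 5 seconds are simply the numbers mentioned in this thread, not values documented by RDP.

```python
import time


def run_with_padding(launch_kernel, repeats=50, linger_seconds=5.0):
    """Call launch_kernel `repeats` times, then keep the process alive.

    launch_kernel is a hypothetical callable standing in for one
    enqueue-and-wait of the kernel under test.
    """
    results = [launch_kernel() for _ in range(repeats)]
    # Give the capture time to finish before the process exits.
    time.sleep(linger_seconds)
    return results


# Usage with a dummy "kernel" that only counts its invocations.
calls = []
run_with_padding(lambda: calls.append(1), repeats=50, linger_seconds=0.0)
print(len(calls))
```

This only pads the run so that a capture has a chance to complete; it does not answer the question of how to capture a single launch.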

How should one configure the workflows so that RDP captures everything for as long as the application is running, be that 1 second or 2 minutes? A (literal) workflow akin to rocprof on Linux. I want to show students how to read perf counters and trace their apps, and to see that a change in the kernel improves things, but it's not always GROMACS that I measure; sometimes it's just experimental code with a handful of dispatches.

Can't the profiler truncate the profiling session if the application exits cleanly before the prescribed maximum number of dispatches is reached? (When it's configured to collect [10;100) and the app terminates before reaching 10 dispatches, I can understand it not collecting anything, but I don't understand why the app terminating "prematurely" blocks trace collection. If the profiling/tracing service needs time to initialize, then the OpenCL ICD, while initializing, should query whether there's a profiling session in place and not return until it is set up.)
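
The truncation proposed in this comment can be stated precisely with a small sketch. It describes the requested behavior, not what RDP currently does; the function name `truncated_window` is hypothetical, and dispatch indices are assumed to be 0-based with a half-open range [start; stop), matching the notation used in the comment.

```python
def truncated_window(start, stop, dispatches_seen):
    """Clamp the half-open capture window [start, stop) to what actually ran.

    Returns None when the application exited before reaching `start`,
    otherwise the (start, end) range of dispatches that could be kept.
    """
    if dispatches_seen <= start:
        return None  # nothing inside the window was ever dispatched
    return (start, min(stop, dispatches_seen))


# The window [10;100) with an app that exits after 5 dispatches: nothing to keep.
print(truncated_window(10, 100, 5))
# The window [0;10) with an app that issues only 3 dispatches: keep those 3.
print(truncated_window(0, 10, 3))
```

Under this rule a short-lived application would still yield a trace of whatever fell inside the window, instead of no file at all.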

MathiasMagnus commented 1 year ago

@MattGuerrette Is there anything I can help with? Code, input, design considerations, anything?