ROCm / rocprofiler-compute

Advanced Profiling and Analytics for AMD Hardware
https://rocm.docs.amd.com/projects/omniperf/en/latest/
MIT License
135 stars 49 forks source link

Only evaluate metrics in analyze for architectures we're actually going to use! #144

Closed skyreflectedinmirrors closed 1 year ago

skyreflectedinmirrors commented 1 year ago

We know that the python parsing of metrics has high overhead on each analysis / GUI run. However, I did not know that we are parsing&compiling metrics for all available gfx90X architectures each analyze run. I discovered this while modifying some of the yaml files, wherein I discovered that I had a syntax error in one of my modified config files. I fixed it in gfx90a only, to test the change, but was quite surprised when I kept hitting the same syntax error!

Backtracing in pdb lead me to:

https://github.com/AMDResearch/omniperf/blob/a346db7646b0a935f4cac51d131b4a585f065c05/src/omniperf_analyze/omniperf_analyze.py#L97

where-in I realized, we're compiling for all architectures:

p archConfigs.keys()
dict_keys(['gfx906', 'gfx908', 'gfx90a'])

A quick optimization (that will lower the overhead of analyze by ~3x) is to detect which architecture we're going to analyze (at the very least, we have access to the miXXX part of the --path the user passed in, but it's also in the sysinfo.csv file in said path), and only parse&compile metrics for that arch.

coleramos425 commented 1 year ago

Things are a bit more complicated since we support baseline comparison functionality in CLI. If I'm conducting an intra-soc comparison we'd need ArchConfigs to be set for each architecture, i.e.

$ omniperf analyze -p workloads/test_1/mi200/ -p workloads/test_2/mi100/

For now, I'm just detecting the "newest" architecture loaded in ArchConfigs and using this to form the baseline comparison data tables. This should work in the short term considering newer archs should contain a superset of all tables avail in older archs.

May want to revisit this in the "Omniperf Revamp"...

skyreflectedinmirrors commented 1 year ago

[AMD Official Use Only - General]

Really? I had no idea we supported that! Neat!

I think we could do the same sort of conditional "we see mi100 and mi200 in the workloads thing, so only eval those two"? But I am fuzzy on how all this mapping works


From: Cole Ramos @.> Sent: Wednesday, August 2, 2023 1:41 PM To: AMDResearch/omniperf @.> Cc: Curtis, Nicholas @.>; Author @.> Subject: Re: [AMDResearch/omniperf] Only evaluate metrics in analyze for architectures we're actually going to use! (Issue #144)

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.

Things are a bit more complicated since we support baseline comparison functionality in CLI. If I'm conducting an intra-soc comparison we'd need ArchConfigs to be set for each architecture, i.e.

$ omniperf analyze -p workloads/test_1/mi200/ -p workloads/test_2/mi100/

For now, I'm just detecting the "newest" architecture loaded in ArchConfigs and using this to form the baseline comparison data tables. This should work in the short term considering newer archs should contain a superset of all tables avail in older archs.

May want to revisit this in the "Omniperf Revamp"https://github.com/AMDResearch/omniperf/discussions/153...

— Reply to this email directly, view it on GitHubhttps://github.com/AMDResearch/omniperf/issues/144#issuecomment-1662680935, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABRKDCJVZY7N7XICTD3TK3DXTKGMRANCNFSM6AAAAAAZ2FIMFQ. You are receiving this because you authored the thread.Message ID: @.***>

coleramos425 commented 1 year ago

Merged into dev. Closing issue.