fix for stoichastic kernel dispatch selection

skyreflectedinmirrors commented 1 year ago

Fixes an issue when trying to select a specific --dispatch <N> on runs w/ 'stoichastic' dispatches (e.g., you don't have the same number of launches of a specific kernel between runs), e.g. from our stochastic test code (I think this was originally Stan's? Thanks Stan!)

#include <hip/hip_runtime.h>
#include <stdio.h>

__global__ void test_kernel(void)
{
  printf("Hello there\n");
}

int main()
{
  FILE *fp;
  fp = fopen("memory.txt", "a+");
  int count = 0;
  for (char c = getc(fp); c != EOF; c = getc(fp))
    if (c == '\n') // Increment count if this character is newline
      count = count + 1;

  fprintf(fp, "new line \n");
  fclose(fp);
  printf("Running kernel %d times\n", count);

  for (int i=0; i < count; i++)
    {
      test_kernel<<<1,1,0>>>();
    }
  hipDeviceSynchronize();
  return 0;
}

The current behavior is:

$ hipcc -O3 test.cpp
$ omniperf profile -n test --no-roof -- ./a.out
<...>
$ omniperf analyze --dispatch 1 -p workloads/test/mi200/
<...>
Traceback (most recent call last):
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf", line 917, in <module>
    main()
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf", line 897, in main
    analyze(args)
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/omniperf_analyze.py", line 282, in analyze
    run_cli(args, runs)
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/omniperf_analyze.py", line 201, in run_cli
    parser.load_table_data(
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 725, in load_table_data
    eval_metric(
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 508, in eval_metric
    ammolite__build_in[key] = eval(compile(s, "<string>", "eval"))
  File "<string>", line 2, in <module>
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 154, in to_int
    return int(a)
ValueError: cannot convert float NaN to integer

The problem is that when there is an uneven # of dispatches in the data-frame between runs, one (or more) indices gets marked as 'NaN' due to the merge between runs:

(Pdb) p ret_df[schema.pmc_perf_file_prefix]["Index"].astype(str)
0      0.0
1      1.0
2      2.0
3      3.0
4      4.0
      ... 
74    74.0
75    75.0
76    76.0
77    77.0
**78     nan**

Then, when we do the "convert to string and see if that index is in the user's string" test, dispatch 10 will fail, because "10.0" != "10". Thus by the time we get to eval_metrics, we have an empty dataframe (aside: @coleramos425 -- we should handle that more gracefully, e.g., check if the frame is empty, and spit out a helpful warning message)

The fix here is to simply convert the user's dispatch index to an integer, and use that to index into the data-frame directly.

skyreflectedinmirrors commented 1 year ago

Hmm, this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5, which starts to look extremely degenerate:

But, it works for the "sometimes I launch this kernel 3 times, sometimes 4" type of codes. Probably what I was referring to was that https://github.com/AMDResearch/omniperf/blob/cc5bba19f4ed60310371ca2cea0980f461614f9b/src/omniperf_analyze/utils/parser.py#L757 is insufficient. We'll have to think about better ways to detect this

coleramos425 commented 1 year ago

Thanks for the addition, @arghdos . We'll get this merged.

this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5

I wasn't able to reproduce this breakage after checking out the PR. My understanding is that timestamp columns (i.e. BeginNs and EndNs) shouldn't have any NaN's. Since your test code launches +1 kernels each run, the timestamp replacement should insert timestamps for each dispatch. Am I missing something?

*Note: For a truly random case like below, this could trigger a reproducer since the timestamp replacement run could generate fewer dispatches than the target replacement file, pmc_perf.csv

int  my_rand = 1 + (rand() % 100)
for (int i = 0; i < my_rand; i++){
   test_kernel<<<1,1,0>>>();
}
hipDeviceSynchronize();

skyreflectedinmirrors commented 1 year ago

[AMD Official Use Only - General]

I might have collected the data w/ my fork (which is slightly stale in the joining stuff). If you don't see any NaN's in the timestamps, then I think we should be OK

From: Cole Ramos @.> Sent: Monday, September 11, 2023 1:17 PM To: AMDResearch/omniperf @.> Cc: Curtis, Nicholas @.>; Mention @.> Subject: Re: [AMDResearch/omniperf] fix for stoichastic kernel dispatch selection (PR #183)

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.

Thanks for the addition, @arghdoshttps://github.com/arghdos . We'll get this merged.

this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5

I wasn't able to reproduce this breakage after checking out the PR. My understanding is that timestamp columns (i.e. BeginNs and EndNs) shouldn't have any NaN's. Since your test code launches +1 kernels each run, the timestamp replacement should insert timestamps for each dispatch. Am I missing something?

*Note: For a truly random case like below, this could trigger a reproducer since the timestamp replacement run could generate fewer dispatches than the target replacement file, pmc_perf.csv

int my_rand = 1 + (rand() % 100) for (int i = 0; i < my_rand; i++){ test_kernel<<<1,1,0>>>(); } hipDeviceSynchronize();

Anything I'm missing here?

— Reply to this email directly, view it on GitHubhttps://github.com/AMDResearch/omniperf/pull/183#issuecomment-1714285216, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABRKDCJI4BJBCXVVJAMV2RTXZ5BQ3ANCNFSM6AAAAAA4TNDVLI. You are receiving this because you were mentioned.Message ID: @.***>

ROCm / rocprofiler-compute

fix for stoichastic kernel dispatch selection #183