PolyArch / gem-forge-framework

BSD 2-Clause "Simplified" License

How can I evaluate a new benchmark with SSP and Stream-float? #2

K16DIABLO closed this issue 2 years ago

K16DIABLO commented 2 years ago

Hello,

I want to run a simulation (e.g. vec_add) on your implementation. However, the framework does not seem to recognize the streams in this code: neither numLoadElementsAllocated nor numStoreElementsAllocated appears in the simulation results. The code is below. How can I make the arrays (A, B, and C) be recognized as streams?

static const uint64_t file_size = 65536; 
//static const uint64_t file_size = 33554432; 
//static const uint64_t file_size = 16777216;

__attribute__((noinline)) static void vector_addition_host(Value* A, Value* B, Value* C) {
  #pragma omp parallel for schedule(static) firstprivate(A, B, C)
    for (uint64_t i = 0; i < file_size; i += 16) {
      __m512i valA = _mm512_loadu_epi32(A + i);
      __m512i valB = _mm512_loadu_epi32(B + i);
      __m512i valC = _mm512_add_epi32(valA, valB);
      _mm512_storeu_epi32(C + i, valC);
    }
}

int main(int argc, char **argv) {

  int numThreads = 1;
  if (argc == 2) {
    numThreads = atoi(argv[1]);
  }
  printf("Number of Threads: %d.\n", numThreads);
  omp_set_dynamic(0);
  omp_set_num_threads(numThreads);
  omp_set_schedule(omp_sched_static, 0);

  // Allocate cache-line-aligned input (A, B) and output (C) arrays; contents are left uninitialized.
  Value* A = (Value*) aligned_alloc(CACHE_LINE_SIZE, file_size * sizeof(Value));
  Value* B = (Value*) aligned_alloc(CACHE_LINE_SIZE, file_size * sizeof(Value));
  Value* C = (Value*) aligned_alloc(CACHE_LINE_SIZE, file_size * sizeof(Value));

#ifdef GEM_FORGE
  gf_detail_sim_start();
#endif

#ifdef WARM_CACHE
  WARM_UP_ARRAY(A, file_size);
  WARM_UP_ARRAY(B, file_size);
  WARM_UP_ARRAY(C, file_size);
  // Initialize the threads.
#pragma omp parallel for schedule(static) firstprivate(A)
  for (int tid = 0; tid < numThreads; ++tid) {
    volatile Value x = *A;
  }
#endif

#ifdef GEM_FORGE
  gf_reset_stats();
#endif

  vector_addition_host(A, B, C);

#ifdef GEM_FORGE
  gf_detail_sim_end();
  exit(0);
#endif

  free(A);
  free(B);
  free(C);

  return 0;
}

Thank you for your attention and I'm looking forward to your reply.

seanzw commented 2 years ago

My guess is that you need to set -gem-forge-roi-function={func_name} to tell the compiler to try to recognize streams in that function. Can you try that?

K16DIABLO commented 2 years ago

It works! The OpenMP scheduling chunk was preventing the compiler from recognizing the streams. BTW, I have one more question about the simulation results. Why are there two sets of simulation results in the stats.txt file? I'm not familiar with gem5, so I'm not sure which one is the right result.

seanzw commented 2 years ago

Just use the first section of the simulation results. That section is what gem5 dumps when the program hits ROI_END. The second section is the final result that gem5 dumps at the end of the simulation. So:

  1. First section -- the ROI region you marked in your program.
  2. Second section -- from ROI begin to the end of simulation.
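
To make the "use the first section" advice concrete, here is a minimal sketch of reading one stat from only the first dump in stats.txt. It assumes the usual gem5 layout, where each dump is delimited by "Begin Simulation Statistics" and "End Simulation Statistics" lines and each stat line is a name followed by a value. The helper name first_section_stat is hypothetical, and the stat is matched by substring because the full name prefix depends on your system configuration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up a stat in the FIRST dump of a gem5 stats file.
 * A dump is assumed to start at a line containing "Begin Simulation Statistics"
 * and to end at a line containing "End Simulation Statistics".
 * Returns 1 and stores the value in *out if a stat whose name contains `key`
 * is found in that first dump, and 0 otherwise. */
int first_section_stat(FILE *fp, const char *key, double *out) {
  char line[1024];
  int in_section = 0;

  while (fgets(line, sizeof line, fp)) {
    if (!in_section) {
      /* Skip everything before the first dump begins. */
      if (strstr(line, "Begin Simulation Statistics")) {
        in_section = 1;
      }
      continue;
    }
    /* Stop at the end of the first dump so later dumps are ignored. */
    if (strstr(line, "End Simulation Statistics")) {
      break;
    }
    /* Stat lines look like: <name> <value> # <description> */
    char name[512];
    double value;
    if (sscanf(line, "%511s %lf", name, &value) == 2 && strstr(name, key)) {
      *out = value;
      return 1;
    }
  }
  return 0;
}
```

For example, calling first_section_stat(fp, "numLoadElementsAllocated", &v) on an opened stats.txt would return the ROI value, if streams were recognized, and skip the end-of-simulation dump.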

K16DIABLO commented 2 years ago

Thank you!