argonne-lcf / THAPI

A tracing infrastructure for heterogeneous computing applications.

modernize zetracer #203

Open TApplencourt opened 5 months ago

TApplencourt commented 5 months ago

OOM

The clean-up code in case of allocation failure seems too "greedy". For example, in _get_profiling_event:

```c
  ze_event_desc_t e_desc = {ZE_STRUCTURE_TYPE_EVENT_DESC, NULL, 0, ZE_EVENT_SCOPE_FLAG_HOST, ZE_EVENT_SCOPE_FLAG_HOST};
  res = ZE_EVENT_CREATE_PTR(e_w->event_pool, &e_desc, &e_w->event);
  if (res != ZE_RESULT_SUCCESS) {
    THAPI_DBGLOG("zeEventCreate failed with %d, for event pool: %p, command list: %p, context: %p", res, e_w->event_pool, command_list, context);
    goto cleanup_ep;
  }
  goto cleanup;
cleanup_ep:
  ZE_EVENT_POOL_DESTROY_PTR(e_w->event_pool);
```

The pool may also have been created to profile other events, so destroying the whole pool because a single event creation failed seems a bit heavy-handed.
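One less greedy option, sketched here under assumptions: on event-creation failure, free the pool only if the same call allocated it, and otherwise leave it alive for the events that may already live in it. `mock_pool`, `create_event`, `POOL_CAPACITY`, and this `get_profiling_event` are hypothetical stand-ins for the Level Zero handles and the `ZE_*_PTR` wrappers, so the logic compiles without a driver. This is not THAPI's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical capacity for newly created pools (kept tiny to force failures). */
#define POOL_CAPACITY 1

/* Hypothetical stand-in for a ze_event_pool_handle_t: counts live events. */
typedef struct {
  int live_events;
  int capacity;
} mock_pool;

/* Hypothetical stand-in for zeEventCreate: fails when the pool is full. */
static int create_event(mock_pool *pool) {
  if (pool->live_events >= pool->capacity)
    return -1; /* simulated driver error */
  pool->live_events++;
  return 0;
}

/* *slot caches the pool currently used for profiling events (NULL if none).
 * On failure, the pool is freed only if this call allocated it; a pool that
 * already existed may hold events used by other commands, so it is kept. */
static int get_profiling_event(mock_pool **slot) {
  assert(slot != NULL);
  bool created = false;
  mock_pool *pool = *slot;
  if (pool == NULL) {
    pool = malloc(sizeof *pool);
    if (pool == NULL)
      return -1;
    pool->live_events = 0;
    pool->capacity = POOL_CAPACITY;
    created = true;
  }
  if (create_event(pool) != 0) {
    if (created)
      free(pool); /* nothing else can reference a pool we just made */
    return -1;    /* a pre-existing pool is left intact */
  }
  *slot = pool; /* publish the pool only once it holds an event */
  return 0;
}
```

With a capacity of 1, the first call creates the pool and its event; a second call fails, but the shared pool and the event already in it survive.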

Cleanup

In _on_destroy_context, it is not clear to me why we loop over _ze_events and then over _ze_event_pools. My understanding is that _ze_event_pools keeps a link to its ze_events.

The code seems to imply that some events can be in _ze_events without being in _ze_event_pools, and therefore need to be destroyed separately.
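To make the question concrete, here is a minimal sketch of the single-loop teardown the comment seems to expect. It assumes that every event is linked from the pool it was created from. `mock_event`, `mock_pool`, `destroy_event`, and `on_destroy_context` are hypothetical models, not THAPI's actual `_ze_events` / `_ze_event_pools` structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each pool keeps an intrusive list of its events. */
typedef struct mock_event {
  bool destroyed;
  struct mock_event *next_in_pool;
} mock_event;

typedef struct mock_pool {
  mock_event *events;     /* events created from this pool */
  struct mock_pool *next; /* next pool owned by the same context */
} mock_pool;

static int destroyed_events = 0;
static int destroyed_pools = 0;

/* Stand-in for zeEventDestroy; the assert catches a double destroy. */
static void destroy_event(mock_event *e) {
  assert(!e->destroyed);
  e->destroyed = true;
  destroyed_events++;
}

/* Single pass: because every event is reachable from its pool, walking the
 * pools is enough; no separate loop over a global event list is needed.
 * Events are destroyed before their pool, as Level Zero requires. */
static void on_destroy_context(mock_pool *pools) {
  while (pools != NULL) {
    mock_pool *next_pool = pools->next;
    for (mock_event *e = pools->events; e != NULL; e = e->next_in_pool)
      destroy_event(e);
    destroyed_pools++; /* stand-in for zeEventPoolDestroy */
    pools = next_pool;
  }
}
```

If the real code can hold events in _ze_events that are not reachable from any entry of _ze_event_pools, the extra loop is needed. Otherwise, running both loops risks destroying the same event twice unless the first loop also unlinks events from their pools, which is exactly what the assert in `destroy_event` is there to catch.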