KhronosGroup / SYCL-Docs

SYCL Open Source Specification

Library and event profiling #498

Open TApplencourt opened 7 months ago

TApplencourt commented 7 months ago

It's not really an issue; it's just an inconvenience that @colleeneb found when working with oneMKL. I opened this issue to track it so I don't forget, as it's a general inconvenience. We can close if we think it's not the correct place for such discussion.

The problem is that a library API returns one event but may internally submit multiple commands, and hence produce multiple events. This is typical of some MKL functions. Example:

// Assuming an in-order queue for simplicity
auto foo(sycl::queue Q) {
  auto e1 = bar(Q);
  auto e2 = bar(Q);
  return e2;
}

// The profiling info of the returned event covers only e2, not e1 + e2
foo(Q).get_profiling_info<sycl::info::event_profiling::command_end>();

This means it's impossible for a user of a library to know how long the library call took on the device.

Maybe one solution would be to add a new event constructor that takes a list of events to "bundle". Then we could specify that get_profiling_info of this event should report the sum of the events used during construction.

Example:

 auto e1 = foo1(Q);
 auto e2 = foo1(Q);
 // Proposed constructor that bundles e1 and e2
 sycl::event e3{ e1, e2 };
 // Desired: e3.get_profiling_info == (e1.get_profiling_info + e2.get_profiling_info)

gmlueck commented 7 months ago

The event::get_profiling_info function returns a timestamp, not an elapsed time. Therefore, it's not simply a matter of adding time together from the aggregate events.

If we added an event constructor like you suggest, would you expect info::event_profiling::command_submit to return the submission timestamp of the first constituent event and info::event_profiling::command_end to return the completion timestamp of the last constituent event? This seems like it would be easy enough to implement, but subtracting the two timestamps would not always provide the amount of time that commands were executing on the device. For example, if e1 completed before e2 started, subtracting the timestamps would include the host time in between the two commands.
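To make the distinction above concrete, here is a minimal sketch in plain C++ (no SYCL runtime needed). `CommandTimestamps`, `bundled_span_ns`, and `device_busy_ns` are hypothetical names introduced only for illustration; the struct stands in for the nanosecond timestamps that `get_profiling_info` would report for each constituent event. It assumes the commands do not overlap, as on an in-order queue.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the timestamps (in nanoseconds) that
// sycl::event::get_profiling_info reports for a single command.
struct CommandTimestamps {
  std::uint64_t submit;  // info::event_profiling::command_submit
  std::uint64_t start;   // info::event_profiling::command_start
  std::uint64_t end;     // info::event_profiling::command_end
};

// What subtracting the bundled event's timestamps would give:
// earliest submit to latest end. This includes any idle or host
// time between the constituent commands.
std::uint64_t bundled_span_ns(const std::vector<CommandTimestamps>& cmds) {
  if (cmds.empty()) return 0;
  std::uint64_t first_submit = cmds.front().submit;
  std::uint64_t last_end = cmds.front().end;
  for (const auto& c : cmds) {
    first_submit = std::min(first_submit, c.submit);
    last_end = std::max(last_end, c.end);
  }
  return last_end - first_submit;
}

// Sum of per-command execution times (end - start).
// Assumption: commands do not overlap; overlapping commands
// would be double-counted by this simple sum.
std::uint64_t device_busy_ns(const std::vector<CommandTimestamps>& cmds) {
  std::uint64_t total = 0;
  for (const auto& c : cmds) total += c.end - c.start;
  return total;
}
```

For example, with e1 = {submit 100, start 120, end 200} and e2 = {submit 250, start 300, end 400}, the bundled span is 300 ns while the summed execution time is only 180 ns; the difference is the time the device was not running either command.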

TApplencourt commented 7 months ago

You are totally correct on both points!

We can either:

AerialMantis commented 5 months ago

SYCL WG call:

keryell commented 5 months ago

The problem with using this as a timing tool is that it has an impact on performance, and not only for SYCL: https://github.com/NVIDIA/stdexec/pull/1227 And with chiplets, distributed firmware... it is unlikely to get better in the future.