Open gzagaris opened 4 years ago
@gzagaris I think it's best and most flexible to support both synchronous and asynchronous operations when it makes sense. We should choose a default behavior for each operation in a consistent manner and allow users to choose otherwise via the axom::ASYNC/axom::SYNC options. Note that we will soon be rolling out the asynchronous execution stuff in RAJA that I spoke about in the ASQ Webex. This would help users overlap operations should they choose to do that.
@rhornung67 -- thanks for the feedback. I totally agree. We already allow that and I am currently employing it. My concern is that in the API it is not clear that the operation is asynchronous, unless the user is also familiar with BVH internals.
Do you want me to add this to the agenda for today's Axom meeting?
Sure -- I've been mulling this over and if folks have ideas/suggestions it will be helpful.
@rhornung67 -- This is a design question.
I assigned it to you to help us make a determination about the next steps.
We should wait for a use case. @publixsubfan suggested trying in a test case.
The BVH may be instantiated using asynchronous CUDA execution policy as follows:
Then constructing the BVH on the GPU can be accomplished as follows:
When using asynchronous execution, the call to `bvh.build()` will return on the host, but the GPU would still be constructing the BVH. Moreover, calling `find` after `bvh.build()` would be fine, since the kernels are currently launched on the same stream and will be executed in order.

Asynchronous execution is typically used to:
Additional speedups of the order of 1.5X to 2X have been observed when using asynchronous execution over synchronous execution with the present implementation.
However, other than specifying an asynchronous execution policy, it is not clear from the API that subsequent calls are asynchronous.
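To make the concern concrete, here is a minimal sketch of the behavior described above. It is not Axom code: `ModelStream` and `ToyBVH` are hypothetical names, and the "stream" is modeled without any real concurrency as a FIFO queue whose work only runs when the host synchronizes. It shows that `find` queued after `build` sees the built tree, but that the host-side result is stale until synchronization, which nothing in the call signatures reveals.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Hypothetical model of a single device stream. There is no real
// concurrency: enqueue() only records work, synchronize() runs it in order.
class ModelStream
{
public:
  void enqueue(std::function<void()> task) { m_tasks.push(std::move(task)); }

  // Runs every recorded task in submission (FIFO) order.
  void synchronize()
  {
    while(!m_tasks.empty())
    {
      std::function<void()> task = std::move(m_tasks.front());
      m_tasks.pop();
      task();
    }
  }

private:
  std::queue<std::function<void()>> m_tasks;
};

// Hypothetical toy, not axom's BVH: both calls return immediately.
struct ToyBVH
{
  explicit ToyBVH(ModelStream& s) : stream(s) { }

  void build() { stream.enqueue([this] { built = true; }); }

  // Records whether the tree was already built when the query ran.
  void find(bool& hit) { stream.enqueue([this, &hit] { hit = built; }); }

  ModelStream& stream;
  bool built = false;
};

// Returns true if the result was stale before synchronizing and valid after.
bool find_after_build_sees_tree()
{
  ModelStream stream;
  ToyBVH bvh(stream);
  bool hit = false;

  bvh.build();       // returns before any work has run
  bvh.find(hit);     // queued behind build on the same stream
  bool early = hit;  // host reads too early: still false

  stream.synchronize();
  return !early && hit;
}
```

The in-order guarantee makes `build` followed by `find` safe, but reading `hit` on the host before `synchronize()` is the kind of mistake the current API does nothing to flag.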
Considerations
That could be done by different methods, e.g.:
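As one possible illustration of the separate-methods option, the sketch below pairs a blocking `build()` with an explicitly named `buildAsync()`. All names (`MethodBVH`, `buildAsync`, `wait`) are hypothetical and not part of Axom; pending work is modeled as a list of callables that `wait()` runs, with no real device or threads involved.

```cpp
#include <functional>
#include <vector>

// Hypothetical toy, not axom's BVH. Pending work is stored and run by wait().
class MethodBVH
{
public:
  // Blocking variant: the tree is ready when the call returns.
  void build()
  {
    buildAsync();
    wait();
  }

  // Non-blocking variant: the name tells the caller work may still be pending.
  void buildAsync()
  {
    m_pending.push_back([this] { m_built = true; });
  }

  // Runs all pending work in submission order.
  void wait()
  {
    for(auto& task : m_pending)
    {
      task();
    }
    m_pending.clear();
  }

  bool built() const { return m_built; }

private:
  std::vector<std::function<void()>> m_pending;
  bool m_built = false;
};
```

With this shape the call site documents itself: `bvh.buildAsync()` followed by `bvh.wait()` reads differently from a plain `bvh.build()`, at the cost of roughly doubling the number of entry points.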
It could also be done by a template argument.
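A sketch of what the template-argument option might look like, under the same assumptions as before: `TemplatedBVH` and `ExecMode` are hypothetical names (loosely modeled on the SYNC/ASYNC options mentioned earlier in the thread, not Axom's actual types), and pending work is modeled as stored callables rather than real GPU kernels.

```cpp
#include <functional>
#include <vector>

// Hypothetical mode tag; not axom's actual type.
enum class ExecMode
{
  Sync,
  Async
};

// Hypothetical toy, not axom's BVH.
class TemplatedBVH
{
public:
  // The mode is chosen per call; the default keeps the blocking behavior.
  template <ExecMode Mode = ExecMode::Sync>
  void build()
  {
    m_pending.push_back([this] { m_built = true; });
    if(Mode == ExecMode::Sync)
    {
      wait();  // blocking mode finishes the work before returning
    }
  }

  // Runs all pending work in submission order.
  void wait()
  {
    for(auto& task : m_pending)
    {
      task();
    }
    m_pending.clear();
  }

  bool built() const { return m_built; }

private:
  std::vector<std::function<void()>> m_pending;
  bool m_built = false;
};
```

Here `bvh.build<ExecMode::Async>()` makes the asynchronous choice visible at the call site while keeping a single method name, and a plain `bvh.build()` stays synchronous by default.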
And there are probably a couple of other ways to do this. We need to come up with a clean and precise API design for this.