BVH API Semantics when using asynchronous execution policies

gzagaris commented 4 years ago

The BVH may be instantiated using asynchronous CUDA execution policy as follows:

spin::BVH< NDIMS, axom::CUDA_EXEC<BLOCK_SIZE,axom::ASYNC> > bvh( aabbs, N );

Then constructing the BVH on the GPU can be accomplished as follows:

bvh.build();

When using asynchronous execution, the call to bvh.build() returns on the host immediately, while the GPU may still be constructing the BVH.

Moreover, calling find after bvh.build() is fine, since the kernels are currently launched on the same stream and therefore execute in order.
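The in-order guarantee can be illustrated without a GPU. The following is a minimal host-only sketch, not Axom or CUDA code: InOrderStream and run_build_then_find are hypothetical names, and a single worker thread stands in for a CUDA stream. Work is enqueued without blocking the caller, yet "find" can never run before "build" because both sit in the same FIFO queue.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a single GPU stream: tasks are
// enqueued without blocking the caller and run strictly in FIFO order.
class InOrderStream
{
public:
  InOrderStream() : m_worker( [this] { run(); } ) { }

  ~InOrderStream()
  {
    {
      std::lock_guard< std::mutex > lock( m_mutex );
      m_done = true;
    }
    m_cv.notify_all();
    m_worker.join();
  }

  // Returns immediately, like an asynchronous kernel launch.
  void enqueue( std::function< void() > task )
  {
    {
      std::lock_guard< std::mutex > lock( m_mutex );
      m_tasks.push( std::move( task ) );
    }
    m_cv.notify_all();
  }

  // Blocks until every task enqueued so far has finished.
  void synchronize()
  {
    std::unique_lock< std::mutex > lock( m_mutex );
    m_cv.wait( lock, [this] { return m_tasks.empty() && !m_busy; } );
  }

private:
  void run()
  {
    std::unique_lock< std::mutex > lock( m_mutex );
    for ( ;; )
    {
      m_cv.wait( lock, [this] { return m_done || !m_tasks.empty(); } );
      if ( m_tasks.empty() ) return;  // only reached once m_done is set
      std::function< void() > task = std::move( m_tasks.front() );
      m_tasks.pop();
      m_busy = true;
      lock.unlock();
      task();  // run the task without holding the lock
      lock.lock();
      m_busy = false;
      m_cv.notify_all();
    }
  }

  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::queue< std::function< void() > > m_tasks;
  bool m_busy = false;
  bool m_done = false;
  std::thread m_worker;  // declared last so the other members exist when it starts
};

// Enqueue "build" then "find" without synchronizing in between and
// return the order in which they actually ran.
std::vector< std::string > run_build_then_find()
{
  std::vector< std::string > order;
  InOrderStream stream;
  stream.enqueue( [&order] { order.push_back( "build" ); } );
  stream.enqueue( [&order] { order.push_back( "find" ); } );
  stream.synchronize();  // the host waits only once, at the end
  return order;
}
```

Calling run_build_then_find() yields the two labels in launch order. The same reasoning would not hold if build and find were ever issued on different streams, which is one reason the ordering assumption is worth stating in the API.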

Asynchronous execution is typically used to:

Hide latencies due to kernel launch overhead and avoid synchronizing after each kernel
Overlap execution on the host and on the GPU, for example:

// construct BVH on the GPU
spin::BVH< NDIMS, axom::CUDA_EXEC<BLOCK_SIZE,axom::ASYNC> > bvh( aabbs, N, pool_allocator );
bvh.build();

// while the BVH is being constructed on the GPU, pack buffers on the CPU
pack_buffers();

// call find
bvh.find( ... );

Additional speedups on the order of 1.5X to 2X have been observed with asynchronous execution relative to synchronous execution in the present implementation.

However, other than specifying an asynchronous execution policy, it is not clear from the API that subsequent calls are asynchronous.

Considerations

Do we want all subsequent calls/queries to the BVH that launch kernels to synchronize internally at the end if the policy is asynchronous? This could limit potential overlap of execution on the CPU as indicated in the example above.
Do we want to specify explicitly in the API that the method is running synchronously or asynchronously?

That could be done with separate methods, e.g.:

bvh.ibuild(); // builds the BVH asynchronously on the GPU

// TODO: overlap execution on the CPU and GPU
do_stuff_on_cpu();

bvh.ifind();   // runs a find query on the GPU

// TODO: do more CPU stuff
do_more_stuff_on_the_cpu();

axom::synchronize();  // caller has to synchronize afterwards
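To make the naming idea concrete, here is a minimal host-only sketch of how the prefixed and unprefixed calls could relate. Everything in it is an assumption for illustration: MockBVH is a hypothetical stand-in for spin::BVH, std::async threads stand in for GPU kernels, and the member synchronize() plays the role of axom::synchronize().

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for spin::BVH; nothing here is Axom code.
class MockBVH
{
public:
  // "i"-prefixed calls only enqueue work and return immediately.
  void ibuild() { enqueue( [this] { m_built = true; } ); }

  void ifind( int query )
  {
    // Records -1 if the query somehow ran before the build finished.
    enqueue( [this, query] { m_hits.push_back( m_built ? query : -1 ); } );
  }

  // Unprefixed calls block until the work is done.
  void build() { ibuild(); synchronize(); }
  void find( int query ) { ifind( query ); synchronize(); }

  // Waits for all work enqueued so far.
  void synchronize()
  {
    if ( m_last.valid() ) m_last.wait();
  }

  // Only safe to call after synchronize().
  const std::vector< int >& hits() const { return m_hits; }

private:
  void enqueue( std::function< void() > work )
  {
    std::shared_future< void > prev = m_last;
    m_last = std::async( std::launch::async, [prev, work] {
               if ( prev.valid() ) prev.wait();  // preserve launch order
               work();
             } ).share();
  }

  bool m_built = false;
  std::vector< int > m_hits;
  std::shared_future< void > m_last;  // declared last: waited on first at destruction
};
```

With this shape, a caller who writes bvh.ibuild(); bvh.ifind( 42 ); bvh.synchronize(); gets overlap and must synchronize before reading hits(), while a caller who writes bvh.build(); bvh.find( 42 ); never has to think about it. The cost is a doubled method surface.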

It could also be done with a template argument:

bvh.build< axom::SYNCH >( );
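A minimal host-only sketch of the template-argument variant follows, assuming a per-call mode with a default. ExecMode and MockBVHWithMode are hypothetical names, not existing Axom types, and a std::async thread stands in for the GPU kernel.

```cpp
#include <atomic>
#include <future>

// Hypothetical mode tag; it stands in for the template argument
// proposed in this issue and is not an existing Axom type.
enum class ExecMode { Sync, Async };

// Hypothetical stand-in for spin::BVH (host thread instead of CUDA).
class MockBVHWithMode
{
public:
  // Asynchronous by default; build< ExecMode::Sync >() blocks the caller.
  template < ExecMode MODE = ExecMode::Async >
  void build()
  {
    m_pending = std::async( std::launch::async, [this] { m_built = true; } );
    if ( MODE == ExecMode::Sync )
    {
      m_pending.wait();
    }
  }

  // Waits for the most recent build, if any.
  void synchronize()
  {
    if ( m_pending.valid() ) m_pending.wait();
  }

  // Atomic, so polling it before synchronize() is not a data race.
  bool built() const { return m_built.load(); }

private:
  std::atomic< bool > m_built { false };
  std::future< void > m_pending;  // declared last: waited on first at destruction
};
```

Here bvh.build< ExecMode::Sync >() returns only after the work is done, whereas a plain bvh.build() returns immediately and leaves the wait to a later synchronize(). This keeps a single method name and makes the choice visible at the call site, at the price of the default still being implicit.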

And there are probably a couple of other ways to do this. We need to come up with a clean and precise API design for this.

rhornung67 commented 4 years ago

@gzagaris I think it's best and most flexible to support both synchronous and asynchronous operations when it makes sense. We should choose a default behavior for each operation in a consistent manner and allow users to choose otherwise via the axom::ASYNC/axom::SYNC options. Note that we will soon be rolling out the asynchronous execution support in RAJA that I spoke about in the ASQ Webex. This would help users overlap operations should they choose to do that.

gzagaris commented 4 years ago

@rhornung67 -- thanks for the feedback. I totally agree. We already allow that and I am currently employing it. My concern is that in the API it is not clear that the operation is asynchronous, unless the user is also familiar with BVH internals.

rhornung67 commented 4 years ago

Do you want me to add this to the agenda for today's Axom meeting?

gzagaris commented 4 years ago

Do you want me to add this to the agenda for today's Axom meeting?

Sure -- I've been mulling this over and if folks have ideas/suggestions it will be helpful.

kennyweiss commented 3 years ago

@rhornung67 -- This is a design question.
I assigned it to you to help us make a determination about the next steps.

rhornung67 commented 1 year ago

We should wait for a use case. @publixsubfan suggested trying it in a test case.

LLNL / axom

BVH API Semantics when using asynchronous execution policies #232