RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

proposal for batch (CPU/GPU) evaluation in the systems framework #16339

Open RussTedrake opened 2 years ago

RussTedrake commented 2 years ago

I took Eigen 3.4's tensor module for a quick spin to see what it might look like to support batch evaluation in the systems framework. Short story: I'm highly encouraged by it, but I think Eigen alone will not be enough. It has enough support for batch matrix multiplication, but anything fancier (e.g. batch matrix inverse) requires more; probably grabbing TensorFlow's C++ math library is going to be necessary.
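To make "batch matrix multiplication" concrete, here is a minimal sketch of the operation over a contiguous [N, rows, cols] buffer. It deliberately uses plain loops and the standard library rather than Eigen's tensor module; `BatchMatrix` and `BatchMatMul` are hypothetical names for illustration, not Eigen or Drake API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: a batch of N row-major matrices stored contiguously.
// Element (n, i, j) lives at data[(n * rows + i) * cols + j].
struct BatchMatrix {
  std::size_t batch, rows, cols;
  std::vector<double> data;

  BatchMatrix(std::size_t n, std::size_t r, std::size_t c)
      : batch(n), rows(r), cols(c), data(n * r * c, 0.0) {}

  double& at(std::size_t n, std::size_t i, std::size_t j) {
    return data[(n * rows + i) * cols + j];
  }
  double at(std::size_t n, std::size_t i, std::size_t j) const {
    return data[(n * rows + i) * cols + j];
  }
};

// Computes c[n] = a[n] * b[n] independently for every batch index n.
// Each batch entry is independent, which is what makes the operation easy to
// hand to threads or a GPU; this sketch just runs the loop serially.
BatchMatrix BatchMatMul(const BatchMatrix& a, const BatchMatrix& b) {
  assert(a.batch == b.batch && a.cols == b.rows);
  BatchMatrix c(a.batch, a.rows, b.cols);
  for (std::size_t n = 0; n < a.batch; ++n) {
    for (std::size_t i = 0; i < a.rows; ++i) {
      for (std::size_t j = 0; j < b.cols; ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k < a.cols; ++k) {
          sum += a.at(n, i, k) * b.at(n, k, j);
        }
        c.at(n, i, j) = sum;
      }
    }
  }
  return c;
}
```

A batch inverse has the same outer loop over n, but the per-entry work is a factorization rather than a fixed triple loop, which is roughly why it needs more library support.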

Getting batch evaluations (including GPU support) into the systems framework seems totally plausible to me now, and not anywhere near as jarring as I had feared. (I'm not saying that getting efficient GPU SAP solvers or collision checkers is easy... that's a very different question!)

Basic proposal:

I believe that the back-and-forth between tensorflow::Tensor and Eigen::Tensor is easy (and inexpensive).

It would also be interesting/important to understand if Eigen::Map<rows,cols> could be efficient if the batch size is N=1. In that case, perhaps we could change the existing Context datatypes to Eigen::TensorFixedSize<N,rows,cols> and have e.g. get_continuous_state_vector() return the Eigen::Map?

dmsj commented 2 years ago

This is great! Are you imagining that EvalBodyPoseInWorld(BatchContext<T>, body) would have a multi-threaded implementation under the hood, or run single-threaded?

Besides using Drake as a back-end for batch evaluations, we have a related, but slightly different use case where we would like Drake-native thread-safety for asynchronous calls by multiple threads for motion planning.

Currently we create multiple contexts and use a thread pool to hand off requests to the contexts. We've hidden all of this behind our own thread-safe constraint evaluation method.

I'm happy to detail our approach further if it would be useful. What have other folks done to solve this problem?

If there's community interest, we'd like to contribute to Drake-native parallelization. If this is too tangential, I'm happy to open a separate issue.

RussTedrake commented 2 years ago

If we use Eigen::Tensor, then the same MultibodyPlant code could dispatch to evaluate serially, in parallel threads, or to the GPU, based on the way that the tensors are allocated to be passed in and the hardware available.

Using multiple Contexts and a thread pool is exactly the right thing to do. A good example of this in Drake right now is the monte-carlo code: https://github.com/RobotLocomotion/drake/blob/5316536420413b51871ceb4b9c1f77aedd559f71/systems/analysis/monte_carlo.cc#L42
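For readers who have not seen the pattern, here is a minimal sketch of "one context per worker" using only the standard library. It is not Drake code and not the monte-carlo implementation linked above: `ToyContext`, `EvalConstraint`, and `EvalBatch` are hypothetical stand-ins, and a fixed set of threads with strided indices takes the place of a real thread pool with a work queue.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread Context: mutable state plus scratch
// storage that must not be shared between threads.
struct ToyContext {
  double q = 0.0;        // "state" written before each evaluation
  double scratch = 0.0;  // cache-like entry overwritten during evaluation
};

// Hypothetical constraint; it reads and writes only the context it is given.
double EvalConstraint(ToyContext* context) {
  context->scratch = std::sin(context->q);
  return context->scratch * context->scratch + context->q;
}

// Evaluates the constraint at every sample, using one context per worker.
std::vector<double> EvalBatch(const std::vector<double>& samples,
                              std::size_t num_threads) {
  assert(num_threads > 0);
  std::vector<double> results(samples.size());
  std::vector<ToyContext> contexts(num_threads);  // one context per worker
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t]() {
      // Worker t handles samples t, t + num_threads, ... so no two workers
      // write the same output slot or touch the same context.
      for (std::size_t i = t; i < samples.size(); i += num_threads) {
        contexts[t].q = samples[i];
        results[i] = EvalConstraint(&contexts[t]);
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
  return results;
}
```

The shared system (here, the free function) is only read, while everything mutable lives in the per-worker context, so no locking is needed. Calling `EvalBatch(samples, 4)` should give the same values as a serial loop over a single context.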