Closed liangfok closed 8 years ago
@david-german-tri: This need is something to keep in mind when Designing Framework 2.0.
The ultimate hybrid continuous/discrete solver will support events, and the handlers for events can signal termination. This won't require much from System 2.0 but will be mostly Solver 2.0.
@sherm1: Just wondering how event support would help in this context. Will a separate thread need to be spawned that periodically checks the return value of ros::ok()
, and emit a signal to Solver 2.0 when the return value is false?
Or, will Solver 2.0 embrace ROS and use a custom SIGINT handler that plays nice with ROS as described here?
I don't think Solver 2.0 will have any particular relationship with ROS. Its job will be to accurately advance continuous and discrete state variables through time in accordance with a set of equations. The continuous part requires numerical integration with accuracy control and localization of events, at which point discrete variables are updated by event handlers (including simple discrete update equations). This is a mathematical procedure, not a software stack. During a discrete update, termination can be requested. That could come from anywhere, including ROS, but that would be invisible to the solver that I'm thinking about. I think you have a bigger picture in mind than I do; at the moment I'm thinking about how we formulate a hybrid dynamical system and solve it efficiently.
Sigh -- would likely be a short conversation in person with a whiteboard!
OK, this was actually the first time I've heard of the term "Solver 2.0" and I wasn't sure what it's relationship was with System 2.0. It sounds like Solver 2.0 is a simulator that supports both continuous and discrete state transitions, and that it will eventually be implemented using the software abstractions provided by System 2.0. Are these correct guesses?
Is "Solver 1.0" precisely the while
loop I mentioned in the original description of this issue?
I'm happy to chat over Google Hangout or Skype anytime.
this was actually the first time I've heard of the term "Solver 2.0"
That's cuz I just made it up during this discussion! I like the name though. And yeah, Solver 1.0 would be the current while loop with a hacked-together fixed step RK2 integrator. But the original Drake solver is the Simulink one which is a very nice model for Solver 2.0.
Please feel free to call me via hangout any time you want to chat -- no appointment needed! If I can't talk I'll call you back.
I'm not really sure how much the Solver really comes into play here, either 1.0 or 2.0. Really, we just want to have a simulate() method where the "health check" part of its main loop is customizable.
Right now, body of the current while loop prior to const double dt = ...
is doing a handle_realtime_factor
check with some related options handling. That should all just be ExtractMethod'd into a bool-valued keep_going functor, which then @liangfok could replace with his functor that calls ros::ok
. All in all, a relatively straightforward refactoring.
It would also be a good opportunity to better collect the handle_realtime
-related code into a more cohesive class with code in a C file (not the header).
@jwnimmer-tri: Great idea. I will do the refactor in the morning. Thanks!
I think you have a bigger picture in mind than I do; at the moment I'm thinking about how we formulate a hybrid dynamical system and solve it efficiently.
This is certainly possible. I am eager to scale up Drake to simulate massive systems like a city in which millions of autonomous vehicles, manually-controlled vehicles, pedestrians, bicyclists, personal robots, delivery robots, etc, peacefully coexist with each other. I suspect achieving high fidelity simulation of just a single autonomous vehicle will stretch the limits of a single PC's computational resources, and am thus hoping that the architecture we come up with can be easily distributed to make use of more resources. It would be great if we could design System 2.0 / Solver 2.0 / Drake 2.0 with scalability in mind. Maybe a piece of the puzzle is to embrace a component-based architecture like that provided by ROS 2.0.
I think the biggest challenge will be to achieve a distributed simulation while maintaining Drake's "glass box" (all state variables in a single vector) nature.
Initial related discussion here: https://github.com/RobotLocomotion/drake/blob/master/drake/systems/plants/RigidBodySystem.h#L9
@sherm1 I need to resolve this issue as a high priority item for https://github.com/RobotLocomotion/drake/issues/2606. Maybe I could offload this from you? What's the current status on this?
Maybe I can address this issue in the short term by incrementally refactoring the existing code and then you follow up with an event-based longer term solution in Solver 2.0?
OK, @liangfok. What do you think of this approach: add an optional bool std::function
to SimulateOptions
and then change the loop to while (t < tf && !options.should_stop()) {}
? The default value for should_stop
would be something like [](){return false;}
but you could set it to [](){return !ros::ok();}
.
Some design questions: what (if any) arguments should be passed to the function? Does there have to be some way to record the reason that the simulation was stopped?
Putting it in SimulationOptions
makes it more difficult to change later, compared to a separate function argument to simulate()
.
Putting it in SimulationOptions makes it more difficult to change later, compared to a separate function argument to simulate().
I don't see why -- please explain. Putting it in Options has the advantage of leaving the API unchanged, and Solver 2 will handle this differently.
Thanks @sherm1. +1 for adding a bool std::function
into SimulationOptions
that enables customizing how a simulation can be terminated.
What (if any) arguments should be passed to the function?
None for this initial refactor (I'm aiming to minimize PR size). In the future, I can imagine passing it a clock (to support arbitrary clock sources and enforcement of a desired real-time factor), the simulation state (to detect convergence or live lock), etc.
Does there have to be some way to record the reason that the simulation was stopped?
I don't think it's necessary to inform the calling process of why the simulation stopped. Distinguishing between stopping due to SIGINT
(ctrl+c
) or the final simulation time being reached is not necessary, at least for the current Solver 1.0 implementation and applications. Maybe Solver 2.0 can be designed to support this differentiation if there is such a need.
I don't see why -- please explain. Putting it in Options has the advantage of leaving the API unchanged, and Solver 2 will handle this differently.
As I said upthread weeks ago, a default argument can be used to keep the existing call sites unchanged. By making it an argument, we get function overloading, which allows for different overloads, e.g., depending on what types the callback functor wants.
I am OK making the functor
a new parameter of simulate(...)
. @sherm1, should I proceed to do that?
I don't understand the overload argument; there is only one actual simulation loop, which will require a particular method signature to call in the while
condition. That could be the signature stored in Options. I'm also wary of modifying the signature to add behavior tweaks since it isn't easily scalable if this is just one of many more features we'll want (likely, IMO). Options provides a reasonable loophole to collect a potentially bottomless pit of behavior tweaks.
However, I don't feel strongly about it so if you guys prefer the extra argument (with proper default) I can live with that.
Yeah, I think the initial time (ti
), final time (tf
), initial system state (xi
) should all go into SimulationOptions
. This would simplify the simulate()
method signature to be a nice and concise:
template <typename System>
void simulate(const System& sys, const SimulationOptions& options) {
// ...
}
This refactor probably shouldn't go into a single PR. I just wanted to know your thoughts about that.
I don't understand the overload argument; there is only one actual simulation loop, which will require a particular method signature to call in the while condition.
Okay, that's a good point. I retract my suggestion.
OK, thanks @jwnimmer-tri. @liangfok, time and state will be collected into the Context in System 2.0 (not options) so we probably don't want to mess with them here. But please go ahead with adding a termination function to SimulationOptions
.
Great! OK starting on a PR with just a new termination function in SimulationOptions
. Thanks!
Currently, the termination condition is when the simulation time
t
exceeds the final timetf
. The relevant source code is in drake/systems/Simulation.h#L118 and is replicated below:This is not sufficiently flexible when integrating Drake within a larger framework like ROS since ROS is multithreaded and installs its own SIGINT handler. When running Drake with ROS, typing
ctrl+c
will no longer shutdown the simulation. We need to design a mechanism for the abovewhile
loop to effectively implement the following logic:The ability to achieve the above logic will return the normal ability to terminate the simulation when
ctrl+c
is typed.Without this fix, the only way to shut down Drake + ROS is to type
ctr+\
, which is potentially dangerous since it does not shut things down cleanly (it may result in zombie processes that you will then need to pick off one-by-one using something likekill -9
).