Select particle filter components at runtime

nahueespinosa commented 1 year ago

Description

When writing a configurable ROS node like beluga_amcl, it is necessary to provide a mechanism to instantiate a particle filter with different sensor and motion models at runtime (as opposed to compile time).

What I had in mind was:

The ciabatta library supports providing an interface for the mixin, so we could define a pure abstract class representing a particle filter interface to inherit from.
We can create a template factory function that takes parameter sets (e.g., LikelihoodFieldModelParam, DifferentialDriveModelParam, etc.) and instantiates a particle filter with the correct model types. From there, we can rely on normal C++ runtime polymorphism.
Lastly, using ROS parameters, we can select which parameter types to instantiate to pass to our factory function.

Definition of done

[x] Implementation of the necessary mechanisms in the core library to support runtime selection of components.
[x] Usage of these structures to select different sensor and motion models in the AMCL node.

nahueespinosa commented 1 year ago

Thoughts? @glpuga @ivanpauno @hidmic

ivanpauno commented 1 year ago

The ciabatta library supports providing an interface for the mixin, so we could define a pure abstract class representing a particle filter interface to inherit from.

I would like instead of that, to have abstract interfaces for a motion model, sensor model, etc. What ciabatta provides doesn't exactly seem to help on that end. I think that the nicer thing of having separate interface is that you need one isolated factory for each thing, in the case of using ciabatta interfaces you still have a combinatory explosion.

I'm not sure about the performance hit though, but if we measure in a benchmark and the performance hit of using separated interfaces isn't big, I would go that way. To avoid the performance hit, we may need to "move" sample() to the motion model and importance_sample() to the sensor model, in case the compiler can optimize that correctly as-is (maybe not move, but have one method in the motion/sensor model that receives a range of particles). I think that's going to scale much better.

I can provide an example of how to implement this, we could either use an external library (dyno or similar) or implement it by hand.

BTW, I also think we should also reorganize a bit how parameters are read in beluga_amcl. For example, different motion models may require different parameters to be used. Having a huge list with everything in the node may not be a great idea.

I haven't pictured this part exactly in my mind, so I would need to write some code to see how it looks.

nahueespinosa commented 1 year ago

What ciabatta provides doesn't exactly seem to help on that end.

ciabatta solves the part where any particle filter can be referenced with a pointer to an abstract interface. The construction problem is solved thanks to a template factory that takes parameter types and deduces the corresponding model type. No combinatory explosion.

maybe not move, but have one method in the motion/sensor model that receives a range of particles

We can probably think of ways to provide that method without the user having to write it in custom models.

Having a huge list with everything in the node may not be a great idea.

We aimed to match the interface of nav2, which re-uses some parameters in different models. But +1 to more organization.

hidmic commented 1 year ago

The construction problem is solved thanks to a template factory that takes parameter types and deduces the corresponding model type. No combinatory explosion.

Hmm, I may be mistaken, but the mapping from parameter values, available in runtime (e.g. robot_model_type), to argument types, available in compile time (MotionModel, following the example) is still a problem. You can hide the combinatorial growth (e.g. currying types though a statically polymorphic builder), but compilation time and binary size will see an increase.

That said, the combinatory explosion is a bit of a thought experiment. How many combinations do we actually have? How many optimizations do we give up on by going down the runtime polymorphism route? In the ROS 2 world, type erasure is pretty much the norm (with shared library based plugins closely following). I would make a pause and check before jumping on that wagon.

nahueespinosa commented 1 year ago

Hmm, I may be mistaken, but the mapping from parameter values, available in runtime (e.g. robot_model_type), to argument types, available in compile time (MotionModel, following the example) is still a problem.

You're right. Although, I think that can be solved by storing a sensor model variant and a motion model variant, and using visit to call the appropriate static polymorphic builder.

You can hide the combinatorial growth (e.g. currying types though a statically polymorphic builder), but compilation time and binary size will see an increase.

Fair point.

How many optimizations do we give up on by going down the runtime polymorphism route?

We can probably implement something and benchmark if we really want to go down the type erasure route.

hidmic commented 1 year ago

We can probably implement something and benchmark if we really want to go down the type erasure route.

Just to be clear, my gut tells me we shouldn't unless compilation times and/or binary sizes become prohibitive.

glpuga commented 1 year ago

Hmm, I may be mistaken, but the mapping from parameter values, available in runtime (e.g. robot_model_type), to argument types, available in compile time (MotionModel, following the example) is still a problem.

+1

That said, the combinatory explosion is a bit of a thought experiment. How many combinations do we actually have?

Not many for the amcl_node feature set, but we should keep in mind that we want to eventually provide more features. It's an open question if by "more options" we'll eventually provide them in the same amcl_node, or if we'll propose the user a collection of modules and a builder to construct their own node and leave amcl_node like a legacy solution that does not need to be able to have the full feature set.

In the first case, whatever mechanism we use to activate functionality in runtime needs to scale; in the second case the combinations are that many and cheaper solutions may do the job.

We can probably implement something and benchmark if we really want to go down the type erasure route.

Are there any critical execution code sequences that needs to repeatedly move across mixins? my current understanding is that performance-critical code is contained within mixins, and that code that will potentially need to be "routed" across mixins/modules represents a very small part of execution time because it's only "invocation" code that is only executed once per iteration (e.g. update_motion()). If that's the case I would not weight performance to much on the selection of a mechanism.

Another thing to keep in mind is the nature of different moving pieces in the node:

different motion models are mutually exclusive
so are different perceptions models
but resampling policies (much smaller than the previous two) are composable: you may have more than one active at the same time, and the same may be true for other things in the future.

ivanpauno commented 1 year ago

If that's the case I would not weight performance to much on the selection of a mechanism.

+1, I would go the cleanest route, and if we find issues in benchmarks then take a step back and find how to avoid the issue.

hidmic commented 1 year ago

I'd like @nahueespinosa to weigh in, but:

It's an open question if by "more options" we'll eventually provide them in the same amcl_node, or if we'll propose the user a collection of modules and a builder to construct their own node and leave amcl_node like a legacy solution that does not need to be able to have the full feature set.

I think beluga's current template heavy design is naturally inclined towards the second scenario, which doesn't preclude having a collection of nodes built and optimized for specific applications. Having an almighty plug-n-play node is good for adoption, not so much for production.

my current understanding is that performance-critical code is contained within mixins, and that code that will potentially need to be "routed" across mixins/modules represents a very small part of execution time because it's only "invocation" code that is only executed once per iteration

Hmm, I'm not so sure about that. Several mixin methods are lazy invoked in loops through ranges (e.g. here). AFAIK compilers can't optimize across vtable trampolines. Also, indirection in mixin composition goes away with cache locality (all mixin state, while private, is contiguous in memory layout) unless we go and find a way to decouple behavior and state, and re-aggregate it later.

if we find issues in benchmarks then take a step back and find how to avoid the issue.

I don't know. It sounds like a non trivial amount of work, risking performance, just to satisfy some notion of clean code and solve a scalability problem it's not clear we have. That time could also be spent adding features :smiley:.

nahueespinosa commented 1 year ago

Right, I think we shouldn't be aiming for the almighty plug-n-play node, as we'll be compromising everywhere because of that without such a clear benefit.

We do have performance-critical code that depends on compile-time optimizations in calls to mixin methods. Even if we try to isolate the motion/sensor models by having methods that receive ranges of particles, type erasure would make you define interfaces to receive those ranges, and that is not trivial unless we're willing to really throw away performance (see ericniebler/range-v3#714).

I also think that even if we decide to go with a compile-time-inclined solution, that would also imply a good amount of work. So this is worth giving some thought to, even if it takes some time to decide.

nahueespinosa commented 1 year ago

A summary of the current discussion from my point of view:

The original problem is that we want to select different sensors and motion models at runtime in configurable ROS nodes. Potentially this could be extended to other mixins like resampling policies.
- Different motion models are mutually exclusive and so are different sensor models, but resampling policies are composable.
- Ideally we would want the final solution to be both scalable (to the number of variants) and performant.
- With respect to the scalability issue:
- We don't have too many model combinations in the current amcl feature set, but the combinatory explosion might be a concern if we want to provide more features in the same amcl_node. On that note, having an almighty plug-n-play node is good for adoption, not so much for production.
- We can also leave amcl_node like a legacy solution and provide the user a collection of nodes built for specific applications and a builder to construct their own node. Today's architecture naturally leans towards this scenario.
- With respect to the performance issue:
- The particularity of the motion and sensor models is that they have methods that are called lazily through loops in ranges, and it is critical to performance that the compiler can see through those calls and optimize them (inlining, vectorization, etc.).

Some comments on possible mechanisms that can be used in the final solution:

Create an abstract interface for the particle filter using ciabatta's "provides".
- This approach pretends to manage a common particle filter base class that can be constructed with a static polymorphic builder to hide the combinatorial growth of multiple models.
- The limitation is that part of the interface is based on ranges that depend on the concrete mixin types (i.e. auto particles(), auto weights()) and it also has template methods that can take any view (i.e. reinitialize(View input)).
- We could type erase the whole interface, but that would incur a performance penalty for users of the filter (for example, external "estimation" methods that want to process particle ranges). It could be argued that those method can be provided by internal mixins.
Create abstract interfaces for the sensor and motion models.
- We only need one isolated factory for each model avoiding the combinatorial explosion at compile-time.
- To avoid the performance hit of type erasure with ranges we could have one method in the motion/sensor model that receives a range of particles. That range of particles might need to be type erased if the interface is abstract, so more investigation is needed.

nahueespinosa commented 1 year ago

Here is my best shot so far:

Type erase the particle filter but keep the interface types by providing a concrete instance we want to use as reference.
- This still relies on a static polymorphic builder that can take the sensor parameters and instantiate a particle filter of the correct type.
- Then it uses a template type-erased wrapper that extracts the type information for the interface from the concrete reference type passed as template argument.

auto select_param = [](std::string_view model) -> std::variant<ModelParam1, ModelParam2, ModelParam3> {
    if (model == "model1") {
        return ModelParam1{};
    } else if (model == "model2") {
        return ModelParam2{};
    } else if (model == "model3") {
        return ModelParam3{};
    }
    throw std::runtime_error("Invalid model");
};

auto particle_filter = std::visit(
    [](auto&& ...params) {
        return ParticleFilterLike<SomeConcreteParticleFilterType>{make_particle_filter(params...)};
    },
    select_param("model1"),
    select_param("model2"));

A toy example to showcase the mechanisms: https://godbolt.org/z/z61WeKbax

hidmic commented 1 year ago

TIL std::visit can expand a combinatorial sequence. +1 to the above. It is much simpler than what I had in mind, at the expense of a reasonable concession ie. single argument mixin constructors with 1-to-1 matching types. The overhead of the type erased facade should be minimal as indirections are outside hot paths.

One thought. The Base type on which ParticleFilterLike operates need not be concrete. If we start using ciabatta providers to define mixin contracts, ParticleFilterLike could take a ParticleFilter that is purely abstract (instead of manually recreating the API in a dummy type). In fact, those same providers could be used to deduce facade mixins.

glpuga commented 1 year ago

I was also coming here to revisit addressing runtime selection issue in filter construction time. On my side, my argument was that for the legacy amcl-like node we don't have that many alternatives, given that 2 out of 3 policies are fixed, and the only things that can be replaced are motion model (2 options), sensor (2, 3 options?) and having or not having the additional SelectiverResample in the mix. Exhaustive enumeration of options may not be too bad in that case, as long as we don't extend the feature set more.

https://godbolt.org/z/razzcGa15

I still dislike the exhaustive enumeration of the combinations to build the registry near the end, which I suspect we can do automagically somehow.

glpuga commented 1 year ago

I'm liking @nahueespinosa solution more and more by the minute. A small differential I toyed with to see what it looks like if we have both motion models and sensor models to pick from: https://godbolt.org/z/vfG8sMYEK

glpuga commented 1 year ago

Here's a rehash blatantly stealing @nahueespinosa 's code to solve my combinatorial issue:

https://godbolt.org/z/rxsnYjsh7

nahueespinosa commented 1 year ago

Excellent! It seems we have reached a consensus that using std::visit and a factory function or class to solve combinatorial problems is acceptable.

One thought. The Base type on which ParticleFilterLike operates need not be concrete. If we start using ciabatta providers to define mixin contracts, ParticleFilterLike could take a ParticleFilter that is purely abstract (instead of manually recreating the API in a dummy type).

That's an interesting point. I tried to make something like this work but there is a challenge in defining an abstract interface when return types depend on the concrete type of the filter (i.e. auto particles() { return views::all(particles_); }).

In fact, those same providers could be used to deduce facade mixins.

I think you're onto something here. Love to hear more though. Are you talking about something like this? https://godbolt.org/z/PqPP7KaP5

Here's a rehash blatantly stealing @nahueespinosa 's code to solve my combinatorial issue.

Great! TIL about __PRETTY_FUNCTION__. Same as my response above, the difficult part there is to define an abstract interface that depends in some cases on the concrete type of the filter.

For reference, here is how we use a particle filter in beluga amcl:

// Trivial interface
particle_filter_->update_motion(odom_to_base_transform);
particle_filter_->update_sensor(beluga_amcl::utils::make_points_from_laser_scan(...));
particle_filter_->sample(exec_policy);
particle_filter_->importance_sample(exec_policy);
particle_filter_->resample();

// Range return type interface
particle_filter_->particles().size());
ranges::transform(particle_filter_->particles(), ...);
beluga::estimate(particle_filter_->states());  // This could be an internal mixin: particle_fitler_->estimate()

// Range parameter type interface
particle_filter_->reinitialize(ranges::views::generate(...));  // We could use a type erased view as input since we will not use this very often

I think we can consider redefining the interface so I'm taking suggestions.

glpuga commented 1 year ago

I won't be able to look into how we use these in detail, but some initial thoughts:

The ones in "trivial interface" are the key ones. These should not have too much trouble to define an abstract interface, except for the two that get the exec_policy because those were templates, if I remember correctly.

Regarding he ones in the other two sections,

I agree that the estimation of the pose and covariance is better relocated as a mixin; it's a fundamental algorithm of the filter, to extract the estimations from the cloud, and indeed there might be multiple ways to do it that can be configured.
The current number of particles in the cloud is high level operational info that can be trivially provided by method in a mixin (BootstrapParticleFilter is a good candidate, since that's the owner of the data structure) instead of reaching out directly into the range to measure the size.
Uses of the particles() range interface are a problem, but only if we care about performance.
- The use of that is publishing the particle cloud, but I've been thinking that that needs not be a high performance data channel because the particle cloud is basically debug data.
- A production system has not good uses for that data, we should not even publish that if we have no subscribers for it.
- Even when we do publish it, we publish at a very low publish rate.
- That being the case, we can provide a function that returns a vector of particle state data in an expensive-to-create container by value, and impact should still be minimal. This can trivially be added to the abstract interface.

Do we have other uses of particles() other than the particle cloud size count and and publication that can not be dismissed?

post edit: the contents of reinitialize_with_pose() are also low level operations on the filter that I think are better suited to be done by an estimation mixin or some other part of the filter.

nahueespinosa commented 1 year ago

Do we have other uses of particles() other than the particle cloud size count and and publication that can not be dismissed?

I don't think so.

I agree with your comments. Sounds like cleaning up the interface and defining an abstract base class for the filter is very much possible.

hidmic commented 1 year ago

Are you talking about something like this? https://godbolt.org/z/PqPP7KaP5

I had mixin interfaces (e.g. to define what a motion model is) and facade implementations (e.g. to provide the type erased APIs) in mind, but I was a bit fuzzy on the code. Your sample is a better realization. In fact, vtable indirection should be optimized away for method calls in contexts where the type is statically known. And while it is a bit odd at first for mixins to not be explicitly bound to an interface, I like the extra flexibility of allowing several mixins implement a single idea. :shipit:

That being the case, we can provide a function that returns a vector of particle state data in an expensive-to-create container by value, and impact should still be minimal. This can trivially be added to the abstract interface.

I fully agree with @glpuga. Relevant hot paths need not use the abstracted interface to access particle data.

ivanpauno commented 1 year ago

Sorry for the delay!

Excellent! It seems we have reached a consensus that using std::visit and a factory function or class to solve combinatorial problems is acceptable.

+1, that looks good to me.

Are you talking about something like this? https://godbolt.org/z/PqPP7KaP5

It seems confusing to me that you're not choosing between Model1, Model2 while all implement the same apply_motion(). What's the idea of that?

nahueespinosa commented 1 year ago

What's the idea of that?

Oh none really, it was just meant to be a quick implementation to show the idea of the composite interface.

ivanpauno commented 1 year ago

Oh none really, it was just meant to be a quick implementation to show the idea of the composite interface.

Ah perfect! The example looks good, I double checked that because I didn't know if I was missing something.

glpuga commented 1 year ago

:tada:

Ekumen-OS / beluga

Select particle filter components at runtime #118

Description

Definition of done