Refactor particle to handle different types

Summary

Refactor Particle data to have the generic functions with the ability to fetch and modify generic scalar and vector properties, map, and interpolate these properties to nodes. Add a separate implementation functions to operate on these properties of the particle. These functions could be grouped under the namespace of solid particle functions, fluid-particle function, and mixed particle functions. The particle agnostic design of functions would allow for mix-and-matching different functions in the Solver class, without having to deal with multiple-inheritance or repetition.

Motivation

Refactoring and separating the Particle data from the implementation of algorithm enables handling different particle data types, without having to have a lot of virtual functions in the ParticleBase and other subsequent derived particles. Addresses the comments raised in RFCs #633 and #634. For example, deriving FluidParticle from Particle would result in extra data needed for FluidParticle, which is not needed in the case of Particle conversely, FluidParticle won't need certain Particle data. However, the important consideration is that the functions of a FluidParticles have to be virtualized in the abstract ParticleBase and every time a new class is derived, more abstract functions have to be added to the base and other existing derived classes. These functions, which are not useful, must, therefore, throw an error. The Abstract Factory design with inconsistent derived functions is inherently prone to errors when a developer fails to add functions to all the derived classes. This RFC proposes an alternate way of deriving particle data and keeping the implementation separate and reusable. Separation of implementation allows for the composition of different functions in the Solver class.

Design Detail

Particle class has a generic ordered map of scalar and vector properties. An open addressing scheme will be used to store the map properties, that the data stored in here will remain continuous in memory. These properties can be accessed using scalar_property and vector_property. Particle types will still control the initialization of these different properties, i.e., derived classes will be used to create new types of Particles, but all these classes will have a consistent functional interface. This avoids the problem of too many virtual functions. The ParticleBase will be altered to have the following. A consistent interface allows transferring particles across MPI ranks.

// Scalar properties
tsl::ordered_map<std::string, double> scalar_properties_;
tsl::ordered_map<std::string, Eigen::Matrix<double, Tdim, 1>> vector_properties_;

Scalar and vector properties can be set and returned using generic functions.

void assign_property(const std::string& property, double value);
double property(const std::string& property) const;

Nodes will also have a similar property pool structure, which will allow us to write generic map and interpolation functions.

//! Map scalar properties to nodes
template <unsigned Tdim>
void mpm::Particle<Tdim>::map_scalar_properties_to_nodes(const std::string& property) noexcept {
  auto value = scalar_properties_.at("property");
  // Check if scalar property is not max
  assert(value != std::numeric_limits<double>::max());

  // Map mass and momentum to nodal property taking into account the material id
  for (unsigned i = 0; i < nodes_.size(); ++i) {
    value = value * shapefn_[i];
    nodes_[i]->update_property(true, property, value);
  }
}

This property pool structure provides a generic interface to assign and get properties. We can now define a collection of functions under a namespace, such as particle::fluid to define functions that operate on fluid properties. The initialization of the property pool for the Particle will be done by the derived Particle class. Each particle type implementation will have access to the property pool data.

  // Assign projection parameter
  template <unsigned Tdim>
  void mpm::particle::fluid::projection_parameter(std::shared_ptr<mpm::ParticleBase<Tdim>> particle, double value) {
    particle->assign_scalar_property("projection", value);
  }

In the custom solver for the fluid particle, we'll have:

// Iterate over all the particles and assign a projection parameter
double value = 2.4;
mesh->iterate_over_particles(
       [&value](std::shared_ptr<mpm::ParticleBase<Dim>> ptr) {
          return mpm::particle::fluid::projection_parameter<Dim>(ptr, value);
 });

To operate on a specific set of particle types, we could use iterate_over_particles_with_predicate functionality that can filter particles with a particular type.

With this approach, a solver class can build on different types of Particle data and functions pooled from multiple namespaces.

The proposal is to start moving new Particle types to use this design, and the existing solid particle will be refactored in phases.

Drawbacks

The implementation operating on the derived data types is separated from the actual data, which is good in some ways, but also it might be seen as breaking encapsulation. Considering the fact that if something owns a Particle type, then it should be able to modify the data in the Particle. This separation of data and implementation may give some developers a false feeling that their implementation could work on all Particle types, however, as soon as the code is run or compiled with debug mode, it will be known that the data type is unsupported and cannot be used. The phase refactoring of Solid Particles will break a lot of existing code, however, it will minimize the number of update_nodes and interpolate_from_nodes design.

Rationale and Alternatives

Advantages

Allows for future composition in the solver class with multiple Particle types, multiple particle::fluid functions can be defined and used in different solver classes.
Avoids complex inheritance schemes, for example, TwoPhaseParticle doesn't have to redefine functions in fluid or solid when derived from only one type.

Other designs considered

FactoryDesign (Abstract Class): This results in too many virtual functions, an addition of a new function in one of the derived classes affects all existing Particle classes.
BridgeDesign: Requires a class for the implementation that takes the ParticleData, same as Abstract class, results in virtual functions pushed to implementation.
std::variant: Good alternative to removing abstract base classes. However, may not be a good alternative, as we still need to separate the functions from the data. Also, different template instantiation is not possible.
Super functions: Use more generic super functions which are public member functions and specialized functions for each derived type are private member functions. This design avoids non-pure virtual functions, however, limits the interface functions so it is not flexible and writing test functions are difficult.

What is the impact of not doing this?

Consistent interface cannot be achieved on different particles and will have to refactor existing classes every time a new function is added.
Consistent interface helps with MPI data transfer.

Prior Art

Most large codes avoid a lot of derived classes and Abstract class design with inconsistent interfaces across derived classes.

Dealii uses an extreme case of all properties being double and provide particle handle. Dealii Particle

Uintah also uses a generic datawarehouse to store properties.

No Paradigm programming

Free your functions

Unresolved questions

What parts of the design do you expect to resolve through the RFC process before this gets merged?

New Particle types will follow this new design, while the existing solid particle has to be slowly refactored.

What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

The refactoring of solver class to make it more composable will be discussed in a separate RFC.

Unresolved questions

How to handle MPI data transfer with an arbitrary number of particle data types? Since the ParticleBase will be derived for each type of Particle, we will have enough information to transfer particles. Suggest creating serialization and deserialization schemes. Additionally, since the number of active Particle types are going to be a handful at most, we can iterate through different types when doing MPI halo exchange.

Still undecided if we need to derive a new Particle for each type, so the advantage of this method is that the derived Particle type knows its variables and is easy for instantiation. However, it creates additional vtables. On the other hand, if we move the variable instantiation outside of the Particle, we don't have to derive new types, but initialization and MPI transfer may become harder?

Changelog

[Edit #1]: Update iterate_over_mesh function and formatting. [Edit #2]: Add std::variant as alternatives and prior art [Edit #3]: Unresolved question on deriving new Particle types [Edit #4]: Discussion on superfunctions as an alternative

Thanks @kks32. I think the plan on doing that for two-phase first is good, and then refactor the current single-phase MPM slowly. You can assign different tasks to us actually, and I think it would be a good exercise to have someone new like Joel to be part of this refactoring.

I do have a few comments/questions on the design:

Previously, we need to have a new Particle derived class every time we have a new solver. I guess with this, we will construct many solvers by pulling different functions from particle::solid, particle::fluid and (one day) particle::gas. Is this the spirit of the new design?
What about state variables? Would we store them within the double ordered map?

I will drop comments as they come. I will be thinking about this too.

* Previously, we need to have a new `Particle ` derived class every time we have a new solver. I guess with this, we will construct many solvers by pulling different functions from `particle::solid`, `particle::fluid` and (one day) `particle::gas`. Is this the spirit of the new design?

* What about state variables? Would we store them within the `double` ordered map?

Same way as we store an unordered_map right now, it's a change in data structure, won't affect the design as a whole.

I'm not sure if I have understood correctly of all the design. But if we define the functions for different phases in different namespace of Particle class, then it means there will be no new deriving Particle class from ParticleBase, what's the meaning of the current Particle class? Why don't we define all things in ParticleBase straightforward?

@tianchiTJ yes, that's correct. I'm still undecided if we need to derive datatypes or we could initialize and transfer Particles without having derived classes, this would require knowing the size. If we can achieve serialization then there is no need for deriving particles, and we can move everything to Particle class and remove the base class altogther.

Shall we keep the current ParticleBase-Particle-TwoPhaseParticle(or other deriving type) structure, and only put the variables storage, initialization functions and some other necessary functions (like the compute_mass, compute_update_position), then move all other specific function into a new class, like ParticleFunction, then create different namespace into the ParticleFunction for different type particles, like SolidParticleFunction, FluidParticleFunction.

Hi @tianchiTJ what would be the reasoning behind deriving the types, because the only difference would be the instantiation functions. We can't really have fixed data types in the derived class and all the variables would be in scalar_properties_ or vector_properties_

I just think not change the current structure too much, but you are right, maybe it's not necessary to use the deriving types. Anyway, could we decide it as soon as possible? Or we can not work on the implementation of some important features. @kks32 @bodhinandach

I'd like to go ahead with this design. Let's put to a vote, if you are happy add a :+1: to this comment, if not :-1: .

Thanks, @kks32 for the RFC. I some comments (or questions) for the proposed RFC:

Will stresses and strains will be stored in vector_properties_too? The type is tsl::ordered_map<std::string, Eigen::Matrix<double, Tdim, 1>> so it wont fit.
The way you update the property in the node: nodes_[i]->update_property(true, property, value) means that we will not use phase id in node anymore?
I see that this proposal is generally good to treat mapping, assignment, and get functions of different fluid, solid, and mixture properties. How about other functions only useful for fluid/two-phase solvers? For instance, in two-phase in we might have to update intermediate liquid velocity with function: bool compute_updated_liquid_velocity(double dt); How to treat such function?
I am interested to know what do you think about the particle transfer:

How to handle MPI data transfer with an arbitrary number of particle data types? Since the ParticleBase will be derived for each type of Particle, we will have enough information to transfer particles. Suggest creating serialization and deserialization schemes. Additionally, since the number of active Particle types are going to be a handful at most, we can iterate through different types when doing MPI halo exchange.

Any further thoughts on this?

Can we have similar mapping functions from particle to cell for elemental matrices? We need that.

I think I agree with @tianchiTJ that we need to minimize the changes as much as possible. To my knowledge, the current RFC only fixes the function derivations for common processes in particles. But for special functions, I am still not sure how this will help.

Will stresses and strains will be stored in vector_properties_too? The type is tsl::ordered_map<std::string, Eigen::Matrix<double, Tdim, 1>> so it wont fit.

No, stresses and strains will be there as usual. Nothing is changing in the base particle class, we'll slowly refactor only scalar and vector properties.

The way you update the property in the node: nodes_[i]->update_property(true, property, value) means that we will not use phase id in node anymore?

Thanks, we'll pass an additional argument of phase to update_property.

I see that this proposal is generally good to treat mapping, assignment, and get functions of different fluid, solid, and mixture properties. How about other functions only useful for fluid/two-phase solvers? For instance, in two-phase in we might have to update intermediate liquid velocity with function: bool compute_updated_liquid_velocity(double dt); How to treat such function?

I can't see why this will be any different from assigning the projection parameter example shown in the RFC. We define a function that accepts a particle pointer and can set and get velocities.

I am interested to know what do you think about the particle transfer: How to handle MPI data transfer with an arbitrary number of particle data types? Since the ParticleBase will be derived for each type of Particle, we will have enough information to transfer particles. Suggest creating serialization and deserialization schemes. Additionally, since the number of active Particle types is going to be a handful at most, we can iterate through different types when doing MPI halo exchange. Any further thoughts on this?

At the moment the idea is to serialize the object data and transfer each particle type separately.

Can we have similar mapping functions from particle to cell for elemental matrices? We need that.

The particle class has a pointer to the cell, so I don't see why not. If we need to store element matrices, it might be better to do something similar to the contact algorithm with a common nodal properties pool.

I think I agree with @tianchiTJ that we need to minimize the changes as much as possible. To my knowledge, the current RFC only fixes the function derivations for common processes in particles. But for special functions, I am still not sure how this will help.

Particle will be a data container, so any function can be defined that updates the particle properties. So no function will be special. There will be no derived classes.

@kks32 @tianchiTJ These are the functions collected from Particle, FluidParticle, and TwoPhaseParticle. I made the categories, but please modify and add or remove it accordingly.

@kks32 some functions listed here are not available in the current develop branch, so if you can't find it, please ignore them as Tianchi and I will organize them according to our PR.

@tianchiTJ Can you check and list things that are not available here?

Some comments:

Many functions are derived or override from the functions listed in General Functions, mostly in TwoPhase particles. The lists below only contain distinct functions.
Please ignore the pure and non-pure virtual functions indication, I just copied them from the header files.

General Functions:

virtual bool initialise_particle(const HDF5Particle& particle) = 0;
virtual bool initialise_particle(const HDF5Particle& particle, const std::shared_ptr<Material<Tdim>>& material) = 0;
virtual bool assign_material_state_vars(const mpm::dense_map& state_vars, const std::shared_ptr<mpm::Material<Tdim>>& material) = 0;
virtual double state_variable(const std::string& var) const = 0;
virtual HDF5Particle hdf5() const = 0; (This should be different for TwoPhaseParticle)
Index id() const { return id_; }
void assign_coordinates(const VectorDim& coord) { coordinates_ = coord; }
VectorDim coordinates() const { return coordinates_; }
virtual bool compute_reference_location() = 0;
virtual VectorDim reference_location() const = 0;
virtual bool assign_cell(const std::shared_ptr<Cell<Tdim>>& cellptr) = 0;
virtual bool assign_cell_xi(const std::shared_ptr<Cell<Tdim>>& cellptr, const Eigen::Matrix<double, Tdim, 1>& xi) = 0;
virtual bool assign_cell_id(Index id) = 0;
virtual Index cell_id() const = 0;
virtual bool cell_ptr() const = 0;
virtual void remove_cell() = 0;
virtual void compute_shapefn() noexcept = 0;
void assign_status(bool status) { status_ = status; }
bool status() const { return status_; }
virtual Eigen::VectorXd tensor_data(const std::string& property) = 0;
virtual void append_material_id_to_nodes() const = 0;
virtual void initialise() = 0;
virtual unsigned nneighbours() const = 0;
virtual void assign_neighbours(const std::vector<mpm::Index>& neighbours) = 0;
virtual std::vector<mpm::Index> neighbours() const = 0;

Common Properties (Geometric, Kinematics & Material):

virtual bool assign_volume(double volume) = 0;
virtual double volume() const = 0;
virtual double diameter() const = 0; (I implemented this for free surface detection)
virtual VectorDim natural_size() const = 0;
virtual void compute_volume() noexcept = 0;
virtual void update_volume() noexcept = 0;
virtual double mass_density() const = 0;
virtual void compute_mass() noexcept = 0;
virtual bool assign_material(const std::shared_ptr<Material<Tdim>>& material) = 0;
unsigned material_id() const { return material_id_; }
virtual void assign_mass(double mass) = 0;
virtual double mass() const = 0;
virtual double pressure() const = 0; ()
virtual void compute_strain(double dt) noexcept = 0;
virtual Eigen::Matrix<double, 6, 1> strain() const = 0;
virtual Eigen::Matrix<double, 6, 1> strain_rate() const = 0;
virtual double volumetric_strain_centroid() const = 0;
virtual double dvolumetric_strain() const = 0;
virtual void initial_stress(const Eigen::Matrix<double, 6, 1>& stress) = 0;
virtual void compute_stress() noexcept = 0;
virtual Eigen::Matrix<double, 6, 1> stress() const = 0;
virtual bool assign_velocity(const VectorDim& velocity) = 0;
virtual VectorDim velocity() const = 0;
virtual VectorDim displacement() const = 0;
virtual void compute_updated_position(double dt, bool velocity_update = false) noexcept = 0;

Mapping Functions:

virtual void map_mass_momentum_to_nodes() noexcept = 0;
virtual void map_multimaterial_mass_momentum_to_nodes() noexcept = 0;
virtual void map_body_force(const VectorDim& pgravity) noexcept = 0;
virtual void map_internal_force() noexcept = 0;
virtual bool map_pressure_to_nodes() noexcept = 0;
virtual bool compute_pressure_smoothing() noexcept = 0; (Mapping back and forth)
virtual void map_traction_force() noexcept = 0;

Particle Constraints & BC

virtual bool assign_traction(unsigned direction, double traction) = 0;
virtual VectorDim traction() const = 0;
virtual void apply_particle_velocity_constraints(unsigned dir, double velocity) = 0;
virtual void assign_free_surface(bool free_surface) = 0;
virtual bool free_surface() = 0;
virtual bool compute_free_surface() = 0;
virtual void assign_normal(const VectorDim& normal) = 0;
virtual VectorDim normal() = 0;

Fluid:

virtual void assign_projection_parameter(double parameter) {}; (also in Incompressible and 2P)
virtual void initial_pressure(double pressure) {}; (also in Incompressible and 2P)
virtual bool compute_updated_pressure() {}; (also in Incompressible and 2P)
virtual void assign_pressure(double pressure) (also in Incompressible and 2P)
virtual bool assign_particle_pressure_constraint(); (also in Incompressible and 2P)
virtual void compute_turbulent_stress() {};

Mapping Functions (to cell):

virtual bool map_laplacian_to_cell() {}; (also in Incompressible and 2P)
virtual bool map_poisson_right_to_cell() {}; (also in Incompressible and 2P)
virtual bool map_correction_matrix_to_cell() {}; (also in Incompressible and 2P)

TwoPhase:

virtual void assign_projection_parameter(double parameter) {}; (also in Incompressible and Fluid)
virtual void initial_pressure(double pressure) {}; (also in Incompressible and Fluid, for 2P this is to update pore_pressure)
virtual bool compute_updated_pressure() {}; (also in Incompressible and Fluid, for 2P this is to update pore_pressure)
virtual bool assign_particle_pore_pressure_constraint(); (also in Incompressible and Fluid, for 2P this is to update pore_pressure)
virtual void assign_pressure(double pressure) (also in Incompressible and Fluid, for 2P this is to update pore_pressure)
virtual double pore_pressure()
virtual void initialise_liquid_phase()
virtual bool assign_liquid_material(const std::shared_ptr<Material<Tdim>>& material)
virtual bool assign_liquid_traction(unsigned direction, double traction)
virtual VectorDim liquid_traction()
virtual VectorDim mixture_traction()
virtual double liquid_mass()
virtual bool assign_liquid_velocity(const VectorDim& velocity)
virtual VectorDim liquid_velocity()
virtual Eigen::Matrix<double, 6, 1> liquid_strain()
virtual bool update_permeability()
virtual bool assign_particle_liquid_velocity_constraint(unsigned dir, double velocity)
virtual void apply_particle_liquid_velocity_constraints()
bool assign_porosity();
bool update_porosity(double dt)

Mapping Functions (to cell):

virtual bool map_laplacian_to_cell() {}; (also in Incompressible and Fluid)
virtual bool map_poisson_right_to_cell() {}; (also in Incompressible and Fluid)
virtual bool map_correction_matrix_to_cell() {}; (also in Incompressible and Fluid)
virtual bool map_intermediate_matrix_to_cell() {};

Mapping Functions (to node):

virtual bool map_pore_pressure_to_nodes(double current_time = 0.)
virtual bool compute_pore_pressure_smoothing() (This should be the same with compute_pressure_smoothing but the node phase index is different)
virtual bool map_drag_force_coefficient()

@bodhinandach Thanks for grouping functions based on type. However, we want to group them by the order of execution in the solver, so we may be able to write bigger functions, which will be generic for all derived types. The exercise was to try to see if we could actually group these functions in an order they appear in the solver, so we don't end up having non-pure virtual functions or specific derived classes having extra public functions. Looking at MPMExplicit and Navier-Stokes solver I can't see a potential set of global particle functions than can then call local private functions in the derived type.

Additional context

@kks32 would like to avoid derived classes behaving like a superset to the base class or have virtual functions that are not defined in all the derived classes.

@bodhinandach suggested modifying this list of functions in different particle types into a superset of functions, which are available in all particle types, such as initialize, compute_stress.

This discussion and the listing of functions is to check if such a grouping of particle functions is feasible, leaving no virtual functions.

The benefit is potentially avoiding virtual functions and has safe overriding. Even if we assume this approach is feasible, hiding away all the functions as private would mean a smaller exposed interface, which will not be easy to extend and add functionality. It is also difficult to write tests because all these functions will be private. Testing would then require calling friend functions, which is not a good design.

After some more consideration, I am wondering about the possibility to derive particle types, but only keep the common particle functions within the class, and merge all independent/specialized functions to be outside the class. This will help to initialize the classes with the right types, and also in MPI transfer, as the type is known. We derived data of individual Particle types but keep specialized particle functions separately. Maybe the particle functions may have to stay in the same namespace.

This design was 30% slower than private variables so closing this RFC. See #654 for more details

cb-geo / mpm