Psivant / stormm


This code base will provide accessible, performant, interoperable libraries in a new family of molecular dynamics (MD) programs. Basic operations such as coordinate and topology intake, user input parsing, and energy evaluations are managed through a common set of C++ classes and CUDA or HIP kernels. In many instances, HPC applications will be constructed using the same data arrays and layouts as their CPU counterparts, although the most efficient GPU code may require sacrifices in the convenience or performance of CPU code. These sacrifices are likely to be minor, however: the best CPU code relies on vectorization similar to that of GPU code and, even where vectorization cannot happen in the same manner, the contents and format of basic data arrays do not severely restrict the choice of algorithm.


Code Standards


Coding Conventions

    t  = x + y +
         z + w;
    t += (k * p) +
         (x / y) + 4;

Abstraction and Dependencies


Naming Conventions


Function Declarations

    /// \brief General description of function in all guises
    ///
    /// Overloaded:
    ///   - First version
    ///   - Second version
    ///
    /// \param (Describe all parameters to any of the function's overloaded versions)

Struct Declarations


Template Conventions

One of the most significant features that distinguishes C++ from the original C language is the existence of templates: objects or functions which are (within limits) agnostic to the data type that they will operate upon. Because they step into the realm of modern C++ and a new world of abstraction and inference, templates make an important "boundary" case of what can be sanctioned within the (arbitrary, self-imposed) STORMM coding conventions. The critical aspect of successful template design concerns type inference, a deep and general aspect of C++ which saw laudable improvements in C++11. When a template is used in the code, the compiler essentially creates an overloaded variant of the template function, as stipulated either by an explicit type declaration, e.g. double t = foo<double>(... argument list ...), or by inferring the exact overloaded type from a function argument of the template type, e.g. a primitive such as

    template <typename T> T foo(const std::vector<T> &list) {
      ... function content ...
    }

In the above case, the return type of foo() cannot be dictated by the type of variable that receives the output. However, it can be inferred from the argument fed to foo():

    std::vector<long long int> phone_numbers(1000, 0);
    ... load phone_numbers with data... 
    float local_var = foo(phone_numbers);

The template leads to an overloaded instantiation of the function that takes a vector of long long ints, and therefore returns a long long int, regardless of the type of local_var (the result is only converted to float when it is assigned). This can be very powerful, but it can also lead to some mistakes that will break code later on. Let's say that foo() was designed to accept a vector of int4 objects:

    struct int4 {
      int x;
      int y;
      int z;
      int w;
    };

If there were any code inside of foo() that operated on components of the input data such as x, y, or w, this would negate the use of foo() on any data type that does not have those components. Similarly, if foo() contained an operation like += that was well defined for standard ints but not for int4s, its use would be diminished. All of these errors would be silent until the templated function was given a type to work on, at which point it would either compile or not. With this understanding:
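The behavior described above can be sketched in a few lines. The names sumList() and sumX() are hypothetical stand-ins for foo(), chosen for illustration only; they are not STORMM functions. The first template uses only += and therefore works for any arithmetic type, while the second reads the member x and therefore compiles only for types that have one.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for foo(): T is deduced from the argument, never from the variable
// that receives the result.  Requires that T support default construction and +=.
template <typename T> T sumList(const std::vector<T> &list) {
  T result = T();
  for (std::size_t i = 0; i < list.size(); i++) {
    result += list[i];
  }
  return result;
}

struct int4 {
  int x;
  int y;
  int z;
  int w;
};

// Hypothetical template that reads the member x: valid for int4, invalid for plain int.
template <typename T> int sumX(const std::vector<T> &list) {
  int result = 0;
  for (std::size_t i = 0; i < list.size(); i++) {
    result += list[i].x;
  }
  return result;
}

// The return type follows the deduced T, whatever the caller assigns the result to.
static_assert(std::is_same<decltype(sumList(std::declval<std::vector<long long int>>())),
                           long long int>::value,
              "T is deduced as long long int from the argument");

// Neither of the following would compile, but the error only appears once the template is
// instantiated with the offending type:
//   sumList(std::vector<int4>());  // int4 has no operator+=
//   sumX(std::vector<int>());      // int has no member x
```

Both templates compile cleanly on their own; the restrictions they place on T surface only at the call that instantiates them.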


References and Pointers

In C, it is common to pass arrays and other large objects by pointer. This saves the program from having to create a copy of the argument solely for the purposes of that function, at the expense of de-referencing the pointer at every access of the object or one of its attributes. It also requires frequent use of the -> operator, i.e. x->y for member variable y in struct x, if x has been passed by pointer (a parameter declared as *x). In C++, it is more common to pass by reference (a parameter declared as &x). There are some subtle differences between references and pointers, but from the standpoint of performance they are equivalent (the program still needs to de-reference the reference). Passing by reference removes the need to write -> when accessing object attributes, and also the need to write &x in the function call. (This can be confusing, because passing by reference puts a &x in the function declaration instead.) One can write x.y inside a function that takes argument &x (although the de-referencing will still occur), and call it as foo(x).

Passing by reference alone is not the whole convention, however: whenever STORMM passes by reference, it passes by const reference, which prevents the function from changing the data referenced by the variable (the only thing such a function can produce is its return value). While some developers strive to have every argument be const (including arguments passed by value), there are situations where it is far preferable to have a function do two things, usually modifying some array while accumulating a result. In these cases, the best practice is to return the result and pass the variable which will be modified by pointer. That way, developers will need to write & when passing the argument in the function call, sending a clear signal that it, too, will be modified over the course of the call. For example:

    (... Declare BuildingDesign and ConstructionProject objects ...)

    //-------------------------------------------------------------------------------------------------
    int squareFootage(const BuildingDesign &blueprint, std::vector<ConstructionProject> *buildings,
                      const std::vector<int> &addresses) {
      int sq_ft = 0;

      // addresses is a const reference, so its members are reached with '.', not '->'
      for (size_t i = 0; i < addresses.size(); i++) {
        buildings->data()[addresses[i]].floor_count = blueprint.floors;
        buildings->data()[addresses[i]].entry_dimensions = blueprint.door_width;
        buildings->data()[addresses[i]].length = blueprint.length;
        buildings->data()[addresses[i]].width = blueprint.width;
        sq_ft += (blueprint.floors * blueprint.length * blueprint.width) - (blueprint.door_width * 2);
      }

      return sq_ft;
    }

    //-------------------------------------------------------------------------------------------------
    int main() {
      BuildingDesign home_blueprint1;
      BuildingDesign home_blueprint2;
      std::vector<ConstructionProject> my_town;
      std::vector<int> home_addresses1;
      std::vector<int> home_addresses2;
      (... Fill in the details of home_blueprint1, home_blueprint2, and home_addresses(1,2) ...)

      // The function call below will accumulate the total residential space while assigning specific
      // addresses to be a particular type of home.
      int total_residential_space  = squareFootage(home_blueprint1, &my_town, home_addresses1);
      total_residential_space     += squareFootage(home_blueprint2, &my_town, home_addresses2);
      return total_residential_space;
    }

The program above passes blueprint objects and the arrays of home addresses by const reference, while passing the array of actual buildings in the town by pointer. In the main function, it is obvious that my_town is being modified as the details of particular buildings are filled out by calls to a function that also accumulates the square footage of buildings made with a certain blueprint. It would also be feasible to write the squareFootage() function with a non-const reference for its second argument, but then the calls in main would give no indication that my_town was being modified by each call: const and non-const references look the same when calling a function, but passing by pointer guards against stealth data modifications.
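To make the contrast concrete, here is a minimal sketch of the same operation written both ways. The function names doubleAllByReference() and doubleAllByPointer() are hypothetical and exist only for this comparison.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper taking a non-const reference.  It works, but a call to it looks
// exactly like a call to a function that only reads its argument.
int doubleAllByReference(std::vector<int> &values) {
  int total = 0;
  for (std::size_t i = 0; i < values.size(); i++) {
    values[i] *= 2;
    total += values[i];
  }
  return total;
}

// The same operation with the modified argument passed by pointer, following the convention
// described above.  The caller must write & at the call site.
int doubleAllByPointer(std::vector<int> *values) {
  int total = 0;
  for (std::size_t i = 0; i < values->size(); i++) {
    values->data()[i] *= 2;
    total += values->data()[i];
  }
  return total;
}
```

Given a std::vector<int> v, the first is called as doubleAllByReference(v), which is indistinguishable from a read-only call, while the second is called as doubleAllByPointer(&v), where the & flags the modification.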

Do not be afraid of pointers! C++ is very nearly a superset of C, and therefore can make use of them. It is not desirable to use raw pointers to dynamically allocate memory, however: that is the job of containers (the most common being std::vector) and smart pointers. For array access, pointers are often (but not always) less desirable than containers because a pointer to a block of memory is just that: it carries no information about what the bounds are. While it is still possible to hit a segmentation fault by accessing a non-existent element of a std::vector, it is much easier to go into the code and install a bounds check when that happens. In optimized code, accessing an element of a std::vector with the [] operator is as fast as accessing an element of an array through a pointer. The compiler knows what the program is meant to do, so pointers seldom have a performance advantage unless they help to skip over an extra layer of function calls and de-referencing (there are cases of that, and STORMM libraries frequently provide small structs full of const pointers to expedite access to data, especially in GPU-based arrays). However, for the specific purpose of letting a function modify one of its input arguments, pointers are the preferred route.
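Two of the ideas in this paragraph can be sketched briefly. Everything named below (CoordinateReader, makeReader(), sumOfProducts(), checkedElement()) is hypothetical and written for illustration; it is not taken from the STORMM libraries. The struct shows the general idea of a small bundle of const pointers handed to an inner loop, and the last function shows how a bounds check can be installed on a std::vector using its standard at() member.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical read-only view: a small struct of const members that carries the array
// length alongside the raw pointers, so that the bounds travel with the data.
struct CoordinateReader {
  const std::size_t natom;  // Number of valid elements behind each pointer
  const double *xcrd;       // Read-only pointer to the first array
  const double *ycrd;       // Read-only pointer to the second array
};

// Build the view from two containers.  Assumption: x and y have the same length.
CoordinateReader makeReader(const std::vector<double> &x, const std::vector<double> &y) {
  return { x.size(), x.data(), y.data() };
}

// The inner loop works directly on the pointers, with no further container calls.
double sumOfProducts(const CoordinateReader &cr) {
  double result = 0.0;
  for (std::size_t i = 0; i < cr.natom; i++) {
    result += cr.xcrd[i] * cr.ycrd[i];
  }
  return result;
}

// Bounds-checked access: std::vector::at() throws std::out_of_range for an invalid index,
// whereas operator[] performs no check.
double checkedElement(const std::vector<double> &v, const std::size_t idx) {
  return v.at(idx);
}
```

The containers keep ownership of the memory throughout; the view is only valid while the vectors it was built from remain alive and are not resized.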