Hello BSSw Team,

One of the deliverables for my 2023 BSSw Fellowship is a blog post on the bssw.io site summarizing the topic of the fellowship: visually communicating elements of software design. Below is a proposed blog post for feedback.

Note that there are a couple of things I will change ASAP:

The diagrams are rendered on GitHub with Mermaid, but this may not be available on the BSSw site. I wanted to check with you, and if not, I'll create images.
I created this post using JupyterBook and used some features for placing the diagrams within the page. Since I've removed those, I'm wondering whether there are any Bootstrap-style HTML elements available within the BSSw site.

Thanks, Rafael

2023 BSSw Fellowship: Visually communicating elements of software design

I've been a researcher at the National Renewable Energy Laboratory for seven years, and my role fits squarely into the description of a research software engineer (RSE). In my time at the lab, I've noticed a pattern in funding and staffing cycles where both can be discontinuous or unpredictable, resulting in lost momentum and institutional knowledge on software projects. While this pattern is likely inherent to research itself, RSEs can mitigate these impacts and improve the overall quality of their software by communicating elements of software design within the development workflow. As a 2023 Better Scientific Software Fellow, I've aggregated resources and developed training material to empower RSEs to visually communicate ideas and themes within their software projects, and the results are described here.

Documenting ideas, decisions, and institutional knowledge is a powerful way to mitigate discontinuous momentum during software development efforts. Early in the development of a software project, requirements are identified; some are adopted while others are intentionally rejected. The form and function of the software start to take shape. Capturing these decisions benefits future development efforts because the collective knowledge that the development team has in the moment will differ from its knowledge at a future time. Given time constraints for software development in the research environment, communicating design decisions is easily relegated to that elusive "when there's time" moment. To manage this tendency, I suggest that project teams adopt graphical communication methods to describe conceptual ideas and their implementations using Unified Modeling Language (UML) diagrams and automated tooling. Narrative content around these diagrams is helpful and encouraged, but the diagrams often speak for themselves. Once the initial diagrams are in place, future development efforts can build on them to scope and design work while inherently communicating the impact to the entire system. This article describes UML and its role in the development workflow for research software engineers.

UML, Class Diagrams, and Sequence Diagrams

The Unified Modeling Language (UML) was created in 1995 and adopted by the Object Management Group, a standards consortium, in 1997. In essence, UML is a set of graphical notations described by metamodels that enable describing and designing software systems. UML is particularly relevant to software developed in the object-oriented paradigm, but the methods and notations are broadly relevant to software engineering and systems engineering (see SysML). The notations defined in UML can be considered syntax for creating a specific set of diagrams useful in software design and analysis. While UML defines 14 types of diagrams, the following eight are particularly useful and the first two are described further:

Class diagrams are directly correlated to object-oriented programming. Attributes and methods on a class can be described with their visibility, argument types, and return type. Abstract classes and abstract methods are denoted in italics. Inheritance, aggregation, composition, and association are described with lines connecting classes and specific types of arrows.

Class

classDiagram
    class Class {
        Type attribute1
        Type attribute2
        + public_method(arg1, arg2)
        # protected_method() return_type
        - private_method()
    }

Abstract class

classDiagram
    class AbstractClass {
        <<Abstract>>
        Type static_attribute$
        abstract_method()* return_type
    }

Inheritance

classDiagram
    BaseClass1 <|-- DerivedFrom1and2
    BaseClass2 <|-- DerivedFrom1and2
    BaseClass2 <|-- DerivedFrom1

Aggregation and Composition

classDiagram
    Container o-- AggregationPart
    Container *-- CompositionPart

Association

classDiagram
    A -- B
    A --> B
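
To connect the notation to code, here is a minimal Python sketch of the same relationships. All class names (`AbstractShape`, `Square`, `Canvas`, `Viewer`) are hypothetical and chosen only for illustration; Python has no enforced visibility, so the underscore conventions are only an approximation of the UML `+`, `#`, and `-` markers.

```python
from abc import ABC, abstractmethod


class AbstractShape(ABC):
    """Abstract class: drawn in italics or tagged <<Abstract>> in UML."""

    sides: int = 0  # class-level (static) attribute, marked with `$` in Mermaid

    @abstractmethod
    def area(self) -> float:
        """Abstract method, marked with `*` in Mermaid."""


class Square(AbstractShape):  # inheritance: AbstractShape <|-- Square
    sides = 4

    def __init__(self, length: float):
        self.length = length   # public attribute (+)
        self._cache = None     # protected by convention (#)
        self.__id = id(self)   # name-mangled, the closest Python gets to private (-)

    def area(self) -> float:
        return self.length ** 2


class Canvas:
    def __init__(self, shared_shape: AbstractShape):
        # Aggregation (Canvas o-- AbstractShape): the part is created elsewhere
        # and can outlive the container.
        self.shared_shape = shared_shape
        # Composition (Canvas *-- Square): the container creates and owns the part.
        self.owned_shape = Square(1.0)


class Viewer:
    # Association (Viewer --> Canvas): Viewer uses a Canvas without owning it.
    def describe(self, canvas: Canvas) -> str:
        return f"{canvas.shared_shape.area()} and {canvas.owned_shape.area()}"


square = Square(2.0)
canvas = Canvas(square)
print(Viewer().describe(canvas))
```

The distinction between aggregation and composition is invisible in the attribute list alone; it shows up in where the part is constructed, which is exactly the kind of intent a diagram can state explicitly.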

Sequence diagrams are broadly applicable to systems when describing algorithms, processes, and procedures. The metamodel relates participants by passing messages (commands) and data between them. A rectangle on a participant's line indicates the portion during which that participant is "on" (active), and boxes encompassing events denote if-statements, loops, and parallel processes.

sequenceDiagram
    participant ParticipantA
    participant ParticipantB

    ParticipantA->>+ParticipantB: get_data()
    ParticipantB-->>-ParticipantA: data
    create participant ParticipantC
    ParticipantA->>+ParticipantC: set_data(data)
    ParticipantC->>ParticipantC: sanitize_data(data)
    ParticipantC->>ParticipantC: save_data(data)
    ParticipantC-->>-ParticipantA: void
    alt if this
        ParticipantA->>ParticipantB: b_command()
    else else this
        ParticipantA->>ParticipantC: c_command()
    end
    loop For all data points
        ParticipantA-->>ParticipantB: operation(data)
        ParticipantB-->>ParticipantC: operation(data)
        ParticipantC-->>ParticipantA: operation(data)
    end
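
As a rough illustration, the first half of this sequence diagram (up to the `alt` box) could correspond to Python code like the sketch below. The method bodies are assumptions made up for the example; only the order of calls mirrors the diagram.

```python
class ParticipantB:
    def get_data(self) -> list:
        # Solid arrow in: the call. Dashed arrow out: the returned data.
        return [1.0, -2.0, 3.0]


class ParticipantC:
    def __init__(self):
        self.saved = []

    def set_data(self, data: list) -> None:
        # The two self-messages in the diagram are calls to the object's own methods.
        clean = self.sanitize_data(data)
        self.save_data(clean)

    def sanitize_data(self, data: list) -> list:
        # Hypothetical sanitization: keep magnitudes only.
        return [abs(x) for x in data]

    def save_data(self, data: list) -> None:
        self.saved = list(data)


def participant_a() -> list:
    b = ParticipantB()
    data = b.get_data()   # ParticipantA->>+ParticipantB: get_data()
    c = ParticipantC()    # create participant ParticipantC
    c.set_data(data)      # ParticipantA->>+ParticipantC: set_data(data)
    return c.saved


print(participant_a())
```

Reading the diagram and the code side by side shows what the diagram leaves out on purpose: argument handling and return plumbing disappear, and only the conversation between participants remains.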

Perspective and Documentation-Driven Development

The UML metamodels provide the syntax to describe a software system with varying levels of fidelity, and it can be tempting to include as much detail as possible. However, for any relatively complex software, this can be too much information for a reader to digest and find the patterns. I suggest focusing instead on the audience and the specific message to communicate by considering the following questions:

Who is the intended audience, and what is their level of experience with your software?
In a few sentences, what specifically are you communicating?
At what level of fidelity does the content of the message exist in the software - conceptual, specification, or implementation?

The three diagrams below are taken from one of my software projects, FLORIS, a wind farm wake modeling framework that provides specific interfaces where developers can plug in new wake models. Consider the three perspectives:

Conceptual describes the relationships between the main components of the software and notes where to connect a new wake model.
Specification describes the connections between a portion of the wake model and the software objects that interface with it.
Implementation describes the specific attributes on a particular class and its inherited properties.

While each is valuable, maintaining separation allows for focusing a diagram on specific themes for a given audience.

Conceptual

classDiagram

    class Floris
    class Farm

    class FlowField {
        u: NDArrayFloat
        v: NDArrayFloat
        w: NDArrayFloat
    }

    class Grid {
        <<abstract>>
        x: NDArrayFloat
        y: NDArrayFloat
        z: NDArrayFloat
    }
    style Grid stroke:#f66,stroke-width:2px

    class WakeModelManager {
        <<interface>>
        combination_model: BaseModel
        deflection_model: BaseModel
        velocity_model: BaseModel
        turbulence_model: BaseModel
    }
    style WakeModelManager stroke:#f66,stroke-width:2px

    class BaseModel {
        <<abstract>>
        parameters: dict
        prepare_function()
        function()
    }

    class Solver {
        <<interface>>
        parameters: dict
    }
    style Solver stroke:#f66,stroke-width:2px

    Floris *-- Farm
    Floris *-- FlowField
    Floris *-- Grid
    Floris *-- WakeModelManager
    Floris --> Solver
    WakeModelManager -- BaseModel

    Solver --> Farm
    Solver --> FlowField
    Solver --> Grid
    Solver --> WakeModelManager

Specification

classDiagram

  class WakeModelManager {
    combination_function
    deflection_function
    turbulence_function
    velocity_function
  }
  <<interface>> WakeModelManager
  class GaussVelocityDeficit {
    prepare_function(...) Dict[str, Any]
    function(...) None
  }
  class GaussVelocityDeflection {
    prepare_function(...) Dict[str, Any]
    function(...) None
  }
  WakeModelManager -- GaussVelocityDeficit
  WakeModelManager -- GaussVelocityDeflection
  BaseModel <|-- GaussVelocityDeficit
  BaseModel <|-- GaussVelocityDeflection
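
The plug-in point described by the specification diagram can be sketched in Python as follows. This is not the FLORIS source: `ConstantDeficit`, its arguments, and the body of every method are hypothetical stand-ins, and only the class and method names `BaseModel`, `WakeModelManager`, `prepare_function`, `function`, and `velocity_function` come from the diagrams.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseModel(ABC):
    """Contract that every wake model plugs into."""

    @abstractmethod
    def prepare_function(self, *args, **kwargs) -> Dict[str, Any]:
        """Collect the inputs the model needs from the rest of the system."""

    @abstractmethod
    def function(self, *args, **kwargs) -> None:
        """Apply the model."""


class ConstantDeficit(BaseModel):
    """Hypothetical stand-in for a model such as GaussVelocityDeficit."""

    def prepare_function(self, n_turbines: int) -> Dict[str, Any]:
        return {"deficit": 0.25, "n_turbines": n_turbines}

    def function(self, velocities: list, deficit: float, n_turbines: int) -> None:
        # Modify the velocities in place, matching the None return in the diagram.
        for i in range(n_turbines):
            velocities[i] *= 1.0 - deficit


class WakeModelManager:
    def __init__(self, velocity_model: BaseModel):
        self.velocity_model = velocity_model

    @property
    def velocity_function(self):
        # The manager exposes the model's function without knowing which model it is.
        return self.velocity_model.function


manager = WakeModelManager(ConstantDeficit())
velocities = [10.0, 10.0]
model_args = manager.velocity_model.prepare_function(n_turbines=2)
manager.velocity_function(velocities, **model_args)
print(velocities)
```

A developer adding a new wake model only needs the information in this sketch, which is the argument for keeping the specification diagram free of solver and grid details.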

Implementation

classDiagram
  class BaseClass {
    logger
  }
  class Grid {
    cubature_weights: np.ndarray
    grid_resolution : int | Iterable
    x_sorted: np.ndarray
    y_sorted: np.ndarray
    z_sorted: np.ndarray
    set_grid()* None
  }
  <<abstract>> Grid
  class TurbineGrid
  class FlowFieldPlanarGrid
  class FromDictMixin {
    as_dict() dict
    from_dict(data: dict)
  }

  FromDictMixin <|-- Grid
  BaseClass <|-- Grid
  Grid <|-- TurbineGrid
  Grid <|-- FlowFieldPlanarGrid
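
For comparison, the inheritance structure in the implementation diagram maps onto Python roughly as shown here. This is a simplified sketch under my own assumptions, not the actual FLORIS classes: the constructor signature and the method bodies are invented, and most attributes from the diagram are omitted.

```python
import logging
from abc import ABC, abstractmethod


class BaseClass:
    @property
    def logger(self) -> logging.Logger:
        # One logger per concrete class name.
        return logging.getLogger(type(self).__name__)


class FromDictMixin:
    @classmethod
    def from_dict(cls, data: dict):
        # Build an instance from keyword arguments stored in a dictionary.
        return cls(**data)

    def as_dict(self) -> dict:
        return dict(vars(self))


class Grid(BaseClass, FromDictMixin, ABC):  # <<abstract>> in the diagram
    def __init__(self, grid_resolution: int):
        self.grid_resolution = grid_resolution

    @abstractmethod
    def set_grid(self) -> None:
        """Marked with `*` in the diagram: subclasses must implement it."""


class TurbineGrid(Grid):
    def set_grid(self) -> None:
        self.logger.debug("building grid with resolution %s", self.grid_resolution)


grid = TurbineGrid.from_dict({"grid_resolution": 3})
grid.set_grid()
print(grid.as_dict())
```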

Similar to test-driven development, documentation-driven development is the practice of stating what you're going to do in the documentation prior to doing it. If there isn't a logical place to put disorganized thoughts, they can be aggregated into a design document that can take the form of a GitHub Discussion or Issue. Suggested content for a design document includes:

Scope and suggested design of the work
Relationship to existing elements of the software including existing implementations and overarching themes
New themes and design decisions included and excluded

A strict policy of "docs or it didn't happen" can increase the quality and quantity of documentation, but it comes with an added burden for developers, reviewers, and maintainers. A more approachable but less rigorous alternative is to require an extended pull request description that includes a narrative of the changes and an overview of design decisions.

In practice - documentation tooling

There are a variety of web-based tools for creating diagrams, including many of the diagram types defined in UML. In my experience, separating the design documentation from the code is impractical at any stage of a software development effort past the initial conceptual design. I've had the most success including documentation source files and software diagrams alongside the code and managing them with version control. For this to work best, text-based formats are preferred over binary formats, and an ecosystem of tools and processes exists to support this workflow.

Mermaid

Mermaid.js is a JavaScript library for describing diagrams in text and rendering them in web browsers and other formats. In fact, all diagrams in this article are created with this software. Mermaid contains syntax for the eight UML diagrams listed above, as well as additional diagrams not included in UML. It is well integrated into much of the software development infrastructure including:

Since it's text-based, it is easily managed with version control.
Sphinx-based documentation websites render Mermaid diagrams directly in documentation source files (sphinxcontrib-mermaid) and in API documentation through docstrings (sphinx.ext.autodoc).
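
As a minimal sketch of the Sphinx setup, assuming the sphinxcontrib-mermaid package is installed, the two extensions are enabled in `conf.py` and a diagram can then live inside a docstring. The project name and the `Solver` docstring below are hypothetical examples, and the class would normally sit in a module of the package instead of in `conf.py`.

```python
# --- docs/conf.py (Sphinx configuration) ---
project = "my-project"  # hypothetical project name

extensions = [
    "sphinx.ext.autodoc",     # pulls docstrings into the API documentation
    "sphinxcontrib.mermaid",  # provides the `mermaid` directive
]


# --- a module in the package, documented through autodoc ---
class Solver:
    """Hypothetical class whose docstring carries its own diagram.

    .. mermaid::

       classDiagram
           Solver --> Farm
           Solver --> FlowField
    """
```

Because the diagram is part of the docstring, a change to the class and a change to its diagram can be reviewed in the same pull request.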

Pyreverse

For Python projects, pyreverse, part of pylint, is a Python library that analyzes class definitions to create class and package diagrams. It can output results in PlantUML (.puml), Mermaid (.mmd), HTML, and various image formats, as well as any format supported by Graphviz. This tool creates diagrams that directly match the code, so it is only able to consider the "implementation" perspective. However, integrating pyreverse with Sphinx-based documentation as part of an automated continuous integration system is an easy way to create the building blocks for manually creating specification and conceptual documentation during design.
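
To give a feel for what this kind of tool does, the toy sketch below inspects a few classes and emits Mermaid class diagram text. It is not how pyreverse is implemented and handles far fewer cases; the function name `mermaid_class_diagram` and the example classes are hypothetical. In practice, I would expect a command along the lines of `pyreverse -o mmd -p <project> <package>` to do the real work.

```python
import inspect


def mermaid_class_diagram(*classes: type) -> str:
    """Emit Mermaid classDiagram text listing methods and direct base classes."""
    lines = ["classDiagram"]
    for cls in classes:
        lines.append(f"    class {cls.__name__} {{")
        # vars(cls) holds only what this class defines, not inherited members.
        for name, member in vars(cls).items():
            if inspect.isfunction(member) and not name.startswith("__"):
                lines.append(f"        {name}()")
        lines.append("    }")
        for base in cls.__bases__:
            if base is not object:
                lines.append(f"    {base.__name__} <|-- {cls.__name__}")
    return "\n".join(lines)


class Grid:
    def set_grid(self) -> None: ...


class TurbineGrid(Grid):
    def set_grid(self) -> None: ...


diagram = mermaid_class_diagram(Grid, TurbineGrid)
print(diagram)
```

Output generated this way can only restate the code, which is why the conceptual and specification perspectives still need a human author.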

Doxygen / graphviz with dot

For C, C++, and (to some degree) Fortran projects, Doxygen is a static code analyzer that creates API documentation as well as the following diagrams:

Class hierarchies
Include-dependency graphs
Caller/callee diagrams
Directory graph (similar to package diagram)

It exports all products into an HTML viewer that can be included as part of any web-based documentation. Doxygen itself generates the HTML files and API docs from function and class signatures as well as docstrings. It integrates with Graphviz to create the graphs and embeds the images into the HTML. There is a set of extensions to Doxygen as well as indirect support for rendering Mermaid diagrams.

In summary

Through the BSSw Fellowship, I've had the opportunity to interact with the community to gather ideas on documentation and communication on software design. In particular, I presented at the NLIT S3C conference in April 2024 ([slides]()) and held an IDEAS HPC Best Practices Webinar in April 2024 ([video]()). I've also put together an online dashboard to collect notes, ideas, and examples of good software diagrams.

Stepping back to consider the big picture, I see visual communication as one step toward a pattern language for software design. We already have common design patterns and syntactic conventions, but the scientific software community doesn't currently have a common language to talk about our systems at a high level and relate them to each other. I hope to build on this work to continue seeking the pattern language that will unlock shared understanding of the systems we create, so that we can use it both to create new, more elegant software systems and to bring meaningful recognition to the research software engineers who create them.

Please get in touch at rafael.mudafort@nrel.gov, GitHub, or LinkedIn.

Possible Blog Article - BSSw Fellowship, Visually communicating elements of software design #2031