epiverse-trace / blueprints

Software development blueprints for epiverse-trace
https://epiverse-trace.github.io/blueprints

Policy for languages other than R #11

Closed pratikunterwegs closed 5 months ago

pratikunterwegs commented 2 years ago

This issue continues a discussion with @Bisaloo on Epiverse preferences for languages that are not R. The discussion began with a focus on C++, as it is expected to be the main non-R language in the Epiverse packages (see e.g. finalsize). This is a non-exhaustive list, so please feel free to add to it, to rebut aspects of it, and to recommend solutions.

  1. Code organisation: How should we organise the C++ source code that we write?

    a. One aspect of this question is how much of a function should be in C++ vs being in R. While a good deal of R code can be re-written in C++ using libraries such as Eigen, this is often slower to write and requires specialist maintenance, possibly making it less sustainable. Knowing these costs, what are the criteria for benefits (mostly speed improvements) that should be met before translating R code into C++ (or another language such as Julia)?
    b. Another aspect of this is where C++ should live. While C++ files are usually in src/, placing them in inst/include has been suggested by @BlackEdder in order to make the package usable as a header-only library by other Rcpp packages (if I understand correctly); this would be similar to Boost Headers. This would likely mean having an internal core function in a header, which is called by a wrapper function that is exposed to R and exported from the package. This is a bit more work, especially when it comes to thinking about the dependencies of future packages.

  2. Code formatting: Which code formatting guide should we follow for C++ (and other languages)?

    a. The Google C++ style guide seems to be a good shout, and is implemented by both cpplint and clang-format.
    b. @Bisaloo has suggested MegaLinter as a cross-language formatting solution.

  3. Other miscellaneous issues:

    a. Should we follow other conventions, such as having a copyright statement included in C++ files? Would it be sufficient to assume that the top-level MIT Licence covers this already?
    b. Should we prefer certain versions of C++ (e.g. finalsize uses C++11), based on stability or other criteria?
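
To make point 1b a bit more concrete, here is a rough sketch of the header-plus-wrapper split. Everything named here is hypothetical (the package name `examplepkg`, the file paths in the comments, and the `weighted_mean` function); it is not taken from finalsize or any other Epiverse package. The sketch uses only the standard library so that it compiles on its own; in a real Rcpp package the wrapper file would also include `<Rcpp.h>`, and the `// [[Rcpp::export]]` attribute comment is what would expose the wrapper to R.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// ---- Would live in inst/include/examplepkg/core.h (hypothetical path) ----
namespace examplepkg {

// Core routine kept header-only (hence `inline`) so that other packages
// could include it directly instead of going through R.
inline double weighted_mean(const std::vector<double>& x,
                            const std::vector<double>& w) {
  if (x.empty() || x.size() != w.size()) {
    throw std::invalid_argument("x and w must be non-empty and equal length");
  }
  const double total_weight = std::accumulate(w.begin(), w.end(), 0.0);
  if (total_weight == 0.0) {
    throw std::invalid_argument("weights must not sum to zero");
  }
  // Sum of x[i] * w[i] over all elements.
  const double weighted_sum =
      std::inner_product(x.begin(), x.end(), w.begin(), 0.0);
  return weighted_sum / total_weight;
}

}  // namespace examplepkg

// ---- Would live in src/wrappers.cpp (hypothetical path) ----
// Thin wrapper with no logic of its own; it only forwards to the header.
// In an Rcpp package, the attribute comment below marks it for export to R.
// [[Rcpp::export]]
double weighted_mean_cpp(const std::vector<double>& x,
                         const std::vector<double>& w) {
  return examplepkg::weighted_mean(x, w);
}
```

With this layout, a downstream package would typically list the package under `LinkingTo` in its DESCRIPTION and include the header, calling `examplepkg::weighted_mean()` directly from its own C++ code. The extra work mentioned above is that the header effectively becomes a second public interface that has to be kept stable.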

chartgerink commented 5 months ago

Closing this issue due to inactivity.

pratikunterwegs commented 5 months ago

I would rather say that the issues raised here have been largely addressed, though the issue wasn't updated and closed to reflect that. So for completeness' sake, in the {epidemics} context (and to a lesser extent, {finalsize}):

  1. Using C++ vs R:

    • I've taken a minimum necessary C++ approach with very standard implementations, where performance gains need to be roughly > 10x for a codebase to be translated to C++, and the implementation should ideally be understandable after looking at e.g. Boost's examples.
    • The blog post on sharing C++ code from an R package addresses why most of our C++ code lives in header files and how it's organised.
  2. C++ code formatting follows the Google style, and tools to check this are detailed in the blog post on C++ code formatting.

  3. The C++ formatting checks require a copyright statement in each file, and these statements refer to the overall package licence. We don't prefer a particular C++ version and aim to support the version used by CRAN, which is C++17 I think. We offer pre-compiled binaries to hopefully sidestep compilation issues where users are somehow on older systems.
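
As a rough illustration of point 3, a header could start along these lines. This is only a sketch under assumptions: the package name `examplepkg`, the year, the include guard and the C++11 floor are all made up for the example, and the exact copyright wording the linter accepts may differ from what is shown.

```cpp
// Copyright 2023 examplepkg authors (hypothetical). See the package's
// top-level licence file for the terms that apply to this file.
#ifndef EXAMPLEPKG_STANDARD_CHECK_H_
#define EXAMPLEPKG_STANDARD_CHECK_H_

// Stop early with a readable message on compilers older than the assumed
// baseline. Some compilers need an extra flag to report __cplusplus
// accurately, so treat this as a convenience rather than a guarantee.
#if __cplusplus < 201103L
#error "examplepkg (hypothetical) assumes C++11 or newer"
#endif

namespace examplepkg {

// Reports the language standard this translation unit was compiled with,
// e.g. 201703 when building as C++17.
inline long cpp_standard() { return __cplusplus; }

}  // namespace examplepkg

#endif  // EXAMPLEPKG_STANDARD_CHECK_H_
```

The copyright line points to the package-level licence rather than repeating it, which is the arrangement described above.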