ImperialCollegeLondon / covid19model

Code for modelling estimated deaths and cases for COVID19.
MIT License

improving Stan model efficiency and readability #33

Closed bob-carpenter closed 4 years ago

bob-carpenter commented 4 years ago

Is your feature request related to a problem? Please describe.

The issue is related to making the Stan code more efficient and readable.

Describe the solution you'd like

Ideally, someone codes the suggested changes. With some feedback, I can code the changes myself, but I didn't want to make a massive PR that the original developers might not like.

Describe alternatives you've considered

Remove variable x because it's not used. (We're developing a lint mode that'll flag issues like unused variables.)


Remove transformed data variable delta because it's not used.


Rename covariate1, ..., covariate6 to x1, ..., x6 because x is conventional for covariates and the term "covariate" doesn't convey much information. Giving these meaningful names would be another possibility.
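
For illustration, the renamed declarations might look like the sketch below; the descriptive alternative in the comment is hypothetical, only meant to show what a meaningful name could look like:

  data {
    int<lower=1> M;       // number of countries, as in the current model
    int<lower=1> N2;      // as in the current model (see the N2 question below)
    matrix[N2, M] x1;     // was covariate1; or a descriptive (hypothetical) name
                          //   such as x_school_closure
    matrix[N2, M] x2;     // was covariate2
    matrix[N2, M] x3;     // was covariate3
    matrix[N2, M] x4;     // was covariate4
    matrix[N2, M] x5;     // was covariate5
    matrix[N2, M] x6;     // was covariate6
  }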


Clarify the comment on N2 as to what "days of observed data" means. Is it max(N)?


First, rewrite Rt[ , m] for readability:

   Rt[ , m] = mu[m] * exp(covariate1[, m] * (-alpha[1])
                           + covariate2[, m] * (-alpha[2])
                           + covariate3[, m] * (-alpha[3])
                           + covariate4[, m] * (-alpha[4])
                           + covariate5[, m] * (-alpha[5])
                           + covariate6[, m] * (-alpha[6]));

Then rearrange and move to declare/define style, followed by a loop that scales each column by the intercept mu[m]:

  matrix[N2, M] Rt = exp(-(covariate1 * alpha[1]
                           + covariate2 * alpha[2]
                           + covariate3 * alpha[3]
                           + covariate4 * alpha[4]
                           + covariate5 * alpha[5]
                           + covariate6 * alpha[6]));
  for (m in 1:M) Rt[ , m] *= mu[m];

For readability, I'd sacrifice a bit of speed and use

  y_raw ~ exponential(1);  // implies y ~ exponential(1 / tau);

instead of

  target += -sum(y_raw);  // ...
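
For anyone comparing the two forms: with rate 1, the exponential log density of each element is just its negation (log(1) - y = -y), so the two statements contribute the same amount to the target. In Stan terms:

  // exponential_lpdf(y_raw | 1) = sum(log(1) - y_raw) = -sum(y_raw)
  target += exponential_lpdf(y_raw | 1);  // identical to: target += -sum(y_raw);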

Keep tau itself on unit scale for more efficient adaptation:

  parameters {
    real<lower = 0> tau_unit;

  transformed parameters {
    real<lower = 0> tau = tau_unit / 0.03;

  model {
    tau_unit ~ exponential(1);  // implies tau ~ exponential(0.03)

The convolution code can be simplified from

  for (m in 1:M) {
    prediction[1:N0, m] = rep_vector(y[m], N0);
    for (i in (N0 + 1):N2) {
      real convolution = 0;
      for (j in 1:(i - 1)) {
        convolution += prediction[j, m] * SI[i - j];
      }
      prediction[i, m] = Rt[i, m] * convolution;
    }
  }

to

  for (m in 1:M) {
    for (i in 1:N0) {
      prediction[i, m] = y[m];
    }
    for (i in (N0 + 1):N2) {
      real convolution = y[m] * sum(SI[1:(i - 1)]);
      prediction[i, m] = Rt[i, m] * convolution;
    }
  }

It's a bit counter-intuitive, but the loop is faster because it doesn't require a new vector copy on the right-hand side, and loops without autodiff are very fast in Stan. Given that prediction[j, m] was just defined to be y[m], that can be used directly to cut down on memory access in the loop.


I assume the centered parameterization

mu ~ normal(2.4, kappa)

works better than the non-centered one, given that other parameters are non-centered. Otherwise, this needs to be rewritten in terms of

mu_std ~ std_normal();

and

mu = 2.4 + kappa * mu_std;
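
If the non-centered version does turn out to be needed, the pieces would slot into the blocks roughly as in this sketch (written in the same abbreviated block style as the other snippets here; the same pattern works whether mu is a vector or an array of reals):

  parameters {
    vector[M] mu_std;

  transformed parameters {
    vector[M] mu = 2.4 + kappa * mu_std;

  model {
    mu_std ~ std_normal();  // implies mu ~ normal(2.4, kappa)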

phi and kappa can be put on unit scales,

  transformed parameters {
    real<lower = 0> phi = 5 * phi_std;
    real<lower = 0> kappa = 0.5 * kappa_std;

  model {
    phi_std ~ std_normal();
    kappa_std ~ std_normal();

The magic number 1e-9 needs doc in the definition of E_deaths[1, m]. E_deaths shows up in the negative binomial likelihood as a rate parameter. Is it never used because of EpidemicStart? If it's not intended to be used, filling unused bits of E_deaths with NaN will make sure they raise an error if they're accidentally used somewhere in the target density.
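
If the first row really is unused, the NaN guard could be as small as the sketch below, and the same pattern extends to any other entries the EpidemicStart logic skips (not_a_number() is Stan's built-in NaN):

  for (m in 1:M) {
    E_deaths[1, m] = not_a_number();  // was 1e-9; NaN surfaces any accidental use
  }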


Put the rep_matrix calls in transformed data so they're only done once.

transformed data {
  matrix[N2, M] matrix_1000 = rep_matrix(1000, N2, M);
  matrix[N2, M] matrix_0 = rep_matrix(0, N2, M);

generated quantities {
  matrix[N2, M] lp0 = matrix_1000;
  matrix[N2, M] lp1 = matrix_1000;
  ...

The magic number 1000 in the definition of matrix_1000 should be documented.


One way to document constants is to define them as transformed data variables with meaningful names.
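
Concretely, that could look like the sketch below; the name and its comment are hypothetical, the point being that the constant is defined once, documented where it is defined, and then reused by name:

  transformed data {
    real LP_INIT = 1000;  // hypothetical name; document here why 1000 is the fill value
    matrix[N2, M] lp_init = rep_matrix(LP_INIT, N2, M);
  }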


The E_deaths loop can be pulled up into a function.

  functions {
    matrix expected_deaths(matrix prediction, matrix f) {
      int N = rows(prediction);  // days
      int M = cols(prediction);  // countries
      matrix[N, M] y;
      for (m in 1:M) {
        y[1, m] = 1e-9;
        for (i in 2:N) {
          y[i, m] = 0;
          for (j in 1:(i - 1)) {
            y[i, m] += prediction[j, m] * f[i - j, m];
          }
        }
      }
      return y;
    }
  }

And then the blocks of code can be replaced by

  E_deaths = expected_deaths(prediction, f);
  ...
  E_deaths0 = expected_deaths(prediction0, f);

Also, f is an unusual symbol for a matrix.
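
For instance, a more descriptive declaration along these lines would make its role clearer at the call sites (infection_to_death is a hypothetical name, assuming f holds the discretized infection-to-death distribution):

  matrix[N2, M] infection_to_death;  // was f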


I'm working on getting ragged arrays into the language---sorry we don't have those yet or the indexing would be much simpler!

flaxter commented 4 years ago

Thank you Bob!!!

dentarthur commented 4 years ago

Suggest no tradeoff between readability and speed. Include the most readable code as a comment before the fastest code that's actually run.

Uses will need BOTH.

bob-carpenter commented 4 years ago

First, let me say these are all minor details. So no worries no matter how the commenting is done!

@dentarthur says:

Suggest no tradeoff between readability and speed.

There's always a tradeoff because one needs to at least verify in the code that the comments are true. The code never lies about what it's doing, but it's very easy for comments to get stale. Adding complex code with comments thus adds overhead for any programmer trying to modify the code.

I like the advice from Google's C++ style guide:

Do not state the obvious. In particular, don't literally describe what code does, unless the behavior is nonobvious to a reader who understands C++ well. Instead, provide higher level comments that describe why the code does what it does, or make the code self describing.

@dentarthur also says:

Include the most readable code as a comment before the fastest code that's actually run.

I've found that's not always so easy in Stan code if the optimizations split things across blocks or loops (this will partly be mitigated when we allow inline variable declarations in the near future). When a lot of comments start piling up, code gets hard to read and it can be easier to pull the algorithm descriptions out of the inner guts of the code and into a higher-level doc. We do that in Stan code with marginalizations of discrete parameters, which can require a fair bit of algebra that's too distracting inline.

harrisonzhu508 commented 4 years ago

@bob-carpenter thank you for the helpful comments! I have linked a draft pull request.

dentarthur commented 4 years ago

@bob-carpenter Thanks for the clear explanation/rebuttal.

I am sure you are right about what won't work. In any case I certainly don't know.

But I will briefly elaborate on "Uses will need BOTH", although there is no need for you to spend time explaining if this is not helpful.

I think there will be three uses of the Stan code and its comments or other annotations (including possibly separate code generators/macro processors, etc., e.g., for profiling to ensure speedups are actually useful, and tests to confirm they produce the same results as the readable versions), all of which might be added earlier, later, or never.

  1. Running Jupyter notebooks and other apps that use it, e.g., multiple runs with varying parameters by public health analysts in large numbers of countries and regions (without any interest in modifying the Stan code). They will be doing a lot of runs on ordinary PCs and will find the code slow even if the fastest possible version is used.

  2. Maintaining and enhancing the Stan code for related analysis. Some of this will be done by public health analysts with little familiarity with the modeling techniques Stan is used for, and by developers with even less. They will (hopefully) be using the readable Stan code together with the Imperial College response team paper and its references, the Stan docs, and the Bayesian texts recommended in the Stan docs, to understand enough to attempt the changes they wish to make.

  3. Technical documentation writers helping fend off individual queries from large numbers of public health analysts in many regions, by writing docs that answer the questions already being asked here, and doing so by actually (rather than just "hopefully") studying the readable Stan code in whatever form it is available, as well as the rest of the material mentioned under "hopefully" above. Same use as category 2, but doing it in advance for documentation. The need for that is emerging now, with questions arriving here at the code repo that should end up being answered in a docs section; they are about docs, not about code.

Whatever method suits the coders, the documenters can learn to live with, as long as there is some way to find readable versions of the code to understand and explain without raising issues to the coders.

bob-carpenter commented 4 years ago

@dentarthur: I forgot that you're getting a lot of readers of this who may not be Stan coders. So the usual doc rules don't make as much sense here.

I would still urge some kind of stand-off annotation rather than filling the code with comments, but as you say, that makes it hard to make sure everyone who gets the code also gets the comments.

Good luck!

Edit: I forgot to mention that this is the same problem we run into with our tutorials: how to gently introduce the model concepts using simple language constructs, yet not have people use those as the basis for saying "Stan is slow".

dentarthur commented 4 years ago

@bob-carpenter Good luck to you, to harrison, and to the rest of the team actually working on it.

I'm just a fly on the wall, but here is a link to a related issue, kept offline from here to stay out of the way:

Stan GPU speedup (and ABM CSXMs with SYCL/DPC++) https://github.com/thecapitalistcycle/covid19model/issues/3

s-mishra commented 4 years ago

Hi @bob-carpenter, thanks a lot for the useful comments. We are working now to incorporate the changes.

s-mishra commented 4 years ago

Solved with commit 865806ff6f7afb393 and another commit by @MansMeg.