eggplantbren / DNest4

Diffusive Nested Sampling
MIT License
60 stars 21 forks source link

High number of calls of copy constructor/assigment operator for ModelType #39

Open qacwnfq opened 2 years ago

qacwnfq commented 2 years ago

Hey,

I was just benchmarking DNest4 for some of my more memory intensive/complex models and I noticed, that the copy constructor and assigment operators were called quite often. This leads to some low performance in my use cases, which is why I tried to identify the code, where it happens.

I think this is the place in SamplerImpl.h:

    LikelihoodType& logl = log_likelihoods[which];

    // Do the proposal for the particle
    ModelType proposal = particle;
    double log_H = proposal.perturb(rng);

    // Prevent unnecessary exponentiation of a large number
    if(log_H > 0.0)
        log_H = 0.0;

    if(rng.rand() <= exp(log_H))
    {
        LikelihoodType logl_proposal(proposal.log_likelihood(), logl.get_tiebreaker());

        // Do the proposal for the tiebreaker
        log_H += logl_proposal.perturb(rng);

        // Accept?
        if(level.get_log_likelihood() < logl_proposal)
        {
            particle = proposal;
            logl = logl_proposal;
            level.increment_accepts(1);
        }

There are two lines here which impact performance: The first is ModelType propsal=particle; and the second is particle = proposal;

I believe these copies can be replaced by something more efficient for complex models, because we actually only have to track the proposal and log(H) and then decide if the ModelType should set state to the value of proposal or if it should keep the state it already had. In this case we wouldn't need to copy the whole model every time we try to update a particle, we would only change the internal state, which can probably even be done by std::move or std::swap since we discard the rejected proposals in this snippet.

I can make a PR for this, but I'm unsure how to deal with the changes in the interface for ModelType.

One solution which would not break existing code (but uses some more complex template code) is to check at compile time if the ModelType has the required members to circumvent copying and otherwise to sample like it is sampling now. This is something we've used in our convex polytope sampling library, see for example https://github.com/modsim/hops/blob/main/include/hops/MarkovChain/MarkovChainAdapter.hpp.

Any thoughts on this?

eggplantbren commented 2 years ago

I don't think you can do the Metropolis algorithm without making a copy, because you need to backup the original state in case the proposal is rejected. I think you're right that the assignment that happens upon acceptance could be avoided by some sort of swap. However I failed to get an implementation that avoids the assignment operator here (I thought I understood std::move and so on, but I guess I don't). Feel free to make a PR if you can get that part working.

On Mon, Nov 15, 2021 at 3:33 AM Johann Fredrik Jadebeck < @.***> wrote:

Hey,

I was just benchmarking DNest4 for some of my more memory intensive/compex models and I noticed, that the copy constructor and assigment operators were called quite often. This leads to some low performance in my use cases, which is why I tried to identify the code, where it happens.

I think this is the place in SamplerImpl.h:

LikelihoodType& logl = log_likelihoods[which];

// Do the proposal for the particle ModelType proposal = particle; double log_H = proposal.perturb(rng);

// Prevent unnecessary exponentiation of a large number if(log_H > 0.0) log_H = 0.0;

if(rng.rand() <= exp(log_H))
{
  LikelihoodType logl_proposal(proposal.log_likelihood(),
                                          logl.get_tiebreaker());

  // Do the proposal for the tiebreaker
  log_H += logl_proposal.perturb(rng);

  // Accept?
  if(level.get_log_likelihood() < logl_proposal)
  {
      particle = proposal;
      logl = logl_proposal;
      level.increment_accepts(1);
  }

There are two lines here which impact performance: The first is ModelType propsal=particle; and the second is particle = proposal;

I believe these copies can be replaced by something more efficient for complex models, because we actually only have to track the proposal and log(H) and then decide if the ModelType should set state to the value of proposal or if it should keep the state it already had. In this case we wouldn't need to copy the whole model every time we try to update a particle, we would only change the internal state, which can probably even be done by std::move or std::swap since we discard the rejected proposals in this snippet.

I can make a PR for this, but I'm unsure how to deal with the changes in the interface for ModelType.

One solution which would not break existing code (but uses some more complex template code) is to check at compile time if the ModelType has the required members to circumvent copying and otherwise to sample like it is sampling now. This is something we've used in our convex polytope sampling library, see for example https://github.com/modsim/hops/blob/main/include/hops/MarkovChain/MarkovChainAdapter.hpp .

Any thoughts on this?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/eggplantbren/DNest4/issues/39, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMBKOXRNMU55R7J2ZUG73DUL7CFJANCNFSM5H75NJJA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

-- Dr Brendon J. Brewer Phone: +64 27 5001336 https://www.brendonbrewer.com @.***:3 https://keybase.io/brendonbrewer