QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org
Other
297 stars 139 forks source link

Need smart pointers. Using LCAOrbitalSetWithCorrection as a starting example #1624

Open ye-luo opened 5 years ago

ye-luo commented 5 years ago

1623 exposed a problem that we need to reduce the use of raw pointers and use smart pointers if pointers are needed to share data between class objects.

I extract lines of code to better describe a right way to do in my mind.

struct LCAOrbitalSetWithCorrection : public LCAOrbitalSet
{
  SoaCuspCorrection cusp;

  SPOSet* LCAOrbitalSetWithCorrection::makeClone() const
  {
    LCAOrbitalSetWithCorrection* myclone = new LCAOrbitalSetWithCorrection(*this);
    myclone->myBasisSet                  = myBasisSet->makeClone();
    myclone->IsCloned                    = true;
    return myclone;
  }
}

struct SoaCuspCorrection
{
  typedef CuspCorrectionAtomicBasis<RealType> COT;
  aligned_vector<COT*> LOBasisSet; // raw pointer, bad

  SoaCuspCorrection(const SoaCuspCorrection& a) = default;
}

template<typename T>
class CuspCorrectionAtomicBasis
{
  typedef MultiQuinticSpline1D<T> RadialSetType;
  RadialSetType AOs;
  std::vector<T> phi, dphi, d2phi;

  CuspCorrectionAtomicBasis(const CuspCorrectionAtomicBasis& a) = default;

  inline void evaluate(const T r, T* restrict vals)
  {
     phi.resize(nr);
  }
}

template<typename T>
class MultiQuinticSpline1D
{
  coeff_type* Coeffs; // raw pointer, bad
  bool own_spline;

  MultiQuinticSpline1D(const MultiQuinticSpline1D& in) = default;

  MultiQuinticSpline1D<T>* makeClone() const
  {
    MultiQuinticSpline1D<T>* myclone = new MultiQuinticSpline1D<T>(*this);
    myclone->own_spline              = false;
    return myclone;
  }
}

Each thread has a copy of LCAOrbitalSetWithCorrection. The makeClone function triggers the copy constructor and copies the pointers of SoaCuspCorrection::LOBasisSet by value instead of deep copy. Multple threads attempt to resize in CuspCorrectionAtomicBasis::evaluate and breaks the code.

The code was correct before #1611 by defining phi[nr] local to the function. This is dangerous due to VLA and may have low performance due to allocation and deallocation on the fly per thread. For the moment, the VLA is replaced with std::vector phi(nr) defined local to the function but the allocation and deallocation issue remains. So this is a workaround.

The root cause of this problem is that SoaCuspCorrection uses the default copy constructor but it was not made safe. See MultiQuinticSpline1D an example, when there is a raw pointer in the structure, a specialized copy function (MultiQuinticSpline1D::makeClone) is needed to manage the ownership of the pointer. This is actually a misuse of makeClone. makeClone is only needed when there are base and derived classes. Instead, we should use C++ smart pointers and use shared_ptr for Coeffs. In this way, the compiler will handle the ownership ofCoeffs and the class can be made destruction safe. After using shared pointer, the makeClone is no more needed in MultiQuinticSpline1D and the default copy constructor of MultiQuinticSpline1D and CuspCorrectionAtomicBasis are safe to use.

In SoaCuspCorrection, LOBasisSet should be replaced as aligned_vector<COT> without pointers. In this way, SoaCuspCorrection has a safe default copy constructor.

There is a similar situation in myBasisSet that we can make better use of smart pointer and thus IsCloned state doesn't need to be explicitly managed and we can safely destructor SoaCuspCorrection class. The desired code should have LCAOrbitalSetWithCorrection default copy constructor safe and makeClone function should only contains a line invoking the copy constructor.

jtkrogel commented 5 years ago

I don't really understand part of this. From my point of view, makeClone was created to handle per thread copies (partial deep copy) of objects in all cases, not just for base and derived classes.

I don't see how proper makeClone (deep copy as appropriate) does not provide a fix for your issue. From this point of view, I don't see how smart pointers are strictly needed (of course it is easier to argue that it would have been nicer to have started with them in the design, but this may or may not be enough to warrant the effort of code rewrite).

ye-luo commented 5 years ago

@jtkrogel I'm not saying removing all the makeClone.

makeClone requires lines of code while copy constructor can be handled by compilers. So if it is possible to reply on copy constructors, we should avoid makeClone. Because copy constructor cannot be virtual, to make a copy of a derived class object when only the base class pointer is known, we must use makeClone function. This use pattern exists a lot in QMCPACK and there is no need of change. However, if the derived class is known or there is no inheritance, we should directly use copy constructors.

Here LCAOrbitalSetWithCorrection::makeClone cannot be removed. Becase it is known as SPOSet from outside, invoking the copy constructor of SPOSet only copies SPOSet which is wrong. The virutal makeClone does the correct copy of the full derived class. Instead MultiQuinticSpline1D is not a derived class, we should not use makeClone but rely on a good copy constructor and thus we don't need to implement makeClone at all levels of the call stack because compilers generate copy constructor of all the levels for you.

Copy constructors can handle deep copy of containers but it cannot deep copy pointers. In the above example, we need MultiQuinticSpline1D::Coeffs as a pointer because the data is not small and it is worth sharing it by multiple threads. However, we were manually managing the ownership of this pointer and shared_ptr can do this for us. On the other hand, aligned_vector<COT*> LOBasisSet is a bad use of pointers. The intermediate result vector has to be local to the function instead of COT because COT pointers are not deep copied. @jtkrogel you can implement makeClone to fix this, but I think it is better to just avoid pointers and then the compiler generated default copy constructor works out of the box. COTs are all private to the thread and at the very bottom the Coeffs is a shared_ptr for all the thread copies.

ye-luo commented 5 years ago

@jtkrogel A partial deep copy can also be achieved with a customized copy constructor and this is not the reason for using makeClone.

prckent commented 5 years ago

I think part of the problem is that it is not clear who/what is responsible for an object over its lifetime. Reference counting is one way of avoiding leaks, e.g. via smart_ptr, but we should still have the overall life of objects made clearer.

ye-luo commented 5 years ago

I think when we use pointers, we need to ask

  1. Is this pointer necessary? (justification)
  2. Who own the pointer? (ownership)
  3. When and where the pointer is created? (lifetime)
  4. When and where the pointer is destroyed? (lifetime)
PDoakORNL commented 5 years ago

Why do I need this pointer? Is there more fundamental problem with the design?

markdewing commented 5 years ago

In MultiQuinticSpline1D, Coeffs and first_deriv seem very similar conceptually - they are both spline parameters. They are treated differently in the class, though, because of the size differences (Coeffs is large, first_deriv is small). It seems that nothing in MultiQuinticSpline1D needs to be per-thread. A single copy of MultiQuinticSpline1D would work across threads, and Coeffs would not need to be a pointer. This would also treat Coeffs and first_deriv more equally. The copy constructor might need to be disabled to avoid inadvertent copying.

The next class up, CuspCorrectionAtomicBasis, stores a single MultiQuinticSpline1D. Currently the temporary storage is allocated at each evaluate call, which is likely suboptimal. The temporary storage could be allocated here at the class level, and copies of the class made per-thread. Alternately, the temporary storage could be pushed up a level, to SoaCuspCorrection and passed as a parameter to the evaluate calls. Then there only needs to be a single copy of CuspCorrectionAtomicBasis as well.