OneShiftOnly should abort when cost function is provided

jtkrogel commented 5 years ago

The OneShiftOnly optimizer minimizes only the total energy and is not capable of variance minimization, or minimization of mixtures of energy and variance.

Additionally, the default cost function (0.9*energy+0.1*unreweightedvariance) is used by OneShiftOnly for reporting purposes to alert the user if the variance component is getting out of hand (the cost function reported in the log output therefore differs from the pure total energy, but this is a feature intended by @ye-luo and so not a bug in itself).

Due to the above, OneShiftOnly should abort if any cost function is provided by the user, with a message explaining that it 1) cannot accept cost function input and 2) only performs energy minimization.
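The check proposed above can be sketched as a user-side pre-flight script. This is a minimal sketch, not QMCPACK code: the function name `find_ignored_cost_blocks` and the `ENERGY_ONLY_METHODS` set are hypothetical, and it assumes the input is a single well-formed XML document that uses the `<qmc>`, `<cost>` and `<parameter name="MinMethod">` elements shown in this thread.

```python
import xml.etree.ElementTree as ET

# Optimizers that, per this thread, minimize only the energy (assumption:
# extend this set if other methods turn out to behave the same way).
ENERGY_ONLY_METHODS = {"oneshiftonly"}


def find_ignored_cost_blocks(xml_text, strict=True):
    """Report <qmc> blocks that combine <cost> input with an energy-only MinMethod.

    strict=True flags every <cost> element in such a block.
    strict=False flags only non-energy components with a nonzero weight.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for index, qmc in enumerate(root.iter("qmc")):
        # Read the MinMethod parameter of this block, if any.
        method = ""
        for param in qmc.findall("parameter"):
            if param.get("name", "").lower() == "minmethod":
                method = (param.text or "").strip().lower()
        if method not in ENERGY_ONLY_METHODS:
            continue
        for cost in qmc.findall("cost"):
            name = cost.get("name", "")
            weight = float((cost.text or "0").strip() or 0)
            if strict or (name != "energy" and weight != 0.0):
                problems.append(
                    f'qmc block {index}: MinMethod={method} ignores <cost name="{name}">'
                )
    return problems
```

With `strict=True` the script mirrors the original suggestion (reject any cost input); `strict=False` accepts cost input only when it amounts to pure energy minimization.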

prckent commented 5 years ago

Good point.

Is there not a more general issue of optimizers appearing to accept parameters that are actually unused?

jtkrogel commented 5 years ago

Yes, there is. It's just that the cost function is presented as a general feature in the manual (see XML blocks there combining cost and OneShiftOnly), and so I fully expected cost to be compatible with OneShiftOnly.

The optimizer silently ignoring your request to optimize the variance is like DMC ignoring all walker input, i.e. it is a big enough issue to require its own fix.

prckent commented 5 years ago

Another way to implement this in OneShiftOnly would be to abort if the cost function is not pure energy minimization. The cost function used for monitoring should not recycle the default one used for optimization. This is an important restriction since it implies that if OneShiftOnly is similarly robust to other pure energy minimizers, then it should only be used from a reasonably good starting point.

ye-luo commented 5 years ago

It is funny that when proposing a new set of parameters, all the optimizers do energy minimization with the linear method. After that, the cost function is used to gate the final target. For me, this logic is broken, but it can work to some extent.
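The two-stage logic described in this comment can be illustrated with a toy example. This is only a sketch under stated assumptions: `cost_value` and `pick_candidate` are hypothetical names, the weights are used as given without normalization, and the real selection logic inside QMCPACK may differ. The default weights are the 0.9/0.1 mixture quoted earlier in this thread.

```python
def cost_value(energy, unreweighted_variance, reweighted_variance=0.0,
               w_energy=0.9, w_unreweighted=0.1, w_reweighted=0.0):
    """Weighted mixture of energy and variance (weights used as given)."""
    return (w_energy * energy
            + w_unreweighted * unreweighted_variance
            + w_reweighted * reweighted_variance)


def pick_candidate(candidates, **weights):
    """Return the candidate parameter set with the lowest cost.

    Each candidate is a dict with 'energy' and 'variance' keys, standing in
    for a parameter set proposed by an energy-minimizing step.
    """
    return min(candidates,
               key=lambda c: cost_value(c["energy"], c["variance"], **weights))


# Two made-up candidates: the second has lower energy but higher variance.
candidates = [
    {"energy": -1.0, "variance": 0.5},
    {"energy": -1.1, "variance": 2.0},
]
mixed_choice = pick_candidate(candidates)
energy_choice = pick_candidate(candidates, w_energy=1.0, w_unreweighted=0.0)
```

In this toy case the mixed cost selects the first candidate and the pure-energy cost selects the second, which shows how the gate can change the outcome even though both candidates came from the same kind of proposal step.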

zenandrea commented 5 years ago

Hi @jtkrogel and @prckent, I also find that the manual is misleading on this point. Although the manual says this:

This method [OneShiftOnly] relies on the effective weight of correlated sampling rather than the cost function value to justify a new set of parameters.

I found the point very cryptic, especially in view of the fact that the input example on page 124 of manual v3.6.0 defines both the cost function and MinMethod=OneShiftOnly.

I would find it useful if the manual provided proper references at the beginning of both the OneShiftOnly and the adaptive MinMethod descriptions. The description of the methods in the manual is very shallow, and it would be useful to know where to find more details.

Anyway, I am doing some tests comparing old results I obtained with CASINO with results I get from QMCPACK. In CASINO I used to optimize the Jastrow by minimizing the variance. I tried to do the same with QMCPACK, and in order to have a fair comparison I generated the determinant part in the same way (DFT/LDA from QE). I tested more than one system, and CASINO systematically yields a variance that is smaller by around 10 to 20%. I tried with the following input:

<loop max="3">
  <qmc method="linear" move="pbyp" checkpoint="-1" gpu="yes">
    <estimator name="LocalEnergy" hdf5="no"/>
    <parameter name="blocks">       10 </parameter>
    <parameter name="substeps">     10 </parameter>
    <parameter name="warmupSteps"> 100 </parameter>
    <parameter name="usedrift">     no </parameter>
    <parameter name="timestep">    0.4 </parameter>
    <parameter name="samples"> 1600000 </parameter>
    <parameter name="stepsbetweensamples"> 1 </parameter>
    <!-- correlated sampling options -->
    <parameter name="minwalkers">  0.50 </parameter>
    <parameter name="maxWeight">  1.6e6 </parameter>
    <parameter name="nonlocalpp">    no </parameter>
    <!-- cost function -->
    <cost name="energy">               0.0  </cost>
    <cost name="reweightedvariance">   0.0  </cost>
    <cost name="unreweightedvariance"> 1.0  </cost>
    <!-- optimization options -->
    <parameter name="MinMethod">adaptive</parameter>
  </qmc>
</loop>

Is there something wrong in this input?

By the way, it is not very clear to me if, when and where I should write gpu="yes"? I initially imagined I was doing all my calculations with gpu, so I wrote it everywhere it is possible. I am not sure if I should remove them if I am not using the gpu.

ye-luo commented 5 years ago

For the CUDA build, it is still recommended to leave gpu="yes". However, I noticed that if you put gpu="no" the code will crash in some cases. We are working on a new GPU implementation in which both yes and no will be maintained and tested.

ye-luo commented 5 years ago

@zenandrea If you do not include a Jastrow, does the VMC variance agree? Page 124 is misleading. It was intended to break up the long optimization input section and briefly explain what each part does. If you go down to OneShiftOnly, you will see clear example blocks. quartic was the oldest one. adaptive is for Berkeley. OneShiftOnly is from me. You clearly see different flavours here.

zenandrea commented 5 years ago

@ye-luo Thanks for the quick response. I am sorry but there is a point I do not get when you say

For the CUDA build, it is still recommended to leave gpu="yes".

Do you mean that it is recommended to leave or do not leave gpu="yes" in the qmc attribute?

Then, let me check if I got the correct behavior reading the manual: If I am running a CPU-only qmcpack version (ie, not compiled with CUDA) the flag is just ignored (because, clearly, it would be gpu=no). If I am running a qmcpack version with CUDA, then gpu="yes" is telling to use the gpu, whereas if gpu="no" the gpu is not used.

Is that correct?

If yes, is the default value "yes" for the CUDA case and "no" for the CPU-only case?

In which cases should I run the CUDA version with gpu="no"? Why not running instead the AoS or the SoA versions straight away?

Moreover, I see that there is the flag gpu="yes" also in the determinantset attribute. I do not get why there is the same flag in two places in the input. If I define gpu="yes"/"no" in some place, I expect it has to be such everywhere. Am I wrong? Or is there any case where it is useful to have, for instance, yes in determinantset and no in qmc?

prckent commented 5 years ago

Comment: the CUDA code largely ignores the gpu="yes" or "no" tag, and always uses the GPU. In future this will be respected, so is worth including. While we want to do this ASAP, realistically the new GPU version is a long way off.

Short version: don't worry about the gpu flag now.

ye-luo commented 5 years ago

Do you mean that it is recommended to leave or do not leave gpu="yes" in the qmc attribute?

Leaving gpu="yes" or not in the input file doesn't matter really, just don't put "no". I did see runs crash if I put gpu="no" somewhere and ran the CUDA code.

Then, let me check if I got the correct behavior reading the manual: If I am running a CPU-only qmcpack version (ie, not compiled with CUDA) the flag is just ignored (because, clearly, it would be gpu=no). If I am running a qmcpack version with CUDA, then gpu="yes" is telling to use the gpu, whereas if gpu="no" the gpu is not used. Is that correct?

Correct. This was originally intended to allow users to select whether to use the GPU via the input. Unfortunately, this is no longer well maintained, and the CUDA build can only guarantee working with gpu="yes" in every place.

If yes, is the default value "yes" for the CUDA case and "no" for the CPU-only case?

Correct, this is the default as I remember.

In which cases should I run the CUDA version with gpu="no"? Why not running instead the AoS or the SoA versions straight away?

If you use the CUDA version, you should not use gpu="no". If you only need CPU, just run SoA or AoS. However, in the new GPU version currently under development, we would like to let users select what is dispatched to the GPU from the input file. This is very important: if parts A, B and C can be offloaded, the efficiency depends on the problem size, and some of them may not be able to run due to memory limits. So all the yes/no combinations are needed.

Moreover, I see that there is the flag gpu="yes" also in the determinantset attribute. I do not get why there is the same flag in two places in the input. If I define gpu="yes"/"no" in some place, I expect it has to be such everywhere. Am I wrong? Or is there any case where it is useful to have, for instance, yes in determinantset and no in qmc?

As I explained above, when GPU acceleration is available for many pieces of code and each is controllable from the input, you will see a corresponding number of gpu="yes/no" attributes in the input file. As Paul said, no worry, we will put "yes" at every place by default for users.
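Given the advice above to avoid gpu="no" with the current CUDA build, an input file can be scanned for that setting before a run. This is a minimal sketch with a hypothetical helper name (`elements_with_gpu_no`); it assumes the input is a single well-formed XML document and makes no claim about which elements QMCPACK actually reads the attribute from.

```python
import xml.etree.ElementTree as ET


def elements_with_gpu_no(xml_text):
    """Return the tag names of all elements whose gpu attribute is set to "no"."""
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()
            if elem.get("gpu", "").strip().lower() == "no"]


example = """<simulation>
  <determinantset gpu="yes"/>
  <qmc method="vmc" gpu="no"/>
</simulation>"""
flagged = elements_with_gpu_no(example)  # lists the <qmc> element only
```

An empty result means no element explicitly requests gpu="no"; elements without the attribute are left alone, since the defaults are described above as safe.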

zenandrea commented 5 years ago

@ye-luo many thanks.

You also asked about the difference between QMCPACK and CASINO if I have no Jastrow. I just ran the VMC calculations.

With CASINO I get (everything in au): Energy: -53.4242(3) Var: 9.73(2)
And with QMCPACK: vmc_cpu series 1 -53.424455 +/- 0.000666 1.0 9.803351 +/- 0.081267 1.0 0.1835

It seems that the agreement is good. But, with a 3-body Jastrow in CASINO I have a variance of 0.790(1), while with QMCPACK I only managed to reach a 0.916275 +/- 0.003829.
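The comparison above can be made quantitative by expressing each difference in units of the combined standard error. This is a rough sketch: `sigma_difference` is a hypothetical helper, the two codes' estimates are assumed to be independent with approximately Gaussian errors, and the CASINO parentheses are read as the uncertainty in the last quoted digit.

```python
import math


def sigma_difference(a, err_a, b, err_b):
    """Difference between two independent estimates in units of their combined standard error."""
    return abs(a - b) / math.hypot(err_a, err_b)


# Values quoted above (CASINO first, QMCPACK second).
no_jastrow_energy = sigma_difference(-53.4242, 0.0003, -53.424455, 0.000666)
no_jastrow_variance = sigma_difference(9.73, 0.02, 9.803351, 0.081267)
j3_variance = sigma_difference(0.790, 0.001, 0.916275, 0.003829)

print(f"no Jastrow: energy {no_jastrow_energy:.1f} sigma, "
      f"variance {no_jastrow_variance:.1f} sigma")
print(f"3-body Jastrow: variance {j3_variance:.1f} sigma")
```

Under these assumptions the no-Jastrow energy and variance each differ by less than one combined standard error, while the 3-body Jastrow variances differ by roughly thirty, so the optimized-Jastrow discrepancy is far outside statistical noise.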

zenandrea commented 5 years ago

@ye-luo Well, I suspect the optimizer is not minimizing the variance because, in output, I see this kind of stuff

        shift_i        shift_s       max param change    cost function value
   ------------   ------------   --------------------   --------------------
            N/A            N/A                    N/A       -54.722808701711        initial
     2.5000e-02     2.5000e+00             8.3207e-02       -54.717743562276
     1.0000e-01     1.0000e+01             1.2003e-02       -54.723515267589  <--
     4.0000e-01     4.0000e+01             2.0650e-03       -54.722888729898

where the cost function seems to be the energy, and not something related to the variance, which is what I would like to minimize. This is despite my input being:

    <cost name="energy">               0.0  </cost>
    <cost name="reweightedvariance">   0.0  </cost>
    <cost name="unreweightedvariance"> 1.0  </cost>

ye-luo commented 5 years ago

It is true. The adaptive optimizer is also targeting energy just like the OneShiftOnly.

ye-luo commented 5 years ago

If you only include 1 and 2 body, do the energy and variance agree between CASINO and QMCPACK?

jtkrogel commented 5 years ago

@zenandrea You could consider using the quartic optimizer. It does respect the cost function (you can perform variance minimization with it) and it is still the primary optimizer that I use in production.

You could try optimization blocks like the following:

   <loop max="8">
      <qmc method="linear" move="pbyp" checkpoint="-1">
         <cost name="energy"              >    0.0                </cost>
         <cost name="unreweightedvariance">    1.0                </cost>
         <cost name="reweightedvariance"  >    0.0                </cost>
         <parameter name="warmupSteps"         >    300                </parameter>
         <parameter name="blocks"              >    100                </parameter>
         <parameter name="steps"               >    1                  </parameter>
         <parameter name="subSteps"            >    10                 </parameter>
         <parameter name="timestep"            >    0.3                </parameter>
         <parameter name="useDrift"            >    no                 </parameter>
         <parameter name="samples"             >    51200             </parameter>
         <parameter name="MinMethod"           >    quartic            </parameter>
         <parameter name="minwalkers"          >    0.3                </parameter>
         <parameter name="nonlocalpp"          >    yes                </parameter>
         <parameter name="useBuffer"           >    yes                </parameter>
         <parameter name="alloweddifference"   >    0.0002             </parameter>
         <parameter name="exp0"                >    -6                 </parameter>
         <parameter name="bigchange"           >    10.0               </parameter>
         <parameter name="stepsize"            >    0.15               </parameter>
         <parameter name="nstabilizers"        >    1                  </parameter>
         <estimator name="LocalEnergy" hdf5="no"/>
      </qmc>
   </loop>
   <loop max="4">
      <qmc method="linear" move="pbyp" checkpoint="-1">
         <cost name="energy"              >    0.0                </cost>
         <cost name="unreweightedvariance">    1.0                </cost>
         <cost name="reweightedvariance"  >    0.0                </cost>
         <parameter name="warmupSteps"         >    300                </parameter>
         <parameter name="blocks"              >    100                </parameter>
         <parameter name="steps"               >    1                  </parameter>
         <parameter name="subSteps"            >    10                 </parameter>
         <parameter name="timestep"            >    0.3                </parameter>
         <parameter name="useDrift"            >    no                 </parameter>
         <parameter name="samples"             >    204800             </parameter>
         <parameter name="MinMethod"           >    quartic            </parameter>
         <parameter name="minwalkers"          >    0.3                </parameter>
         <parameter name="nonlocalpp"          >    yes                </parameter>
         <parameter name="useBuffer"           >    yes                </parameter>
         <parameter name="alloweddifference"   >    0.0002             </parameter>
         <parameter name="exp0"                >    -6                 </parameter>
         <parameter name="bigchange"           >    10.0               </parameter>
         <parameter name="stepsize"            >    0.15               </parameter>
         <parameter name="nstabilizers"        >    1                  </parameter>
         <estimator name="LocalEnergy" hdf5="no"/>
      </qmc>
   </loop>

zenandrea commented 5 years ago

If you only include 1 and 2 body, do the energy and variance agree between CASINO and QMCPACK?

Well, no. But I guess it might be because I never performed a proper variance minimization in QMCPACK. Now I am trying to optimize using quartic and the input suggested by @jtkrogel (many thanks for that).

With J2 and same cutoffs and number of parameters I get:

CASINO: VMC #3: E = -54.659(1) ; var = 1.034(6) (correlation.out.2)

QMCPACK: opt_cpu series 9 -54.682430 +/- 0.002176 1.0 1.132345 +/- 0.013728 1.7 0.0207

So, QMCPACK is lower in energy, CASINO in variance.

ye-luo commented 5 years ago

I got the following:

OneShiftOnly: -54.738030 +/- 0.001163 1.043964 +/- 0.008735 0.0191
quartic: -54.735507 +/- 0.001377 1.034825 +/- 0.012227 0.0189

But I'm not sure if my orbitals are exactly the same as yours.

ye-luo commented 5 years ago

I was running on 128 nodes of Titan with the following optimization blocks. I first optimize J1 and J2. Then J3 is included with empty coefficients (namely zero).

 <loop max="2">
  <qmc method="linear" move="pbyp" checkpoint="-1"  gpu="yes">
    <parameter name="blocks">       50 </parameter>
    <parameter name="substeps">      5 </parameter>
    <parameter name="warmupSteps">   2 </parameter>
    <parameter name="usedrift">     no </parameter>
    <parameter name="timestep">    0.5 </parameter>
    <parameter name="samples">  102400 </parameter>
    <parameter name="MinMethod"> OneShiftOnly </parameter>
    <parameter name="minwalkers"> 1e-3 </parameter>
  </qmc>
 </loop>
 <loop max="10">
  <qmc method="linear" move="pbyp" checkpoint="-1"  gpu="yes">
    <parameter name="blocks">      100 </parameter>
    <parameter name="substeps">      5 </parameter>
    <parameter name="warmupSteps">   2 </parameter>
    <parameter name="usedrift">     no </parameter>
    <parameter name="timestep">    0.5 </parameter>
    <parameter name="samples">  819200 </parameter>
    <parameter name="MinMethod"> OneShiftOnly </parameter>
    <parameter name="minwalkers">  0.5 </parameter>
  </qmc>
 </loop>

zenandrea commented 5 years ago

I got the following:

OneShiftOnly: -54.738030 +/- 0.001163 1.043964 +/- 0.008735 0.0191
quartic: -54.735507 +/- 0.001377 1.034825 +/- 0.012227 0.0189

But I'm not sure if my orbitals are exactly the same as yours.

@ye-luo, are these the results for J123? In the J123 case I get similar results in terms of variance, but my energy is slightly worse:

OneShiftOnly: -54.720183 +/- 0.000793 0.973202 +/- 0.004048
quartic (cost=unreweightedvariance): -54.717147 +/- 0.002450 0.985512 +/- 0.008854

Well, if our orbitals are different, a little difference in the energy can be expected. Also, my results are obtained before you fixed the parseCasino #1511

I do not know if our Jastrow is exactly the same. This is my Jastrow:

      <jastrow name="J2" type="Two-Body" function="Bspline" print="yes">
        <correlation speciesA="u" speciesB="u" size="8" rcut="12.0">
          <coefficients id="uu" type="Array"> 0.446812054 0.1798961152 0.06987282335 0.00730875661 -0.01561602035 -0.01379240798 -0.01054364133 -0.007225509462</coefficients>
        </correlation>
        <correlation speciesA="u" speciesB="d" size="8" rcut="12.0">
          <coefficients id="ud" type="Array"> 0.6095755685 0.2286464462 0.0922514148 0.02377005153 -0.002082175675 -0.003043660912 -0.001216441506 -0.0008373444844</coefficients>
        </correlation>
      </jastrow>
      <jastrow name="J1" type="One-Body" function="Bspline" source="ion0" print="yes">
        <correlation elementType="C" size="8" rcut="6.6">
          <coefficients id="eC" type="Array"> -1.040213918 -0.9604142182 -0.7145323679 -0.4262640585 -0.2653046372 -0.1456223469 -0.06505155222 -0.02554663739</coefficients>
        </correlation>
        <correlation elementType="O" size="8" rcut="5.3">
          <coefficients id="eO" type="Array"> -2.176257507 -2.104341264 -1.679044484 -1.149314223 -0.6253925488 -0.4565181652 -0.2231139053 -0.1329028082</coefficients>
        </correlation>
        <correlation elementType="H" size="8" rcut="3.5">
          <coefficients id="eH" type="Array"> -0.1159500206 -0.1143760984 -0.1009016479 -0.07359560111 -0.05263659853 -0.03631716041 -0.02154618605 -0.0101356784</coefficients>
        </correlation>
      </jastrow>
<jastrow name="J3" type="eeI" function="polynomial" source="ion0" print="yes">
<correlation ispecies="C" especies="e" isize="3" esize="3" rcut="5">
<coefficients id="eeC" type="Array" optimize="yes"> -0.007022328834 0.001058736338 0.01781090558 0.00553244229 -0.001640954736 -9.222941189e-05 0.01528671758 -0.003464549325 0.01446794683 -0.003478167384 0.006395362126 -0.01697879251 -0.004579314083 -0.01142674985 -0.0008072218707 0.0009739518557 0.01586393729 -0.02749675761 0.02333114418 0.0004574716267 0.002452651686 -0.002198110101 -0.0002232135387 -0.001378196947 0.000146489857 6.587497233e-05</coefficients> </correlation>
<correlation ispecies="O" especies="e" isize="3" esize="3" rcut="5">
<coefficients id="eeO" type="Array" optimize="yes"> -0.006033932426 -0.0007020631238 0.01958675124 0.006532820898 0.002110988334 0.0003801370797 0.00838728386 -0.0002640521204 0.01087249916 0.0001799568548 0.01288152393 -0.03257463087 0.004074682232 -0.0178350057 -0.00883560995 0.00117117344 0.01633398365 -0.058852452 0.06971247224 -0.01729442807 0.006131989481 -0.01001688522 0.003765176656 -0.003894130698 0.003617475499 -0.001364822315</coefficients> </correlation>
<correlation ispecies="H" especies="e" isize="3" esize="3" rcut="3">
<coefficients id="eeH" type="Array" optimize="yes"> -0.3898771801 -0.003144037199 2.408513807 0.3538803075 -0.369692267 0.02491174732 0.8609555789 0.2930589195 3.802359424 -0.2890448824 0.9477708995 -6.89888472 -0.6853360827 -1.569073697 -1.023897711 0.5772076728 3.193794291 -9.93985036 21.43058046 -5.933976636 -0.1277939599 -5.738078246 2.559902688 0.1651031707 2.649055883 -1.553668225</coefficients> </correlation>
</jastrow>

As you can see, I allow only J2 to distinguish between up and down electrons. The cutoffs should be good (in CASINO they can be optimized, so I took the optimized values). The number of parameters is the same as I had in CASINO (although the functional forms are different; CASINO uses the J1 and J2 parameters for a sort of polynomial expansion rather than a spline). I will try to increase the number of spline parameters in J1 and J2 and see if the variance decreases.

mcbennet commented 4 years ago

I would like to revive this to mention that I agree with the original suggestion that having qmcpack abort when the user specifies a cost function that is not pure energy minimization when using oneshiftonly is needed. I was caught by this myself recently. Additionally, as mentioned, the input example in the manual within "Wavefunction optimization" shows both a variance cost and oneshiftonly which is misleading.

QMCPACK / qmcpack

OneShiftOnly should abort when cost function is provided #1494