ALPSCore / CT-HYB

ALPS Hybridization Expansion Matrix Code
GNU General Public License v2.0

Available simulation results? #20

Open dombrno opened 6 years ago

dombrno commented 6 years ago

Good evening, I am using ALPSCore/CT-HYB for the calculation of dynamic susceptibilities. This involves calculating G2, inverting the Bethe-Salpeter equation (BSE), and performing the analytic continuation. The analytic continuation uses Maxent and thus needs some error information as input data. In order to feed a good estimate of the error to Maxent, I am considering a jackknife resampling procedure on the output of the simulation. Hence my questions: a) is there anything better/easier than this available somewhere within the ALPS project? b) if not, is the detailed simulation data available in the h5 output file?

Thanks a lot for your help.

shinaoka commented 6 years ago

Hi, I did not implement error-bar estimation simply because I did not see a need for it at that point. There is no error information available in the current h5 output file.

How do you propagate the errors of the average sign in a jackknife resampling?

dombrno commented 6 years ago

The idea would be to mimic a series of shorter QMC runs by splitting the full set of measurement points from a single run into bunches. We would then invert the BSE and obtain the dynamic susceptibility in imaginary bosonic frequency from each of these bunches. For this, what we would need is G2(l, l', omega_n) at each measurement point of the QMC simulation, if that is dumped anywhere. Said differently, we are trying to avoid running a number of independent simulations by analyzing the data of the long run we already have. But probably these data are not dumped; I would imagine they would be several GB in size.
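As an illustration, a minimal sketch of the leave-one-out jackknife over such bunches (the per-bunch estimates are hypothetical inputs; for complex quantities, apply it to the real and imaginary parts separately):

```python
import numpy as np

def jackknife(samples):
    """Leave-one-out jackknife mean and error over B bunches.

    samples: array of shape (B, ...), one estimate per bunch,
    e.g. the susceptibility obtained from the BSE on each bunch.
    """
    samples = np.asarray(samples)
    B = samples.shape[0]
    total = samples.sum(axis=0)
    # Leave-one-out averages: mean over all bunches except bunch b.
    loo = (total - samples) / (B - 1)
    mean = loo.mean(axis=0)
    # Jackknife error: sqrt((B-1)/B * sum of squared deviations).
    err = np.sqrt((B - 1) / B * ((loo - mean) ** 2).sum(axis=0))
    return mean, err
```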

dombrno commented 6 years ago

In the old days, in the very first version of ALPS, there was a function "evaluate" which had to be called once the QMC was finished and did all the averaging etc. Based on that architecture, it would have been possible to do the procedure mentioned above. But I understand that the new architecture probably does not give access to this fine-grained detail, which is not necessary in general, since the averaging is done at the end of the QMC run before the results are dumped.

shinaoka commented 6 years ago

It may be possible with the accumulator or alea libraries in ALPSCore. Markus Wallerberger may be the right person to answer this technical question. Alex, Emanuel, could you assign him to this thread?

shinaoka commented 6 years ago

Alex, Emanuel, any idea?

egull commented 6 years ago

Yes. We have the FullBinning observable in the old ALPS. It keeps a number of bins from the simulation (typically 128, I think) and fills them with the measured values. They are then written into the HDF5 file, and a BS equation calculation can take the data from the bins and post-process them.

A word of warning though: this will require 128 times the storage for the HDF5 file (and in memory), which is substantial for vertex functions.

Hiroshi, to enable this you would have to find the accumulator and change it from mean or nobinning to fullbinning. @dombrno would then have to rerun the calculation. With the new ALPS we have the same capabilities, just the rewrite of the ALEA has the binning procedure optimized a bit. A paper is in preparation.
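A sketch of the post-processing step, assuming a hypothetical dataset path for the stored bins (the actual layout depends on how the accumulator writes the observable):

```python
import h5py

# Hypothetical dataset path: the real name depends on how the
# fullbinning accumulator registers the observable in the file.
with h5py.File("sim.h5", "r") as f:
    g2_bins = f["/simulation/results/G2_Re/timeseries/data"][...]

# Each bin is an independent estimate of G2 and can be fed directly
# to a jackknife such as the one sketched earlier in this thread.
```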

dombrno commented 6 years ago

I understand, that makes sense. I remember that in the course of adapting the old ALPS (segment) code to my needs (expanding it to two orbitals and using a non-diagonal hybridization function), I replaced the proposed accumulators with others which did not do any binning, and enjoyed a much more compact output file (I did not check memory consumption, and I was only calculating the single-particle GF).

egull commented 6 years ago

So... how can we help? Should we get you a version that can do binning so you can try?

dombrno commented 6 years ago

I think that before anything is done, I should assess the memory and storage currently needed, and consider how much more the hardware on my side is able to handle.

shinaoka commented 6 years ago

Thank you, Emanuel.

@dombrno I want to ask about your real need for the BS equation calculation. What if you just ran the CT-HYB solver several times with different random seeds? This would also give you some estimate of the error bars.
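A minimal sketch of the error estimate this yields, treating each seeded run as an independent sample:

```python
import numpy as np

def runs_mean_and_error(estimates):
    """Mean and standard error over R independent runs.

    estimates: array of shape (R, ...), one solver result per seed.
    """
    estimates = np.asarray(estimates)
    R = estimates.shape[0]
    mean = estimates.mean(axis=0)
    # Unbiased sample standard deviation, divided by sqrt(R).
    err = estimates.std(axis=0, ddof=1) / np.sqrt(R)
    return mean, err
```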

dombrno commented 6 years ago

Yes, that is an option.

Maybe we can keep this feature as a "nice to have" option, but I would give it the lowest priority.

dombrno commented 6 years ago

For what it's worth, with the current code the h5 file is 100 MB, and the calculation consumes 5 GB of RAM. The available RAM is 128 GB on my hardware, so I could probably use up to 20 bins, if this option were ever implemented.

If I find I have a really strong need for it, I will ask for guidance as to which type of accumulator is best to use in your opinion, and implement it on my side; but for the time being, thank you for your answer and your help, which perfectly answers my initial question.

dombrno commented 6 years ago

@shinaoka I finally went for the option you suggested: running the CT-HYB solver several times with different random seeds. I do this using the job array feature of the PBS scheduler: a number of jobs are launched with the exact same inputs, each using one full node with 24 CPUs. The only difference between the jobs is the value of SEED in the input.ini file.

Does this look reasonable to you? In particular, as the seed in my setup increases by one from one node to the next, I would like to make sure that the same seed is used by all CPUs controlled by a given job, so that there can be no seed overlap between CPUs from different jobs. In other terms, I would like confirmation that all the MPI processes of a single run share the same seed. This is probably a question for @galexv? Thanks a lot.

shinaoka commented 6 years ago

Hmm, the value of the seed increases one by one between nodes? That could cause a problem. For the MPI process of rank n, the pseudorandom number generator is initialized with SEED + n, where SEED is the value given in the input file (this is the specification of the ALPSCore libraries). So the value of SEED should increase by at least 24 from one node to the next.
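A small sketch of the required spacing (the starting seed and job count are hypothetical):

```python
# With nprocs MPI ranks per job, rank n of a job whose input file sets
# SEED = s is seeded with s + n (ALPSCore convention). Spacing the base
# seeds by nprocs therefore keeps all ranks disjoint.
nprocs = 24          # ranks per job
n_jobs = 64          # hypothetical number of jobs in the PBS array
base = 42            # hypothetical starting seed
seeds = {base + j * nprocs + n for j in range(n_jobs) for n in range(nprocs)}
assert len(seeds) == n_jobs * nprocs  # no two ranks share a seed
```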

shinaoka commented 6 years ago

To prevent this from happening, I may be able to apply some non-linear transformation $f$ to SEED and initialize the random-number generators with f(SEED) + n. What do you think?
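For illustration, one standard choice for such an $f$ is an integer mixing function; a sketch using the SplitMix64 mixer (the final reduction to the generator's seed width is an assumption):

```python
def splitmix64(x):
    """One round of the SplitMix64 mixer: a bijective, strongly
    non-linear scramble of a 64-bit integer."""
    x = (x + 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & 0xFFFFFFFFFFFFFFFF
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & 0xFFFFFFFFFFFFFFFF
    return x ^ (x >> 31)

# Rank n would then be seeded with, say, (splitmix64(SEED) + n) % 2**32,
# so that consecutive SEED values map to widely separated seed blocks.
```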

dombrno commented 6 years ago

I suspected this could be the case, thanks for clarifying - I will simply increase the seed by 24 from one node to the next, and all should be fine then. Thank you!

dombrno commented 6 years ago

Maybe just one detail to make sure everything is working as expected: I am controlling the seed via the key "SEED" at the top level of the .ini file, based on what I saw implemented in alps/mc/mcbase.cpp. Is this the recommended way to control this parameter?

shinaoka commented 6 years ago

You're right!

dombrno commented 6 years ago

I have now obtained the data corresponding to 64 runs on a single node, using different seed values. I would like to do some resampling of the quantities G1_LEGENDRE and G2_LEGENDRE. For this purpose I need the number of measurements performed on each node for these quantities (the average sign is 1.0). It looks like

  • /simulation/results/G1_Re/count
  • /simulation/results/G2_Re/count

might be the suitable fields - can you please confirm that this is correct? Thank you!

egull commented 6 years ago

Yes, that is correct.


shinaoka commented 6 years ago

BTW, why do you need to know the number of measurements?

dombrno commented 6 years ago

Well, I have 64 samples, and I need to calculate the values of G1 and G2 over subsets of these samples. I was thinking that the number of measurements is the reasonable weight to apply to each sample's contribution to the partial resummation, given that the average sign is 1.0.
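A sketch of that count-weighted combination over a subset of runs (`mean_path` is an assumed HDF5 layout; only the `count` fields above were confirmed):

```python
import h5py

def weighted_g2(files,
                mean_path="/simulation/results/G2_Re/mean/value",
                count_path="/simulation/results/G2_Re/count"):
    """Count-weighted average of G2 over a subset of the 64 runs.

    count_path is the field confirmed above; mean_path is an assumed
    layout and may need adjusting to the actual file structure.
    """
    acc, total = None, 0
    for fname in files:
        with h5py.File(fname, "r") as f:
            g2 = f[mean_path][...]
            n = int(f[count_path][()])
        acc = n * g2 if acc is None else acc + n * g2
        total += n
    return acc / total
```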