TRIQS / triqs_0.x

DEPRECATED -- This is the repository of the older versions of TRIQS

Pade test fails on Ubuntu #58

Closed parcollet closed 12 years ago

parcollet commented 12 years ago

The Pade test fails on Ubuntu (and Mac).

This is probably due to a lack of precision of the Pade.

https://gist.github.com/2397181

aeantipov commented 12 years ago

My results:

      Start 79: ExampleTest
79/87 Test #79: ExampleTest ......................   Passed    0.21 sec
      Start 80: ExampleTestH5
80/87 Test #80: ExampleTestH5 ....................   Passed    1.53 sec
      Start 81: HDF5_IO
81/87 Test #81: HDF5_IO ..........................***Failed    1.05 sec
      Start 82: GF_Init
82/87 Test #82: GF_Init ..........................***Failed    1.78 sec
      Start 83: GF_BasOps
83/87 Test #83: GF_BasOps ........................***Failed    1.90 sec
      Start 84: dos1
84/87 Test #84: dos1 .............................   Passed    7.97 sec
      Start 85: Pade
85/87 Test #85: Pade .............................***Failed    1.01 sec
      Start 86: SingleSiteBethe
86/87 Test #86: SingleSiteBethe ..................   Passed   24.36 sec
      Start 87: CDMFT_4_sites-v2
87/87 Test #87: CDMFT_4_sites-v2 .................***Failed   26.83 sec

on

$ uname -a
Darwin xserve02.cpfs.mpg.de 11.3.0 Darwin Kernel Version 11.3.0:
Thu Jan 12 18:47:41 PST 2012; root:xnu-1699.24.23~1/RELEASE_X86_64 x86_64

parcollet commented 12 years ago

I guess the other failures are the same as in #35, due to your recent version of hdf5? Normally, if you look at the h5diff output (see Testing/LastTest.log for the command and path), you should find no diff, except for Pade...

aeantipov commented 12 years ago

I also tested it on a gcc build (the previous results were for clang). Surprisingly, the Pade test passes.


81/87 Test #81: HDF5_IO ..........................***Failed    0.62 sec
      Start 82: GF_Init
82/87 Test #82: GF_Init ..........................***Failed    1.06 sec
      Start 83: GF_BasOps
83/87 Test #83: GF_BasOps ........................***Failed    1.24 sec
      Start 84: dos1
84/87 Test #84: dos1 .............................   Passed    8.19 sec
      Start 85: Pade
85/87 Test #85: Pade .............................   Passed    0.84 sec
      Start 86: SingleSiteBethe
86/87 Test #86: SingleSiteBethe ..................   Passed   27.10 sec
      Start 87: CDMFT_4_sites-v2
87/87 Test #87: CDMFT_4_sites-v2 .................   Passed   26.97 sec

on

Darwin xserve46.cpfs.mpg.de 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; 
root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64

parcollet commented 12 years ago

Isn't it just the fact that the Pade is unstable? @mferrero: Michel, any idea of the origin of the issue? @A32167: do you confirm that the other test failures are the same as in #35?

aeantipov commented 12 years ago

@parcollet I do. For the hdf5 issues I attached a diff to issue #35. Here is a diff of the h5dumps of Pade.output.h5 for a clang compilation: https://gist.github.com/2405611. With gcc the outputs are the same.

parcollet commented 12 years ago

Ah yes I forgot to mention that my first report was using clang 3.1 too (on Ubuntu).

aeantipov commented 12 years ago

Here is the same diff with 15 decimal places: https://gist.github.com/2405773, produced with the command

h5dump -m '%.15f' pytriqs/Base/test/Pade.output.h5

krivenko commented 12 years ago

> Isn't it just the fact that the Pade is unstable?

This is the question I'm asking myself. If the difference in results stems from, for example, different implementations of the standard library, I should either ignore it (set a higher tolerance in the test) or dig deeper into it and write a workaround.

Btw, could it be an issue with differing input data rather than with Pade itself?
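
The "higher tolerance" option can be sketched as follows. This is only an illustration in plain Python, not the comparison the TRIQS test suite actually performs (that one goes through h5diff); the helper names `max_abs_diff` and `arrays_close` and the sample numbers are hypothetical.

```python
import cmath

def max_abs_diff(reference, result):
    """Largest element-wise absolute difference between two sequences."""
    if len(reference) != len(result):
        raise ValueError("sequences differ in length")
    return max((abs(r - x) for r, x in zip(reference, result)), default=0.0)

def arrays_close(reference, result, rel_tol=1e-9, abs_tol=0.0):
    """True if every pair of (possibly complex) entries agrees within tolerance."""
    return len(reference) == len(result) and all(
        cmath.isclose(r, x, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, x in zip(reference, result)
    )

if __name__ == "__main__":
    # Hypothetical data: the second "platform" differs by about 1e-7 in one entry.
    ref = [1.0, 2.0 + 0.5j]
    res = [1.0 + 1e-7, 2.0 + 0.5j]
    print("max abs diff:", max_abs_diff(ref, res))
    print("strict check:", arrays_close(ref, res, abs_tol=1e-12))
    print("loose check: ", arrays_close(ref, res, abs_tol=1e-6))
```

Reporting the maximum deviation alongside the pass/fail result makes it easier to judge whether a looser tolerance hides round-off noise or a real regression.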

krivenko commented 12 years ago

I made commit 505f509f5b5b13d25b0b28926fd85fab246ba7db to my repository, with an updated version of the test that also checks the input data for Pade (to make sure that the Matsubara Green's function is computed identically on different platforms).

parcollet commented 12 years ago

If this commit is on your fork, I guess you need to open a pull request so that we can merge it...

krivenko commented 12 years ago

Sorry for not answering for so long. I now have access to a working clang installation, so I can run the test myself. I've decided not to create the pull request, because the proposed change seems conceptually wrong to me: one test must verify one feature, namely the newly introduced method, not the input data. So I'll keep the commit only in my repo for testing purposes and then revert it.

mferrero commented 12 years ago

One should indeed investigate the problem. Igor, could you figure out whether the problem stems from small differences in the input, or whether the same inputs lead to different Pade results on different machines?

krivenko commented 12 years ago

The input data seem to be the same, so it's really an issue with the Pade implementation behaving differently with gcc and clang.

parcollet commented 12 years ago

How can this code's result depend on the compiler? Is there a bug?

krivenko commented 12 years ago

Ok, my conclusion is that the Pade algorithm is not quite stable. The discrepancy stems from tiny round-off noise amplified by the recursive algorithm that calculates the Pade coefficients. Different compilers produce this noise in different ways, so I can see two possible resolutions of the issue. The first is to simply raise the tolerance of the test. The second is to implement a more stable version of the algorithm, if 1) it is feasible and 2) it is really worth doing.
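
To make the "recursive algorithm" concrete, here is a minimal sketch of a continued-fraction Pade fit, assuming a Thiele-type recursion of the kind commonly used for analytic continuation from Matsubara frequencies. Whether this matches the TRIQS implementation is an assumption; the function names, the two-pole model and beta = 10 are hypothetical.

```python
import math

def pade_coefficients(z, u):
    """Continued-fraction coefficients from nodes z and values u.

    Each level divides by differences of quantities computed on the
    previous level, which is where round-off noise can be amplified.
    """
    n = len(z)
    g = list(u)                 # g[i] holds the current level evaluated at z[i]
    a = [g[0]]
    for p in range(1, n):
        pivot = g[p - 1]        # previous level at its own node z[p - 1]
        for i in range(p, n):
            g[i] = (pivot - g[i]) / ((z[i] - z[p - 1]) * g[i])
        a.append(g[p])
    return a

def pade_eval(z, a, x):
    """Evaluate a[0] / (1 + a[1](x - z[0]) / (1 + a[2](x - z[1]) / ...)) bottom-up."""
    tail = 1.0
    for p in range(len(a) - 1, 0, -1):
        tail = 1.0 + a[p] * (x - z[p - 1]) / tail
    return a[0] / tail

def two_pole_g(x):
    """Hypothetical model Green's function: two real poles, total weight 1."""
    return 0.5 / (x + 1.0) + 0.5 / (x - 0.5)

def matsubara(n_points, beta=10.0):
    """Fermionic Matsubara frequencies i(2n + 1)pi/beta for n = 0 .. n_points - 1."""
    return [1j * (2 * n + 1) * math.pi / beta for n in range(n_points)]

if __name__ == "__main__":
    z = matsubara(4)            # four nodes determine a two-pole function
    u = [two_pole_g(zi) for zi in z]
    a = pade_coefficients(z, u)
    x = 0.3 + 0.1j              # a point off the imaginary axis
    print("continued:", pade_eval(z, a, x))
    print("exact:    ", two_pole_g(x))
```

With four nodes the fit is well conditioned. The trouble described above shows up as more nodes are used, because each extra level divides by increasingly small differences.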

parcollet commented 12 years ago

Makes sense. Is it the kind of problem that is quickly fixed by moving to high-precision arithmetic? I had a similar problem when computing baths for DMRG calculations: the star -> line transformation of the bath is numerically unstable. Using GMP solved it with moderate effort... http://gmplib.org/ http://gmplib.org/manual/C_002b_002b-Class-Interface.html#C_002b_002b-Class-Interface

krivenko commented 12 years ago

> Is it the kind of problem that is quickly fixed by moving to high-precision arithmetic?

I'll test it with long double instead of double and then it should become clear. Porting to GMP doesn't look like a tough task, but we will need to introduce a new dependency into TRIQS.

krivenko commented 12 years ago

Switching from double to long double decreased the discrepancy by 1-2 orders of magnitude. So I need permission to pull in GMP.
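
The effect of the number type can be illustrated with a small sketch that runs the same continued-fraction recursion once in double precision and once in exact rational arithmetic. Python's `fractions.Fraction` is used here purely as a stand-in for an arbitrary-precision library such as GMP; the real-valued nodes and the test function 1/(1 + x^2) are hypothetical simplifications, and the recursion is assumed to be of the Thiele type rather than taken from the TRIQS source.

```python
from fractions import Fraction

def pade_coefficients(z, u):
    """Continued-fraction coefficients; works for any exact or inexact number type."""
    n = len(z)
    g = list(u)
    a = [g[0]]
    for p in range(1, n):
        pivot = g[p - 1]
        for i in range(p, n):
            g[i] = (pivot - g[i]) / ((z[i] - z[p - 1]) * g[i])
        a.append(g[p])
    return a

def pade_eval(z, a, x):
    """Evaluate the continued fraction bottom-up at the point x."""
    tail = 1
    for p in range(len(a) - 1, 0, -1):
        tail = 1 + a[p] * (x - z[p - 1]) / tail
    return a[0] / tail

def f(x):
    """Hypothetical test function; fully determined by four coefficients."""
    return 1 / (1 + x * x)

# Five nodes: one more than this function needs.
nodes_exact = [Fraction(k) for k in range(1, 6)]
nodes_float = [float(k) for k in range(1, 6)]

a_exact = pade_coefficients(nodes_exact, [f(x) for x in nodes_exact])
a_float = pade_coefficients(nodes_float, [f(x) for x in nodes_float])

if __name__ == "__main__":
    for p, (ae, af) in enumerate(zip(a_exact, a_float)):
        print(p, ae, af)
```

In exact arithmetic the fifth coefficient is exactly zero, because four coefficients already reproduce the function. In double precision it is typically a tiny round-off residue instead, and any further recursion level would divide by that residue. Extended precision shrinks the residue but does not remove it, which is consistent with long double helping by only a few orders of magnitude.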

parcollet commented 12 years ago

Ok, GMP is a standard GNU library, and quite useful in some cases. So in f0d0e90fc8893e32659ee3740188be387055301d I added the cmake detection for the gmp lib and headers. See the top comment of cmake/FindGMP.cmake for the variables to use in your piece of code...

mferrero commented 12 years ago

OK, this is fixed with Igor's pull request #63. I close the ticket.