ExpectationMax / opencurrent

OpenCurrent library for solving PDEs using CUDA (code.google.com/p/opencurrent)
Apache License 2.0

problem with 3D pressure solver #13

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. run pcg program
2. run cubicrayleigh with Rayleigh_run() 
3. run cubicrayleigh with Nusselt_run()

What is the expected output? What do you see instead?
None of them converge, and the while loop never exits.
The message is similar to:
[INFO] Sol_MultigridPressure3DBase::do_fmg - error after 3 iterations: L2 = 0.000000 (221243.654747x), Linf = 0.000000 (-nanx)
There is a "nan", which likely leads to the segmentation faults or the endless loop.

What version of the product are you using? On what operating system?
Ubuntu 10.04LTS 64bit, CUDA 3.2, GTX460 Fermi

Please provide any additional information below.
I have noticed that two of the 18 unit tests failed with segmentation faults, 
which are probably related to the 3D pressure solver. 
The unit test diagnostic output is below.
******************************************************************
The following tests FAILED:
     12 - MultigridMixedTest (SEGFAULT)
     17 - PCGTest (SEGFAULT)
Errors while running CTest
make: *** [test] Error 8
******************************************************************
Thanks!

Original issue reported on code.google.com by yush...@gmail.com on 28 Nov 2010 at 5:03

GoogleCodeExporter commented 8 years ago
The "nan" appears because the solver does not compute the Linf error, so the 
ratio is undefined (essentially, it is computing 0/0 = nan).  The other errors 
seem real, though.
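
As an illustration of the 0/0 case, here is a minimal C++ sketch. The helper names are hypothetical and are not part of OpenCurrent; the sketch only assumes IEEE-754 doubles, where dividing zero by zero produces NaN instead of an error.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper (not OpenCurrent code): how much the error norm shrank.
// With IEEE-754 doubles, 0.0 / 0.0 yields NaN rather than raising an error.
double reduction_factor(double prev_norm, double curr_norm) {
    return prev_norm / curr_norm;
}

// Guarded variant: report 0 when the norm is zero or was never computed.
double safe_reduction_factor(double prev_norm, double curr_norm) {
    if (curr_norm == 0.0) {
        return 0.0;
    }
    return prev_norm / curr_norm;
}

// Prints a line shaped like the log message, once unguarded and once guarded.
void print_linf_report(double prev_norm, double curr_norm) {
    std::printf("Linf = %f (%fx)\n", curr_norm,
                reduction_factor(prev_norm, curr_norm));
    std::printf("Linf = %f (%fx)\n", curr_norm,
                safe_reduction_factor(prev_norm, curr_norm));
}
```

Calling print_linf_report(0.0, 0.0) shows the unguarded ratio as a NaN, which is harmless by itself as long as nothing branches on it.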

Can you run "utest PCGDoubleTest" and post the output?

Original comment by jcohen.p...@gmail.com on 28 Nov 2010 at 6:34

GoogleCodeExporter commented 8 years ago
To follow up:

When checking the log file for the unit tests, I found that "LockExTest", 
"NSTest", "ProjectDoubleTest", and "MultigridDoubleTest" print a similar 
message, "[INFO] Sol_MultigridPressure3DBase::do_fmg - error after 8 
iterations: L2 = 0.000000 (998410098814.984741x), Linf = 0.000000 (-nanx)", 
although they appear to have passed. 
Overall, the problem still points to the pressure solver. Does anyone have any 
idea?
Thanks!

Original comment by yush...@gmail.com on 28 Nov 2010 at 6:41

GoogleCodeExporter commented 8 years ago
Thanks, Jonathan!

I think I understand the error message now.

I have run the two failed unit tests, and the error messages are included in 
the attached file. 

Original comment by yush...@gmail.com on 28 Nov 2010 at 6:58

Attachments:

GoogleCodeExporter commented 8 years ago
To follow up:
I think I found why MultigridMixedTest and PCGTest failed with segmentation 
faults.
It seems my GTX 460 with 768 MB of RAM does not have enough memory to run the 
test with a mesh size of 256^3. Once I commented out this single test in 
mgmixedtest.cpp and pcgtest.cpp, both of them passed. Thanks! 

Original comment by yush...@gmail.com on 8 Dec 2010 at 1:31

GoogleCodeExporter commented 8 years ago
Sorry, I meant to respond sooner.  Yes, the tests are designed for the Tesla 
cards, which have a minimum of 3 GB.  So these 2 larger tests will fail on 
the 460.

I should make this an option in the build process, whether to enable these 
tests or not... probably something to add to the next release.
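
Until such a build option exists, one alternative is a runtime check instead of a build-time switch. The sketch below is an assumption, not OpenCurrent code: should_run_large_test and its headroom parameter are hypothetical names. In a CUDA program the free byte count could come from cudaMemGetInfo, and the required byte count would be an estimate supplied by the test.

```cpp
#include <cstddef>

// Hypothetical gate (not part of OpenCurrent): run a large test only when the
// estimated requirement fits within a fraction of the free device memory.
// headroom = 0.9 leaves 10% of free memory for allocations the estimate misses.
bool should_run_large_test(std::size_t free_bytes,
                           std::size_t required_bytes,
                           double headroom = 0.9) {
    return static_cast<double>(required_bytes) <=
           headroom * static_cast<double>(free_bytes);
}
```

A test that fails this check could print a "skipped" message and return success, so smaller cards would not report a segmentation fault.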

Original comment by jcohen.p...@gmail.com on 8 Dec 2010 at 1:42

GoogleCodeExporter commented 8 years ago
Thanks!
But interestingly, MultigridDoubleTest passed the test with a mesh size of 
256^3. I would have thought its memory usage would be no less than that of 
MultigridMixedTest. 
I will try them on a C2050 later.

Original comment by yush...@gmail.com on 8 Dec 2010 at 2:21

GoogleCodeExporter commented 8 years ago
No, the mixed version keeps both a single precision and double precision copy 
of all grids.  So it uses more memory.
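
A rough back-of-the-envelope sketch of the per-grid cost follows. The function names are hypothetical, and the estimate ignores ghost cells, padding, and the coarser multigrid levels; how many grids each solver actually allocates is not stated here, so only the per-grid figures are shown.

```cpp
#include <cstddef>

// Bytes for one n^3 grid with the given element size
// (assumption: no ghost cells, padding, or coarser multigrid levels).
std::size_t grid_bytes(std::size_t n, std::size_t elem_size) {
    return n * n * n * elem_size;
}

// Bytes per grid when both a float copy and a double copy are kept,
// as in the mixed-precision solver.
std::size_t mixed_grid_bytes(std::size_t n) {
    return grid_bytes(n, sizeof(float)) + grid_bytes(n, sizeof(double));
}
```

With 4-byte floats and 8-byte doubles, a 256^3 grid takes 64 MiB in single precision, 128 MiB in double precision, and 192 MiB when both copies are kept. On a 768 MB card that is room for at most four mixed-precision grids versus six double-precision ones, before counting anything else on the device.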

Original comment by jcohen.p...@gmail.com on 8 Dec 2010 at 2:23

GoogleCodeExporter commented 8 years ago
Okay, I see. 
I should have read it more carefully. Thanks

Original comment by yush...@gmail.com on 8 Dec 2010 at 2:36