HiFiLES / HiFiLES-solver

High Fidelity Large Eddy Simulation Solver

msh file after ParMETIS: boundary face sometimes not found! #60

Closed fff969 closed 8 years ago

fff969 commented 9 years ago

I use ICEM to generate a CGNS file and then my own program to convert the CGNS file to an msh file. When I run HiFiLES (GPU version) on one processor everything is OK (see Pic 1), but when I run HiFiLES with MPICH (mpiexec -n X HiFiLES input_file), for some cases the program stops at line 1511 of geometry.cpp with the message "All nodes of boundary face belong to processor but could not find the corresponding faces" (see Pic 2 and Pic 3). What is wrong?
Another question: when I use a multi-GPU host, only one GPU works even when running with the mpiexec command. Is there a way to run HiFiLES on different GPUs?

(attached screenshots: single, mpi-2, mpi-2-1)

mlopez14 commented 9 years ago

I suspect there is some issue with MPI. Either it is not installed properly on your machine, or we have an incompatibility with your specific version.

Can you run one of the test cases on multiple GPUs? Say, the viscous cylinder case under testcases/navier-stokes/cylinder.

To run on multiple GPUs follow the instructions here: https://github.com/HiFiLES/HiFiLES-solver/wiki/Execution

Yes, we routinely run HiFiLES on many GPUs at a time.

fff969 commented 9 years ago

Dear mlopez, Thank you for your patience.

For the first problem I mentioned: I tried many cases for the ParMETIS issue and found that for hexa meshes everything is OK, but for tetra meshes the error appears most of the time. I think there may be a bug in the msh-file routines for tetra meshes. I don't know anything about GAMBIT, so I can't test the "neu" files.

For the second question: I have read the guidelines you linked carefully and still don't know what is going wrong. I run HiFiLES with "mpiexec -n 4 HiFiLES input_file", but still only one GPU is used (please see pic). By the way, I use MPICH instead of MPICH2; could that be the reason?

(attached screenshot)

JacobCrabill commented 9 years ago

We actually just realized there is a small bug in the code - the fix has been pushed to master now, so try the latest version. Basically, we had hard-coded the IDs in cudaSetDevice(device_id) for the machines we use in our lab here, and did not account for the possibility of other machines / clusters using the multi-GPU feature.

So what was happening in your case was that each process launched by mpirun was using the same CUDA device, because they didn't know better. The fix is for each process to call cudaSetDevice() and give it a unique CUDA device ID (separate from the other processes).
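
To illustrate the idea, here is a minimal sketch (not the actual HiFiLES code) of mapping each MPI rank to its own CUDA device; it assumes ranks are placed node by node and every node has the same number of GPUs:

```cpp
// Minimal sketch (not the HiFiLES source): give each MPI rank its own
// CUDA device instead of letting every process default to device 0.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int n_devices = 0;
  cudaGetDeviceCount(&n_devices);
  if (n_devices == 0) {
    fprintf(stderr, "Rank %d: no CUDA devices visible\n", rank);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  // Each process selects a distinct device; wraps around if there are
  // more ranks than GPUs on the node.
  int device_id = rank % n_devices;
  cudaSetDevice(device_id);

  printf("Rank %d using CUDA device %d of %d\n", rank, device_id, n_devices);

  MPI_Finalize();
  return 0;
}
```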

If you are still having issues, figure out the CUDA device IDs of your GeForce cards and hard-code them into HiFiLES - the relevant line is line 111 of geometry.cpp in the current commit (https://github.com/HiFiLES/HiFiLES-solver/commit/08b53e1f9889878a353f7e148c32d75eb1c16d7c). Let us know if this fixes your problem, and a big thanks for catching this!
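
If it helps, here is a small stand-alone sketch (not part of HiFiLES) that lists the device IDs and names the CUDA runtime reports on a node; nvidia-smi shows similar information:

```cpp
// Stand-alone sketch: print the ID and name of each CUDA device the
// runtime can see, to help decide which IDs to hard-code.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
  int n_devices = 0;
  cudaGetDeviceCount(&n_devices);

  for (int i = 0; i < n_devices; ++i) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("Device %d: %s\n", i, prop.name);
  }
  return 0;
}
```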

fff969 commented 9 years ago

Thank you very much, Jacob. Your patch has fixed this bug.

PS: I have several multi-GPU hosts, each with 4 NVIDIA graphics cards. When I run HiFiLES with 4 cards, i.e. on a single host, it works very well with the latest code, but when I run it with 8 cards across two different hosts, it no longer works (see pic). It seems there may be a bug in "array.h".

Sincerely yours, fan

4 GPUs on one host work well!


the machinefile: the two hosts are the same


But across the two hosts it doesn't work. I have tested the MPI environment with my own code and with SU2, and both work well. The pic does not show the full output; I will post the error output later.


fff969 commented 9 years ago

I have already fixed this: I set the card numbers manually.