OP-DSL / OP2-Common

OP2: open-source framework for the execution of unstructured grid applications on clusters of GPUs or multi-core CPUs
https://op-dsl.github.io

Feature/group hexchange #202

Closed reguly closed 3 years ago

reguly commented 3 years ago

Required for Hydra GPU fixes

gihanmudalige commented 3 years ago

I am getting the following with the Fortran code-gen:

Traceback (most recent call last):
  File "/rr-home/gihan/OP2-Common/translator/fortran/python/op2_fortran.py", line 1044, in <module>
    op2_gen_cuda_color2(str(sys.argv[init_ctr]), date, consts, kernels, hydra,bookleaf) # does global coloring
  File "/rr-home/gihan/OP2-Common/translator/fortran/python/op2_gen_cuda_color2.py", line 658, in op2_gen_cuda_color2
    funcs = util.replace_soa_subroutines(util.funlist2,0,soaflags,maps,accs,mapnames,1,hydra,bookleaf,[],atomics)
  File "/rr-home/gihan/OP2-Common/translator/fortran/python/util.py", line 135, in replace_soa_subroutines
    if len(stride)==0:
TypeError: object of type 'int' has no len()

C/C++ tests are passing without issue
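
For reference, the failure mode in the traceback can be reproduced in isolation. The snippet below is only a minimal sketch: `has_no_stride` and `normalise_stride` are hypothetical names, not functions from `util.py`, and treating a scalar as "no stride information" is an assumption made for illustration, not necessarily the fix that was applied to the translator.

```python
def has_no_stride(stride):
    """Hypothetical stand-in for the `len(stride)==0` check seen in the traceback."""
    return len(stride) == 0


def normalise_stride(stride):
    """Hypothetical helper: accept a list/tuple as-is, map anything else to [].

    Assumption for illustration only: a scalar such as 0 means
    "no stride information was supplied".
    """
    if isinstance(stride, (list, tuple)):
        return list(stride)
    return []


# Passing an int where a sequence is expected raises the TypeError from the report.
try:
    has_no_stride(0)
except TypeError as exc:
    message = str(exc)

print(message)

# After normalising the argument, the same check works for both kinds of input.
print(has_no_stride(normalise_stride(0)))
print(has_no_stride(normalise_stride([1, 2])))
```

The point of the sketch is that `len()` is only defined for sized objects, so the caller and the callee have to agree on whether `stride` is a sequence or a scalar.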

gihanmudalige commented 3 years ago

OK, that fixed the issue in codegen. However, I am seeing the following runtimes:

=======================> Running Airfoil Fortran Plain DP built with PGI Compilers
/rr-home/gihan/OP2-Common/apps/fortran/airfoil/airfoil_plain/dp

./airfoil_seq OP_MAPS_BASE_INDEX=0
Max total runtime = 559.7012250423431 seconds
This test is considered PASSED

./airfoil_cuda OP_MAPS_BASE_INDEX=0
Max total runtime = 654.5445818901062 seconds
This test is considered PASSED

./airfoil_openmp OP_MAPS_BASE_INDEX=0
Max total runtime = 10.82673501968384 seconds
This test is considered PASSED

The CUDA time is very poor. Could it be caused by the dummy group halo exchanges I added for the non-MPI backends? Sorry for pinging problems without checking first, but I thought you might have a quick idea of what's happening.

gihanmudalige commented 3 years ago

OK, that fixed the performance issue. I am going to run this against our Hydra test and then give the final approval!