davidherreroperez closed this issue 5 years ago
Looks like insufficient memory on the GPU (although the P6000 should have 24 GB, which is more than enough).
Try running with -n 32 (the default n is 128; the problem size is n^3).
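To make the effect of -n concrete, here is a minimal sketch of the problem-size arithmetic. The helper name `reported_unknowns` is hypothetical, and the grouping rule for block values is an assumption; it is consistent with the CUDA log in the original report, where the default n = 128 with -b2 shows Unknowns: 1048576, which is 128^3 / 2.

```python
def reported_unknowns(n: int, block_size: int = 1) -> int:
    """Number of unknowns for an n x n x n Poisson grid.

    Assumption (not taken from the amgcl sources): with block values,
    block_size consecutive scalar unknowns form one block unknown.
    """
    if n <= 0 or block_size <= 0:
        raise ValueError("n and block_size must be positive")
    scalar_unknowns = n ** 3
    if scalar_unknowns % block_size != 0:
        raise ValueError("block_size must divide n^3")
    return scalar_unknowns // block_size


# Compare the default size with the smaller one suggested above.
for n in (128, 32):
    print(n, reported_unknowns(n), reported_unknowns(n, block_size=2))
```

Going from n = 128 to n = 32 reduces the number of scalar unknowns by a factor of 64, which is why it is a quick way to rule out memory exhaustion.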
Sorry, did not catch the part about block values. I'll try to reproduce that when I get to my workstation, but the default problem is Poisson, which does not have block structure.
I have tested this using the problem discussed in
https://github.com/ddemidov/amgcl/issues/114
with the files
https://www.dropbox.com/sh/mf38r0mew9eloli/AACumScoRpVqjo1VDks98wgIa?dl=0
and the problem persists, as follows:
OK using CPU with block structure
$ ./mpi_amg -A ordered_by_dim/A_37442.mtx -f ordered_by_dim/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 -b2
World size: 1
Type: CG
Unknowns: 18721
Memory footprint: 1.14 M
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
level unknowns nonzeros
0 18721 166753 (94.03%) [1]
1 1225 10585 ( 5.97%) [1]
Iterations: 77
Error: 7.52487e-09
[Profile: 0.759 s] (100.00%)
[ self: 0.034 s] ( 4.53%)
[ read: 0.543 s] ( 71.59%)
[ setup: 0.025 s] ( 3.27%)
[ solve: 0.156 s] ( 20.61%)
OK using vexcl with OpenCL backend
$ ./mpi_amg_vexcl_cl -A ordered_by_dim/A_37442.mtx -f ordered_by_dim/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 precond.coarsening.aggr.block_size=2
World size: 1
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
level unknowns nonzeros
0 37442 667012 (94.03%) [1]
1 2450 42340 ( 5.97%) [1]
Iterations: 52
Error: 6.99069e-09
[Profile: 0.738 s] (100.00%)
[ self: 0.117 s] ( 15.91%)
[ read: 0.509 s] ( 68.92%)
[ setup: 0.058 s] ( 7.87%)
[ solve: 0.054 s] ( 7.31%)
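The two successful runs above describe the same matrix in two ways: with -b2 each level reports half as many unknowns and a quarter as many nonzeros as the scalar run. A minimal sketch that checks this against the numbers copied from the two logs; the helper `block_counts` is hypothetical, and it assumes every 2x2 block is stored in full, which these counts are consistent with.

```python
def block_counts(unknowns: int, nonzeros: int, block_size: int) -> tuple[int, int]:
    """Convert scalar (unknowns, nonzeros) into block counts.

    Assumption: the matrix consists of fully stored block_size x block_size
    blocks, so rows shrink by block_size and nonzeros by block_size squared.
    """
    if unknowns % block_size or nonzeros % (block_size ** 2):
        raise ValueError("counts are not compatible with the block size")
    return unknowns // block_size, nonzeros // block_size ** 2


# (unknowns, nonzeros) per level, copied from the logs above.
scalar_levels = [(37442, 667012), (2450, 42340)]   # vexcl OpenCL run, scalar values
block_levels = [(18721, 166753), (1225, 10585)]    # CPU run with -b2

converted = [block_counts(u, nz, 2) for u, nz in scalar_levels]
print(converted == block_levels)
```

The matching counts suggest that both runs see the same hierarchy, so the difference between them is only the value type (scalar versus 2x2 blocks).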
Fails using block structure with vexcl and the OpenCL backend
$ ./mpi_amg_vexcl_cl -A ordered_by_dim/A_37442.mtx -f ordered_by_dim/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 -b2
World size: 1
terminate called after throwing an instance of 'cl::Error'
what(): clGetMemObjectInfo
Aborted (core dumped)
Does it work with solver_vexcl_cl (non-mpi)?
OK, I can reproduce the problem with mpi_amg_vexcl_cl. Will look into it.
Should be fixed now, thank you for reporting!
There are related commits in vexcl (ddemidov/vexcl@5778e23f8b00b3151031ee28fec83972b01a6d11 and ddemidov/vexcl@9b24cacb28c39cd3412211b44c98108f800fec8c), and also here (df925065295893952a2b64f89e46f47cb4d5726d), but these are only required for debug builds.
The main fix comes from 7b95a8e0d11f52308dfd3d490dd47f7166f0ee88.
I did not try this with mpi_amg_vexcl_cuda (I do not have a usable CUDA configuration at the moment); please see if the fix also works there.
EDIT: mpi_amg_vexcl_cuda works for me as well.
Hi Denis, thanks for your prompt response. Both versions (vexcl_cl and vexcl_cuda) are also working for me!
Hi Denis,
I'm running the example located at examples/mpi/mpi_amg.cpp and I get the following error with the GPGPU build for OpenCL using VexCL:
$ ./mpi_amg_vexcl_cl -b2
World size: 1
terminate called after throwing an instance of 'cl::Error'
what(): clGetMemObjectInfo
Aborted (core dumped)
The problem also occurs for the CUDA compilation using vexcl
$ ./mpi_amg_vexcl_cuda -b2
World size: 1
Type: BiCGStab
Unknowns: 1048576
Memory footprint: 112.00 M
Number of levels: 4
Operator complexity: 2.65
Grid complexity: 1.20
level unknowns nonzeros
0 1048576 3645440 (37.71%) [1]
1 176128 4592300 (47.50%) [1]
2 30518 1355860 (14.03%) [1]
3 1273 73445 ( 0.76%) [1]
./mpi_amg_vexcl_cuda(_ZN3vex6detail15print_backtraceEv+0x32) [0x55e7e7dae5c2]
./mpi_amg_vexcl_cuda(+0xc6c24) [0x55e7e7d90c24]
./mpi_amg_vexcl_cuda(_ZNK3vex8ReductorIdNS_9SUM_KahanEEclINS_17vector_expressionIN5boost5proto7exprns_10basic_exprINS6_6tagns_3tag10multipliesENS6_7argsns_5list2IRKNS_6vectorIdEESH_EELl2EEEEEEENSt9enable_ifIXsrNS6_7matchesIT_NS_19vector_exprgrammarEEE5valueEdE4typeERKSN+0x53d) [0x55e7e7e0aadd]
./mpi_amg_vexcl_cuda(_ZNK5amgcl3mpi13inner_productclIN3vex6vectorINS_13static_matrixIdLi2ELi1EEEEES7_EENS_4math18inner_product_implINS_7backend10value_typeIT_vE4typeEvE11return_typeERKSCRKT0+0x86) [0x55e7e7e0b9e6]
./mpi_amg_vexcl_cuda(_ZNK5amgcl6solver8bicgstabINS_7backend5vexclINS_13static_matrixIdLi2ELi2EEENS0_16vexcl_skyline_luIS5_EEEENS_3mpi13inner_productEEclINS9_18distributed_matrixIS8_EENS_7runtime3mpi14preconditionerIS8_EEN3vex6vectorINS4_IdLi2ELi1EEEEERSM_EESt5tupleIJmdEERKT_RKT0_RKT1OT2+0x206) [0x55e7e7e834f6]
./mpi_amg_vexcl_cuda(_Z11solve_blockILi2EEvN5amgcl3mpi12communicatorElRKSt6vectorIlSaIlEES7_RKS3_IdSaIdEERKN5boost13property_tree11basic_ptreeINSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEEESK_St4lessISK_EEESB_NS0_7runtime3mpi9partition4typeE+0x11f5) [0x55e7e7f12cb5]
./mpi_amg_vexcl_cuda(main+0x14d0) [0x55e7e7d8ace0]
/lib/x86_64-linux-gnu/libc.so.6(libc_start_main+0xe7) [0x7fddfed2bb97]
./mpi_amg_vexcl_cuda(_start+0x2a) [0x55e7e7d8b77a]
terminate called after throwing an instance of 'vex::backend::cuda::error'
what(): /home/dherrero/bin/vexcl/vexcl/vexcl/backend/cuda/device_vector.hpp:142 CUDA Driver API Error (Unknown error 700)
Aborted (core dumped)
The examples run properly without block values. I would appreciate it if you could tell me whether I am doing something wrong or whether you also see this problem.