dzhoshkun closed 6 years ago
MAX_DEPTH=3, INIT_SUBDIV=1
:[0] % nvprof ./mandelbrot
==6343== NVPROF is profiling process 6343, command: ./mandelbrot
Mandelbrot set computed in 0.208 s, at 1287.527 Mpix/s
Copying took 0.467 s
==6343== Profiling application: ./mandelbrot
==6343== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 69.08% 466.23ms 1 - 466.23ms 466.23ms 466.23ms [CUDA memcpy DtoH]
30.63% 206.69ms 0 4 51.673ms 31.036ms 72.328ms mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
0.29% 1.9665ms 1 1 983.25us 295.84us 1.6707ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
API calls: 48.75% 466.99ms 1 - 466.99ms 466.99ms 466.99ms cudaMemcpy
29.29% 280.57ms 1 - 280.57ms 280.57ms 280.57ms cudaMalloc
21.71% 207.95ms 1 - 207.95ms 207.95ms 207.95ms cudaThreadSynchronize
0.08% 809.10us 1 - 809.10us 809.10us 809.10us cudaFree
0.06% 615.37us 94 - 6.5460us 768ns 244.02us cuDeviceGetAttribute
0.05% 517.24us 1 - 517.24us 517.24us 517.24us cudaLaunch
0.04% 377.21us 1 - 377.21us 377.21us 377.21us cuDeviceTotalMem
0.01% 67.536us 1 - 67.536us 67.536us 67.536us cuDeviceGetName
0.00% 21.162us 3 - 7.0540us 5.7270us 8.8700us cudaEventRecord
0.00% 8.2410us 9 - 915ns 628ns 2.8630us cudaSetupArgument
0.00% 7.3350us 3 - 2.4450us 1.0480us 5.0990us cudaEventCreate
0.00% 4.6100us 3 - 1.5360us 978ns 2.5840us cudaEventDestroy
0.00% 4.3310us 2 - 2.1650us 1.4670us 2.8640us cudaEventElapsedTime
0.00% 4.1910us 3 - 1.3970us 839ns 2.3040us cuDeviceGetCount
0.00% 3.6320us 1 - 3.6320us 3.6320us 3.6320us cudaEventSynchronize
0.00% 2.4440us 2 - 1.2220us 768ns 1.6760us cuDeviceGet
0.00% 1.3270us 1 - 1.3270us 1.3270us 1.3270us cudaConfigureCall
MAX_DEPTH=6, INIT_SUBDIV=1
:[0] % nvprof ./mandelbrot
==6526== NVPROF is profiling process 6526, command: ./mandelbrot
Mandelbrot set computed in 0.101 s, at 2646.876 Mpix/s
Copying took 0.492 s
==6526== Profiling application: ./mandelbrot
==6526== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 67.27% 491.25ms 1 - 491.25ms 491.25ms 491.25ms [CUDA memcpy DtoH]
15.06% 109.94ms 0 170 646.71us 67.329us 2.6077ms mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
9.24% 67.506ms 0 38 1.7765ms 37.408us 37.502ms iter_fill_k(int*, int, int, int, int, int)
8.43% 61.523ms 1 69 878.90us 63.360us 4.5692ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
API calls: 53.82% 492.01ms 1 - 492.01ms 492.01ms 492.01ms cudaMemcpy
34.85% 318.62ms 1 - 318.62ms 318.62ms 318.62ms cudaMalloc
10.99% 100.49ms 1 - 100.49ms 100.49ms 100.49ms cudaThreadSynchronize
0.10% 932.93us 94 - 9.9240us 768ns 408.50us cuDeviceGetAttribute
0.10% 890.12us 1 - 890.12us 890.12us 890.12us cudaLaunch
0.08% 757.98us 1 - 757.98us 757.98us 757.98us cudaFree
0.04% 387.06us 1 - 387.06us 387.06us 387.06us cuDeviceTotalMem
0.01% 93.307us 1 - 93.307us 93.307us 93.307us cuDeviceGetName
0.00% 26.819us 3 - 8.9390us 7.8220us 10.267us cudaEventRecord
0.00% 11.175us 3 - 3.7250us 1.7460us 7.4730us cudaEventCreate
0.00% 11.174us 9 - 1.2410us 699ns 4.3300us cudaSetupArgument
0.00% 4.5400us 3 - 1.5130us 838ns 2.5840us cuDeviceGetCount
0.00% 4.2610us 3 - 1.4200us 978ns 2.3050us cudaEventDestroy
0.00% 4.1910us 2 - 2.0950us 1.3970us 2.7940us cudaEventElapsedTime
0.00% 3.4920us 1 - 3.4920us 3.4920us 3.4920us cudaEventSynchronize
0.00% 3.2130us 2 - 1.6060us 908ns 2.3050us cuDeviceGet
0.00% 1.9560us 1 - 1.9560us 1.9560us 1.9560us cudaConfigureCall
MAX_DEPTH=12, INIT_SUBDIV=1
:[0] % nvprof ./mandelbrot
==6691== NVPROF is profiling process 6691, command: ./mandelbrot
Mandelbrot set computed in 0.071 s, at 3789.035 Mpix/s
Copying took 0.501 s
==6691== Profiling application: ./mandelbrot
==6691== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 37.52% 793.39ms 1 7732 102.60us 41.888us 2.1085ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
27.90% 589.93ms 0 15512 38.030us 5.8880us 281.25us mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
23.64% 499.86ms 1 - 499.86ms 499.86ms 499.86ms [CUDA memcpy DtoH]
10.93% 231.15ms 0 7685 30.077us 4.8960us 64.651ms iter_fill_k(int*, int, int, int, int, int)
API calls: 56.10% 500.63ms 1 - 500.63ms 500.63ms 500.63ms cudaMemcpy
35.70% 318.62ms 1 - 318.62ms 318.62ms 318.62ms cudaMalloc
7.84% 69.928ms 1 - 69.928ms 69.928ms 69.928ms cudaThreadSynchronize
0.11% 1.0191ms 94 - 10.841us 768ns 459.48us cuDeviceGetAttribute
0.10% 886.70us 1 - 886.70us 886.70us 886.70us cudaLaunch
0.09% 764.97us 1 - 764.97us 764.97us 764.97us cudaFree
0.05% 411.15us 1 - 411.15us 411.15us 411.15us cuDeviceTotalMem
0.01% 79.549us 1 - 79.549us 79.549us 79.549us cuDeviceGetName
0.00% 25.353us 3 - 8.4510us 7.1940us 9.7780us cudaEventRecord
0.00% 12.151us 3 - 4.0500us 1.6760us 8.4500us cudaEventCreate
0.00% 10.754us 9 - 1.1940us 698ns 3.9110us cudaSetupArgument
0.00% 4.9590us 3 - 1.6530us 978ns 2.8640us cudaEventDestroy
0.00% 4.4700us 2 - 2.2350us 1.4670us 3.0030us cudaEventElapsedTime
0.00% 4.0500us 3 - 1.3500us 768ns 2.2340us cuDeviceGetCount
0.00% 3.4920us 1 - 3.4920us 3.4920us 3.4920us cudaEventSynchronize
0.00% 3.2810us 2 - 1.6400us 907ns 2.3740us cuDeviceGet
0.00% 1.8850us 1 - 1.8850us 1.8850us 1.8850us cudaConfigureCall
MAX_DEPTH=3, INIT_SUBDIV=2
:[0] % nvprof ./mandelbrot
==6846== NVPROF is profiling process 6846, command: ./mandelbrot
Mandelbrot set computed in 0.166 s, at 1618.145 Mpix/s
Copying took 0.468 s
==6846== Profiling application: ./mandelbrot
==6846== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 72.36% 467.39ms 1 - 467.39ms 467.39ms 467.39ms [CUDA memcpy DtoH]
25.39% 163.97ms 0 14 11.712ms 997.28us 25.495ms mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
2.10% 13.586ms 1 4 2.7171ms 1.5727ms 3.3637ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
0.15% 947.71us 0 2 473.86us 461.12us 486.59us iter_fill_k(int*, int, int, int, int, int)
API calls: 48.12% 468.15ms 1 - 468.15ms 468.15ms 468.15ms cudaMemcpy
34.47% 335.28ms 1 - 335.28ms 335.28ms 335.28ms cudaMalloc
16.99% 165.31ms 1 - 165.31ms 165.31ms 165.31ms cudaThreadSynchronize
0.20% 1.9888ms 1 - 1.9888ms 1.9888ms 1.9888ms cudaFree
0.10% 953.33us 94 - 10.141us 769ns 407.38us cuDeviceGetAttribute
0.06% 559.77us 1 - 559.77us 559.77us 559.77us cudaLaunch
0.04% 412.13us 1 - 412.13us 412.13us 412.13us cuDeviceTotalMem
0.01% 79.688us 1 - 79.688us 79.688us 79.688us cuDeviceGetName
0.00% 18.439us 3 - 6.1460us 4.6790us 7.7530us cudaEventRecord
0.00% 8.5190us 9 - 946ns 628ns 3.0030us cudaSetupArgument
0.00% 7.8920us 3 - 2.6300us 1.1180us 5.5170us cudaEventCreate
0.00% 4.8200us 3 - 1.6060us 769ns 3.0730us cuDeviceGetCount
0.00% 4.6100us 2 - 2.3050us 1.4670us 3.1430us cudaEventElapsedTime
0.00% 4.4690us 3 - 1.4890us 977ns 2.4440us cudaEventDestroy
0.00% 3.4230us 1 - 3.4230us 3.4230us 3.4230us cudaEventSynchronize
0.00% 2.6540us 2 - 1.3270us 908ns 1.7460us cuDeviceGet
0.00% 1.4670us 1 - 1.4670us 1.4670us 1.4670us cudaConfigureCall
MAX_DEPTH=3, INIT_SUBDIV=4
:[0] % nvprof ./mandelbrot
==7068== NVPROF is profiling process 7068, command: ./mandelbrot
Mandelbrot set computed in 0.130 s, at 2061.541 Mpix/s
Copying took 0.466 s
==7068== Profiling application: ./mandelbrot
==7068== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 62.57% 464.79ms 1 - 464.79ms 464.79ms 464.79ms [CUDA memcpy DtoH]
17.94% 133.26ms 0 50 2.6652ms 252.42us 8.5921ms mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
16.77% 124.60ms 0 8 15.575ms 123.01us 123.31ms iter_fill_k(int*, int, int, int, int, int)
2.71% 20.147ms 1 14 1.3431ms 102.21us 3.5775ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
API calls: 51.61% 465.55ms 1 - 465.55ms 465.55ms 465.55ms cudaMemcpy
33.71% 304.07ms 1 - 304.07ms 304.07ms 304.07ms cudaMalloc
14.37% 129.67ms 1 - 129.67ms 129.67ms 129.67ms cudaThreadSynchronize
0.11% 969.59us 94 - 10.314us 768ns 411.29us cuDeviceGetAttribute
0.08% 745.55us 1 - 745.55us 745.55us 745.55us cudaFree
0.06% 514.59us 1 - 514.59us 514.59us 514.59us cudaLaunch
0.05% 431.41us 1 - 431.41us 431.41us 431.41us cuDeviceTotalMem
0.01% 78.850us 1 - 78.850us 78.850us 78.850us cuDeviceGetName
0.00% 20.044us 3 - 6.6810us 5.5170us 8.5200us cudaEventRecord
0.00% 8.4510us 9 - 939ns 628ns 3.0730us cudaSetupArgument
0.00% 7.4040us 3 - 2.4680us 1.0480us 5.1680us cudaEventCreate
0.00% 4.5390us 2 - 2.2690us 1.4660us 3.0730us cudaEventElapsedTime
0.00% 4.2600us 3 - 1.4200us 977ns 2.3050us cudaEventDestroy
0.00% 4.1200us 3 - 1.3730us 838ns 2.0950us cuDeviceGetCount
0.00% 3.6310us 1 - 3.6310us 3.6310us 3.6310us cudaEventSynchronize
0.00% 2.5140us 2 - 1.2570us 908ns 1.6060us cuDeviceGet
0.00% 1.3970us 1 - 1.3970us 1.3970us 1.3970us cudaConfigureCall
MAX_DEPTH=6, INIT_SUBDIV=1
:[0] % nvprof ./mandelbrot
==7252== NVPROF is profiling process 7252, command: ./mandelbrot
Mandelbrot set computed in 0.102 s, at 2620.000 Mpix/s
Copying took 0.466 s
==7252== Profiling application: ./mandelbrot
==7252== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 65.31% 464.80ms 1 - 464.80ms 464.80ms 464.80ms [CUDA memcpy DtoH]
15.88% 113.02ms 0 170 664.82us 66.816us 2.3279ms mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
11.00% 78.260ms 0 38 2.0595ms 39.584us 44.788ms iter_fill_k(int*, int, int, int, int, int)
7.81% 55.551ms 1 69 793.58us 70.080us 2.8471ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
API calls: 54.04% 465.56ms 1 - 465.56ms 465.56ms 465.56ms cudaMemcpy
33.82% 291.32ms 1 - 291.32ms 291.32ms 291.32ms cudaMalloc
11.83% 101.92ms 1 - 101.92ms 101.92ms 101.92ms cudaThreadSynchronize
0.11% 925.46us 94 - 9.8450us 768ns 404.10us cuDeviceGetAttribute
0.08% 729.42us 1 - 729.42us 729.42us 729.42us cudaFree
0.06% 516.12us 1 - 516.12us 516.12us 516.12us cudaLaunch
0.05% 408.43us 1 - 408.43us 408.43us 408.43us cuDeviceTotalMem
0.01% 75.638us 1 - 75.638us 75.638us 75.638us cuDeviceGetName
0.00% 19.484us 3 - 6.4940us 5.3770us 8.6600us cudaEventRecord
0.00% 7.8930us 9 - 877ns 628ns 2.5840us cudaSetupArgument
0.00% 7.3340us 3 - 2.4440us 1.0480us 5.1680us cudaEventCreate
0.00% 4.6800us 3 - 1.5600us 978ns 2.6540us cudaEventDestroy
0.00% 4.3310us 2 - 2.1650us 1.4670us 2.8640us cudaEventElapsedTime
0.00% 4.0510us 3 - 1.3500us 838ns 2.2350us cuDeviceGetCount
0.00% 3.6320us 1 - 3.6320us 3.6320us 3.6320us cudaEventSynchronize
0.00% 3.0730us 2 - 1.5360us 908ns 2.1650us cuDeviceGet
0.00% 1.3970us 1 - 1.3970us 1.3970us 1.3970us cudaConfigureCall
MAX_DEPTH=6, INIT_SUBDIV=2
:[0] % nvprof ./mandelbrot
==7442== NVPROF is profiling process 7442, command: ./mandelbrot
Mandelbrot set computed in 0.076 s, at 3546.677 Mpix/s
Copying took 0.469 s
==7442== Profiling application: ./mandelbrot
==7442== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 57.77% 468.55ms 1 - 468.55ms 468.55ms 468.55ms [CUDA memcpy DtoH]
16.07% 130.32ms 0 164 794.65us 14.529us 69.170ms iter_fill_k(int*, int, int, int, int, int)
14.21% 115.30ms 0 554 208.12us 23.840us 1.1083ms mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
11.95% 96.952ms 1 238 405.66us 49.536us 2.9909ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
API calls: 55.31% 469.30ms 1 - 469.30ms 469.30ms 469.30ms cudaMemcpy
35.50% 301.26ms 1 - 301.26ms 301.26ms 301.26ms cudaMalloc
8.86% 75.144ms 1 - 75.144ms 75.144ms 75.144ms cudaThreadSynchronize
0.12% 1.0049ms 94 - 10.690us 838ns 435.04us cuDeviceGetAttribute
0.09% 750.09us 1 - 750.09us 750.09us 750.09us cudaFree
0.06% 520.10us 1 - 520.10us 520.10us 520.10us cudaLaunch
0.05% 409.06us 1 - 409.06us 409.06us 409.06us cuDeviceTotalMem
0.01% 79.130us 1 - 79.130us 79.130us 79.130us cuDeviceGetName
0.00% 19.345us 3 - 6.4480us 4.4690us 9.3590us cudaEventRecord
0.00% 7.9610us 9 - 884ns 628ns 2.6540us cudaSetupArgument
0.00% 7.3330us 3 - 2.4440us 1.0470us 5.0290us cudaEventCreate
0.00% 4.8190us 3 - 1.6060us 977ns 2.7940us cudaEventDestroy
0.00% 4.1210us 3 - 1.3730us 769ns 2.0950us cuDeviceGetCount
0.00% 4.1210us 2 - 2.0600us 1.4670us 2.6540us cudaEventElapsedTime
0.00% 3.4220us 1 - 3.4220us 3.4220us 3.4220us cudaEventSynchronize
0.00% 3.2130us 2 - 1.6060us 838ns 2.3750us cuDeviceGet
0.00% 1.3970us 1 - 1.3970us 1.3970us 1.3970us cudaConfigureCall
MAX_DEPTH=12, INIT_SUBDIV=4
:[0] % nvprof ./mandelbrot
==7683== NVPROF is profiling process 7683, command: ./mandelbrot
Mandelbrot set computed in 0.069 s, at 3882.051 Mpix/s
Copying took 0.470 s
==7683== Profiling application: ./mandelbrot
==7683== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 38.04% 798.44ms 1 7727 103.32us 43.328us 2.1229ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
27.72% 581.85ms 0 15512 37.509us 5.5360us 336.67us mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
22.34% 468.90ms 1 - 468.90ms 468.90ms 468.90ms [CUDA memcpy DtoH]
11.91% 249.91ms 0 7685 32.519us 4.5120us 65.005ms iter_fill_k(int*, int, int, int, int, int)
API calls: 56.18% 469.65ms 1 - 469.65ms 469.65ms 469.65ms cudaMemcpy
35.30% 295.15ms 1 - 295.15ms 295.15ms 295.15ms cudaMalloc
8.21% 68.610ms 1 - 68.610ms 68.610ms 68.610ms cudaThreadSynchronize
0.11% 886.21us 94 - 9.4270us 698ns 383.64us cuDeviceGetAttribute
0.09% 749.88us 1 - 749.88us 749.88us 749.88us cudaFree
0.06% 514.38us 1 - 514.38us 514.38us 514.38us cudaLaunch
0.04% 328.88us 1 - 328.88us 328.88us 328.88us cuDeviceTotalMem
0.01% 71.308us 1 - 71.308us 71.308us 71.308us cuDeviceGetName
0.00% 19.207us 3 - 6.4020us 4.4700us 8.8700us cudaEventRecord
0.00% 8.0990us 9 - 899ns 559ns 2.8630us cudaSetupArgument
0.00% 7.8230us 3 - 2.6070us 1.0480us 5.5180us cudaEventCreate
0.00% 5.0280us 2 - 2.5140us 1.4670us 3.5610us cudaEventElapsedTime
0.00% 4.8900us 3 - 1.6300us 1.0480us 2.7940us cudaEventDestroy
0.00% 3.6300us 3 - 1.2100us 838ns 1.8850us cuDeviceGetCount
0.00% 3.5620us 1 - 3.5620us 3.5620us 3.5620us cudaEventSynchronize
0.00% 2.7250us 2 - 1.3620us 839ns 1.8860us cuDeviceGet
0.00% 1.6760us 1 - 1.6760us 1.6760us 1.6760us cudaConfigureCall
Note: export CUDA_VISIBLE_DEVICES=0 was used on Adelie.
MAX_DEPTH=12, INIT_SUBDIV=2
:[0] % nvprof ./mandelbrot
==8481== NVPROF is profiling process 8481, command: ./mandelbrot
Mandelbrot set computed in 0.070 s, at 3837.490 Mpix/s
Copying took 0.425 s
==8481== Profiling application: ./mandelbrot
==8481== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 39.49% 804.79ms 1 7731 104.09us 44.416us 2.1801ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
28.60% 582.86ms 0 15512 37.574us 5.5680us 750.53us mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
20.81% 424.18ms 1 - 424.18ms 424.18ms 424.18ms [CUDA memcpy DtoH]
11.09% 226.02ms 0 7685 29.410us 4.8960us 64.438ms iter_fill_k(int*, int, int, int, int, int)
API calls: 54.43% 424.88ms 1 - 424.88ms 424.88ms 424.88ms cudaMemcpy
36.32% 283.55ms 1 - 283.55ms 283.55ms 283.55ms cudaMalloc
8.90% 69.457ms 1 - 69.457ms 69.457ms 69.457ms cudaThreadSynchronize
0.12% 916.87us 94 - 9.7530us 768ns 402.21us cuDeviceGetAttribute
0.10% 806.38us 1 - 806.38us 806.38us 806.38us cudaFree
0.06% 469.26us 1 - 469.26us 469.26us 469.26us cudaLaunch
0.05% 388.73us 1 - 388.73us 388.73us 388.73us cuDeviceTotalMem
0.01% 73.682us 1 - 73.682us 73.682us 73.682us cuDeviceGetName
0.00% 21.930us 3 - 7.3100us 5.4480us 8.9390us cudaEventRecord
0.00% 7.7510us 9 - 861ns 628ns 2.3750us cudaSetupArgument
0.00% 6.6350us 3 - 2.2110us 978ns 4.4700us cudaEventCreate
0.00% 5.1680us 3 - 1.7220us 1.1170us 2.7240us cudaEventDestroy
0.00% 4.1210us 2 - 2.0600us 1.3970us 2.7240us cudaEventElapsedTime
0.00% 3.8410us 3 - 1.2800us 768ns 2.0950us cuDeviceGetCount
0.00% 3.7710us 1 - 3.7710us 3.7710us 3.7710us cudaEventSynchronize
0.00% 2.2350us 2 - 1.1170us 838ns 1.3970us cuDeviceGet
0.00% 1.3270us 1 - 1.3270us 1.3270us 1.3270us cudaConfigureCall
MAX_DEPTH=6, INIT_SUBDIV=4
:[0] % nvprof ./mandelbrot
==8690== NVPROF is profiling process 8690, command: ./mandelbrot
Mandelbrot set computed in 0.059 s, at 4514.164 Mpix/s
Copying took 0.412 s
==8690== Profiling application: ./mandelbrot
==8690== Profiling result:
Type Time(%) Time Calls (host) Calls (device) Avg Min Max Name
GPU activities: 52.28% 411.01ms 1 - 411.01ms 411.01ms 411.01ms [CUDA memcpy DtoH]
20.26% 159.30ms 1 788 201.90us 46.240us 2.5210ms mandelbrot_block_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int, int)
17.90% 140.75ms 0 1736 81.078us 11.392us 404.58us mandelbrot_pixel_k(int*, int, int, thrust::complex<float>, thrust::complex<float>, int, int, int)
9.55% 75.050ms 0 644 116.54us 9.0560us 6.7421ms iter_fill_k(int*, int, int, int, int, int)
API calls: 56.62% 411.64ms 1 - 411.64ms 411.64ms 411.64ms cudaMemcpy
34.96% 254.17ms 1 - 254.17ms 254.17ms 254.17ms cudaMalloc
8.11% 58.931ms 1 - 58.931ms 58.931ms 58.931ms cudaThreadSynchronize
0.10% 756.17us 1 - 756.17us 756.17us 756.17us cudaFree
0.10% 744.85us 94 - 7.9230us 628ns 325.88us cuDeviceGetAttribute
0.07% 509.14us 1 - 509.14us 509.14us 509.14us cudaLaunch
0.03% 184.03us 1 - 184.03us 184.03us 184.03us cuDeviceTotalMem
0.01% 52.870us 1 - 52.870us 52.870us 52.870us cuDeviceGetName
0.00% 20.602us 3 - 6.8670us 6.1460us 7.9610us cudaEventRecord
0.00% 8.4500us 9 - 938ns 628ns 2.7940us cudaSetupArgument
0.00% 7.6120us 3 - 2.5370us 1.0470us 5.3780us cudaEventCreate
0.00% 4.1210us 3 - 1.3730us 908ns 2.3050us cudaEventDestroy
0.00% 3.8410us 2 - 1.9200us 1.3270us 2.5140us cudaEventElapsedTime
0.00% 3.1430us 1 - 3.1430us 3.1430us 3.1430us cudaEventSynchronize
0.00% 2.7250us 3 - 908ns 629ns 1.3270us cuDeviceGetCount
0.00% 1.8860us 2 - 943ns 698ns 1.1880us cuDeviceGet
0.00% 1.3970us 1 - 1.3970us 1.3970us 1.3970us cudaConfigureCall
MAX_DEPTH increases parallelism: the deeper the recursion goes, the more threads get launched (by the leader of each block).
INIT_SUBDIV also increases parallelism: the larger its value, the more thread blocks get launched, each with a smaller part of the problem to compute.
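The launch counts in the nvprof tables can be checked against a simple model of the recursion. The sketch below assumes (this is inferred from the numbers, not taken from the exercise source) that every thread block of mandelbrot_block_k makes exactly one device-side launch: either a child mandelbrot_block_k grid of SUBDIV x SUBDIV blocks, or a leaf kernel (mandelbrot_pixel_k or iter_fill_k). SUBDIV = 2 is likewise an assumption, and the function names are hypothetical.

```python
# Device-side launch counts copied from the "Calls (device)" column of the
# nvprof tables in this thread.
# Columns: MAX_DEPTH, INIT_SUBDIV, mandelbrot_block_k, mandelbrot_pixel_k, iter_fill_k
RUNS = [
    (3, 1, 1, 4, 0),            # iter_fill_k is not listed in this run
    (6, 1, 69, 170, 38),
    (12, 1, 7732, 15512, 7685),
    (3, 2, 4, 14, 2),
    (3, 4, 14, 50, 8),
    (6, 1, 69, 170, 38),        # second run with the same settings
    (6, 2, 238, 554, 164),
    (12, 4, 7727, 15512, 7685),
    (12, 2, 7731, 15512, 7685),
    (6, 4, 788, 1736, 644),
]

# ASSUMPTION: each child launch is a SUBDIV x SUBDIV grid of blocks.
SUBDIV = 2


def expected_leaf_launches(init_subdiv, block_launches, subdiv=SUBDIV):
    """Leaf launches (pixel_k + fill_k) implied by the one-launch-per-block model."""
    # Blocks in the host-launched grid plus blocks in every device-launched grid.
    total_blocks = init_subdiv ** 2 + subdiv ** 2 * block_launches
    # Every block that did not launch a child grid launched a leaf kernel.
    return total_blocks - block_launches


def model_matches(run):
    """True if the reported leaf launches agree with the model for this run."""
    _, init_subdiv, block_k, pixel_k, fill_k = run
    return expected_leaf_launches(init_subdiv, block_k) == pixel_k + fill_k


if __name__ == "__main__":
    for run in RUNS:
        print(run, "consistent" if model_matches(run) else "MISMATCH")
```

Under these assumptions the model agrees with all ten runs. Note also that at MAX_DEPTH=12 the leaf counts (15512 and 7685) are identical for INIT_SUBDIV=1, 2 and 4, which suggests that at that depth the recursion is being stopped by something other than the depth limit, possibly a minimum block size.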
However, the story seems to change when going from MAX_DEPTH=6 to MAX_DEPTH=12 with INIT_SUBDIV=4: the compute time rises from 0.059 s (4514.164 Mpix/s) to 0.069 s (3882.051 Mpix/s).
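To put the reported figures side by side, the following sketch tabulates compute time, throughput and the total number of device-side launches per run. All numbers are copied from the nvprof output in this thread; the helper names are hypothetical, and the reading that follows the code is one possible interpretation rather than something the profiles prove.

```python
# One entry per nvprof run in this thread:
# (MAX_DEPTH, INIT_SUBDIV, compute seconds, Mpix/s,
#  (mandelbrot_block_k, mandelbrot_pixel_k, iter_fill_k) device-side calls)
RUNS = [
    (3, 1, 0.208, 1287.527, (1, 4, 0)),
    (6, 1, 0.101, 2646.876, (69, 170, 38)),
    (12, 1, 0.071, 3789.035, (7732, 15512, 7685)),
    (3, 2, 0.166, 1618.145, (4, 14, 2)),
    (3, 4, 0.130, 2061.541, (14, 50, 8)),
    (6, 1, 0.102, 2620.000, (69, 170, 38)),
    (6, 2, 0.076, 3546.677, (238, 554, 164)),
    (12, 4, 0.069, 3882.051, (7727, 15512, 7685)),
    (12, 2, 0.070, 3837.490, (7731, 15512, 7685)),
    (6, 4, 0.059, 4514.164, (788, 1736, 644)),
]


def device_launches(run):
    """Total number of kernel launches issued from the device in one run."""
    return sum(run[4])


def fastest(runs):
    """Run with the highest reported throughput in Mpix/s."""
    return max(runs, key=lambda run: run[3])


def find(runs, max_depth, init_subdiv):
    """First run recorded with the given MAX_DEPTH / INIT_SUBDIV pair."""
    return next(r for r in runs if r[0] == max_depth and r[1] == init_subdiv)


if __name__ == "__main__":
    print("MAX_DEPTH INIT_SUBDIV  time[s]    Mpix/s  device launches")
    for run in sorted(RUNS, key=lambda r: r[3], reverse=True):
        depth, subdiv, seconds, mpix, _ = run
        print(f"{depth:9d} {subdiv:11d} {seconds:8.3f} {mpix:9.3f} "
              f"{device_launches(run):16d}")
```

With INIT_SUBDIV=4, moving from MAX_DEPTH=6 to MAX_DEPTH=12 raises the number of device-side launches by roughly an order of magnitude while the compute time gets slightly worse. One plausible reading is that the extra launches cost more than the finer subdivision saves, but confirming that would need more than these profiles.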
MAX_DEPTH and INIT_SUBDIV in the mandelbrot exercise