OE4T / meta-tegra-community

Repository for community-maintained recipes for additional packages for NVIDIA Jetson platforms
MIT License

pyCUDA updates #69

Status: Closed (ichergui closed this 1 year ago)

ichergui commented 1 year ago

1- Running demo.py sample application:

root@jetson-tx2-devkit:~# python3 ./demo.py 
original array:
[[-0.68391824 -0.3694507   0.47548345  2.1990209 ]
 [-2.1997297   0.65044016 -0.53435284 -0.46830413]
 [-0.5490767  -0.70024365 -0.47087818 -0.40060955]
 [ 1.5563756   0.6541615   0.8573306  -0.6914887 ]]
doubled with kernel:
[[-1.3678365  -0.7389014   0.9509669   4.3980417 ]
 [-4.3994594   1.3008803  -1.0687057  -0.93660825]
 [-1.0981534  -1.4004873  -0.94175637 -0.8012191 ]
 [ 3.1127512   1.308323    1.7146612  -1.3829774 ]]
doubled with InOut:
[[-1.3678365  -0.7389014   0.9509669   4.3980417 ]
 [-4.3994594   1.3008803  -1.0687057  -0.93660825]
 [-1.0981534  -1.4004873  -0.94175637 -0.8012191 ]
 [ 3.1127512   1.308323    1.7146612  -1.3829774 ]]
original array:
[[-0.28141204  0.5888601  -0.53642565 -2.0255206 ]
 [ 0.15535209  0.4751083  -1.1309123  -0.8362683 ]
 [-0.27051157 -0.20162384  0.48442098  0.23205017]
 [-0.31901076  1.1077355   0.7136627   1.0067804 ]]
doubled with gpuarray:
[[-0.5628241   1.1777202  -1.0728513  -4.051041  ]
 [ 0.31070417  0.9502166  -2.2618246  -1.6725366 ]
 [-0.54102314 -0.40324768  0.96884197  0.46410033]
 [-0.6380215   2.215471    1.4273254   2.0135608 ]]
root@jetson-tx2-devkit:~# 
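
As a quick sanity check on the log above, the first row of the first "original array" can be compared against the first row of "doubled with kernel". This is only a sketch: the values are copied from the output above, while the helper name `is_doubled` and the tolerance are assumptions chosen for illustration (the printed values are rounded float32, so an exact comparison would be too strict).

```python
import math

# First row of the first "original array" in the demo.py log above.
original_row = [-0.68391824, -0.3694507, 0.47548345, 2.1990209]
# First row of "doubled with kernel" in the same log.
doubled_row = [-1.3678365, -0.7389014, 0.9509669, 4.3980417]


def is_doubled(src, dst, rel_tol=1e-6):
    """Return True if every element of dst is 2x the matching element of src.

    Hypothetical helper; rel_tol absorbs the rounding of the printed
    float32 values.
    """
    return len(src) == len(dst) and all(
        math.isclose(2.0 * s, d, rel_tol=rel_tol) for s, d in zip(src, dst)
    )


print(is_doubled(original_row, doubled_row))
```

The "doubled with InOut" rows are identical to the "doubled with kernel" rows, so the same check applies to them unchanged.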

2- Running demo_cdpSimplePrint.py sample application:

root@jetson-tx2-devkit:~# python3 ./demo_cdpSimplePrint.py 
starting Simple Print (CUDA Dynamic Parallelism)
***************************************************************************
The CPU launches 2 blocks of 2 threads each. On the device each thread will
launch 2 blocks of 2 threads each. The GPU we will do that recursively
until it reaches max_depth=2

In total 2
+8
=10 blocks are launched!!! (8 from the GPU)
***************************************************************************

Launching cdp_kernel() with CUDA Dynamic Parallelism:

BLOCK 1 launched by the host
BLOCK 0 launched by the host
|  BLOCK 3 launched by thread 0 of block 1
|  BLOCK 2 launched by thread 0 of block 0
|  BLOCK 5 launched by thread 0 of block 1
|  BLOCK 4 launched by thread 0 of block 0
|  BLOCK 6 launched by thread 1 of block 1
|  BLOCK 7 launched by thread 1 of block 1
|  BLOCK 9 launched by thread 1 of block 0
|  BLOCK 8 launched by thread 1 of block 0
root@jetson-tx2-devkit:~#
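
The block count reported in the banner (2 + 8 = 10) follows from the launch configuration it describes: every thread of every block at one depth launches a further grid at the next depth. A minimal sketch of that arithmetic is below; the function name `blocks_per_depth` and its parameter names are hypothetical and are not taken from demo_cdpSimplePrint.py.

```python
def blocks_per_depth(host_blocks=2, threads_per_block=2,
                     child_blocks=2, max_depth=2):
    """Return the number of blocks launched at each depth.

    Depth 0 is the host launch. Each thread of each block at depth d
    launches child_blocks blocks at depth d + 1, until max_depth levels
    exist. All names here are illustrative assumptions.
    """
    counts = [host_blocks]
    for _ in range(1, max_depth):
        counts.append(counts[-1] * threads_per_block * child_blocks)
    return counts


counts = blocks_per_depth()
total = sum(counts)
from_gpu = total - counts[0]
print(counts, total, from_gpu)
```

With the defaults this gives 2 blocks from the host and 8 from the GPU, matching the ten BLOCK lines (0 through 9) in the log, where each of the four host threads launched two blocks.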