OE4T / meta-tegra-community

Repository for community-maintained recipes for additional packages for NVIDIA Jetson platforms
MIT License

pyCUDA updates #71

Closed by ichergui 1 year ago

ichergui commented 1 year ago

1- Running the demo.py sample application:

root@jetson-tx2-devkit:~# python3 ./demo.py 
original array:
[[ 0.44478756 -0.8229927   0.3115362  -0.39246017]
 [ 0.28666034 -0.8387675   0.1936967  -0.12686613]
 [ 1.2925358  -0.97299033 -1.7989218  -1.4376175 ]
 [-0.60669225  1.026853    1.5400996  -1.0121462 ]]
doubled with kernel:
[[ 0.8895751  -1.6459854   0.6230724  -0.78492033]
 [ 0.5733207  -1.677535    0.3873934  -0.25373226]
 [ 2.5850716  -1.9459807  -3.5978436  -2.875235  ]
 [-1.2133845   2.053706    3.0801992  -2.0242925 ]]
doubled with InOut:
[[ 0.8895751  -1.6459854   0.6230724  -0.78492033]
 [ 0.5733207  -1.677535    0.3873934  -0.25373226]
 [ 2.5850716  -1.9459807  -3.5978436  -2.875235  ]
 [-1.2133845   2.053706    3.0801992  -2.0242925 ]]
original array:
[[-0.9294008   1.0543453  -0.6830374  -0.08292789]
 [ 0.12469216  0.3783954  -0.77328384 -1.6378434 ]
 [-1.2414616  -0.05502772 -0.19391954 -1.67823   ]
 [ 0.5256272   0.5646808   0.4072747   1.063558  ]]
doubled with gpuarray:
[[-1.8588016   2.1086905  -1.3660748  -0.16585578]
 [ 0.24938433  0.7567908  -1.5465677  -3.2756867 ]
 [-2.4829233  -0.11005544 -0.38783908 -3.35646   ]
 [ 1.0512544   1.1293616   0.8145494   2.127116  ]]
root@jetson-tx2-devkit:~#
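As a quick sanity check on the log, the GPU results can be compared against a CPU reference. This is a minimal sketch that assumes demo.py uses the `doublify` kernel from the PyCUDA tutorial (a 4x4 float32 array, launched with `block=(4, 4, 1)`, each thread doubling the element at `idx = threadIdx.x + threadIdx.y * 4`). `doublify_cpu` and `matches` are hypothetical helpers, not part of PyCUDA, and the matrices are copied from the output above. The gpuarray path computes the same element-wise `2 * a`, so the same reference applies to the second array.

```python
import math

# Values transcribed from the demo.py log (float32, as printed by NumPy).
ORIGINAL = [
    [0.44478756, -0.8229927, 0.3115362, -0.39246017],
    [0.28666034, -0.8387675, 0.1936967, -0.12686613],
    [1.2925358, -0.97299033, -1.7989218, -1.4376175],
    [-0.60669225, 1.026853, 1.5400996, -1.0121462],
]
DOUBLED_WITH_KERNEL = [  # "doubled with InOut" printed the same values
    [0.8895751, -1.6459854, 0.6230724, -0.78492033],
    [0.5733207, -1.677535, 0.3873934, -0.25373226],
    [2.5850716, -1.9459807, -3.5978436, -2.875235],
    [-1.2133845, 2.053706, 3.0801992, -2.0242925],
]
SECOND_ORIGINAL = [
    [-0.9294008, 1.0543453, -0.6830374, -0.08292789],
    [0.12469216, 0.3783954, -0.77328384, -1.6378434],
    [-1.2414616, -0.05502772, -0.19391954, -1.67823],
    [0.5256272, 0.5646808, 0.4072747, 1.063558],
]
DOUBLED_WITH_GPUARRAY = [
    [-1.8588016, 2.1086905, -1.3660748, -0.16585578],
    [0.24938433, 0.7567908, -1.5465677, -3.2756867],
    [-2.4829233, -0.11005544, -0.38783908, -3.35646],
    [1.0512544, 1.1293616, 0.8145494, 2.127116],
]


def doublify_cpu(matrix):
    """CPU model of a 4x4 'doublify' kernel launched with block=(4, 4, 1).

    Each (threadIdx.x, threadIdx.y) pair owns one element of the
    row-major buffer at idx = x + y * 4 and doubles it.
    """
    flat = [v for row in matrix for v in row]
    for ty in range(4):
        for tx in range(4):
            idx = tx + ty * 4
            flat[idx] *= 2
    return [flat[r * 4:(r + 1) * 4] for r in range(4)]


def matches(expected, actual, rel_tol=1e-6):
    """Element-wise comparison that tolerates float32 print rounding in the log."""
    if len(expected) != len(actual):
        return False
    return all(
        len(erow) == len(arow)
        and all(math.isclose(e, a, rel_tol=rel_tol, abs_tol=1e-7) for e, a in zip(erow, arow))
        for erow, arow in zip(expected, actual)
    )


print(matches(doublify_cpu(ORIGINAL), DOUBLED_WITH_KERNEL))             # True
print(matches(doublify_cpu(SECOND_ORIGINAL), DOUBLED_WITH_GPUARRAY))    # True
```

A tolerance is needed because NumPy prints the shortest float32 representation, so the last printed digit of a doubled value can differ from twice the printed input (for example `-0.39246017` vs `-0.78492033`).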

2- Running the demo_cdpSimplePrint.py sample application:

root@jetson-tx2-devkit:~# python3 ./demo_cdpSimplePrint.py 
starting Simple Print (CUDA Dynamic Parallelism)
***************************************************************************
The CPU launches 2 blocks of 2 threads each. On the device each thread will
launch 2 blocks of 2 threads each. The GPU we will do that recursively
until it reaches max_depth=2

In total 2
+8
=10 blocks are launched!!! (8 from the GPU)
***************************************************************************

Launching cdp_kernel() with CUDA Dynamic Parallelism:

BLOCK 0 launched by the host
BLOCK 1 launched by the host
|  BLOCK 3 launched by thread 0 of block 1
|  BLOCK 2 launched by thread 0 of block 0
|  BLOCK 5 launched by thread 0 of block 1
|  BLOCK 4 launched by thread 0 of block 0
|  BLOCK 6 launched by thread 1 of block 1
|  BLOCK 7 launched by thread 1 of block 0
|  BLOCK 9 launched by thread 1 of block 0
|  BLOCK 8 launched by thread 1 of block 1
root@jetson-tx2-devkit:~#
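The block count in the banner follows from the launch geometry it describes: 2 host blocks of 2 threads give 4 threads, each of which launches 2 child blocks, so the GPU launches 8 blocks and the total is 10. The sketch below reproduces that arithmetic and cross-checks it against the child-block lines in the log. `blocks_per_level` is a hypothetical helper that assumes every thread at every depth launches a child grid with the same geometry, as the banner states. Because child grids run concurrently, the `|  BLOCK n` lines are not printed in id order, so the check looks at counts and ids rather than ordering.

```python
from collections import Counter


def blocks_per_level(grid_dim, block_dim, max_depth):
    """Return the number of blocks launched at each depth (index 0 = host launch).

    Assumption: every thread of every block launches one child grid of
    `grid_dim` blocks with `block_dim` threads, until `max_depth` levels exist.
    """
    levels = [grid_dim]
    for _ in range(1, max_depth):
        threads_in_prev_level = levels[-1] * block_dim
        levels.append(threads_in_prev_level * grid_dim)
    return levels


# Child-block lines from the log: (block id, launching thread, parent block).
LOGGED_CHILDREN = [
    (3, 0, 1), (2, 0, 0), (5, 0, 1), (4, 0, 0),
    (6, 1, 1), (7, 1, 0), (9, 1, 0), (8, 1, 1),
]

levels = blocks_per_level(grid_dim=2, block_dim=2, max_depth=2)
total = sum(levels)
print("In total " + "+".join(str(n) for n in levels) + f"={total}")  # In total 2+8=10

# Each (thread, parent block) pair in the log launched exactly grid_dim children,
# and ids 0..total-1 each appear once (0 and 1 are the host-launched blocks).
per_launcher = Counter((thread, parent) for _, thread, parent in LOGGED_CHILDREN)
print(sorted(per_launcher.values()))  # [2, 2, 2, 2]
print(sorted([0, 1] + [b for b, _, _ in LOGGED_CHILDREN]) == list(range(total)))  # True
```

With the same geometry and `max_depth=3`, the helper gives `[2, 8, 32]`, matching the factor of 4 per level implied by 2 threads per block each launching 2 blocks.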