Open hlinander opened 1 year ago
heal_swin.run_configs.segmentation.swin_hp_synwoodscape_large_plus_AD_train_run_config
[Profile config]
{
"batch_size": 1,
"n_warmup": 10,
"n_iter": 200
}
[Profile results]
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::mm 2.80% 690.157ms 3.06% 754.154ms 38.477us 4.645s 15.37% 4.645s 237.013us 19600
aten::native_layer_norm 1.43% 352.230ms 1.94% 478.064ms 45.100us 3.711s 12.28% 3.711s 350.135us 10600
aten::_fused_dropout 1.98% 488.012ms 2.76% 679.476ms 38.173us 3.257s 10.78% 3.257s 182.970us 17800
aten::add_ 1.41% 348.358ms 1.41% 348.358ms 18.933us 3.163s 10.47% 3.163s 171.902us 18400
aten::add 1.81% 446.987ms 1.81% 446.987ms 19.102us 3.158s 10.45% 3.158s 134.968us 23400
aten::copy_ 65.81% 16.223s 65.81% 16.223s 592.095us 3.100s 10.26% 3.100s 113.157us 27400
aten::bmm 1.54% 378.955ms 1.68% 415.014ms 47.161us 1.363s 4.51% 1.363s 154.847us 8800
aten::gelu 0.36% 89.955ms 0.44% 109.146ms 24.806us 1.339s 4.43% 1.339s 304.338us 4400
aten::div 1.45% 356.766ms 1.67% 412.583ms 24.559us 1.310s 4.33% 1.310s 77.966us 16800
aten::mul 1.03% 254.853ms 1.19% 292.355ms 23.577us 1.284s 4.25% 1.284s 103.587us 12400
aten::norm 0.89% 220.114ms 1.04% 255.735ms 29.061us 739.955ms 2.45% 739.955ms 84.086us 8800
aten::_softmax 0.37% 91.987ms 0.44% 108.701ms 24.705us 689.472ms 2.28% 689.472ms 156.698us 4400
aten::index 1.07% 263.258ms 1.41% 347.409ms 39.478us 593.394ms 1.96% 624.986ms 71.021us 8800
aten::_cat 0.07% 17.251ms 0.09% 22.776ms 18.980us 313.962ms 1.04% 313.962ms 261.635us 1200
aten::cudnn_convolution 0.54% 133.034ms 0.55% 135.358ms 338.395us 251.256ms 0.83% 251.256ms 628.139us 400
aten::reshape 0.37% 91.993ms 1.86% 459.024ms 23.661us 220.747ms 0.73% 605.466ms 31.210us 19400
aten::matmul 1.44% 354.640ms 9.48% 2.337s 82.293us 134.992ms 0.45% 7.500s 264.096us 28400
aten::_local_scalar_dense 0.51% 125.133ms 0.51% 125.133ms 28.439us 106.308ms 0.35% 106.308ms 24.161us 4400
aten::clamp_min 0.93% 230.082ms 1.52% 374.422ms 21.274us 106.107ms 0.35% 151.890ms 8.630us 17600
aten::contiguous 0.57% 140.857ms 2.28% 562.268ms 30.558us 79.149ms 0.26% 2.672s 145.243us 18400
aten::select 0.65% 160.474ms 0.69% 171.262ms 12.974us 66.703ms 0.22% 66.703ms 5.053us 13200
aten::dropout 0.50% 123.638ms 3.26% 803.113ms 45.119us 59.428ms 0.20% 3.316s 186.308us 17800
aten::floor_ 0.43% 105.738ms 0.69% 170.760ms 21.345us 57.225ms 0.19% 89.326ms 11.166us 8000
aten::permute 0.51% 126.454ms 0.55% 136.518ms 13.930us 50.686ms 0.17% 50.686ms 5.172us 9800
aten::clamp 0.42% 103.215ms 1.04% 255.826ms 29.071us 48.036ms 0.16% 104.620ms 11.889us 8800
aten::to 0.27% 66.207ms 64.60% 15.925s 1.188ms 39.535ms 0.13% 195.536ms 14.592us 13400
aten::exp 0.32% 79.262ms 0.36% 89.033ms 20.235us 37.657ms 0.12% 37.657ms 8.559us 4400
aten::layer_norm 0.30% 73.926ms 2.24% 551.990ms 52.075us 36.336ms 0.12% 3.748s 353.563us 10600
aten::uniform_ 0.38% 94.313ms 0.38% 94.313ms 11.789us 36.111ms 0.12% 36.111ms 4.514us 8000
aten::clone 0.29% 70.862ms 0.56% 137.021ms 31.141us 33.558ms 0.11% 384.719ms 87.436us 4400
aten::floor 0.26% 65.022ms 0.26% 65.022ms 8.128us 32.101ms 0.11% 32.101ms 4.013us 8000
aten::rand 0.27% 66.745ms 0.73% 179.712ms 22.464us 27.439ms 0.09% 63.550ms 7.944us 8000
aten::log 0.27% 66.454ms 0.29% 70.901ms 16.114us 24.072ms 0.08% 24.072ms 5.471us 4400
aten::clamp_max 0.19% 47.797ms 0.24% 58.637ms 13.327us 20.643ms 0.07% 20.643ms 4.692us 4400
aten::expand_as 0.13% 32.818ms 0.35% 85.549ms 9.721us 16.291ms 0.05% 16.291ms 1.851us 8800
aten::item 0.11% 27.021ms 0.62% 152.154ms 34.580us 15.272ms 0.05% 121.580ms 27.632us 4400
aten::detach_ 0.10% 23.657ms 0.15% 37.651ms 8.557us 15.079ms 0.05% 22.944ms 5.215us 4400
aten::softmax 0.11% 28.155ms 0.56% 136.856ms 31.104us 14.558ms 0.05% 704.030ms 160.007us 4400
aten::cat 0.09% 21.472ms 0.18% 44.248ms 36.874us 7.868ms 0.03% 321.830ms 268.191us 1200
detach_ 0.06% 13.994ms 0.06% 13.994ms 3.181us 7.865ms 0.03% 7.865ms 1.788us 4400
aten::_convolution 0.03% 8.079ms 0.67% 166.163ms 415.407us 6.496ms 0.02% 1.456s 3.639ms 400
aten::convolution 0.01% 2.916ms 0.69% 169.078ms 422.696us 1.845ms 0.01% 1.457s 3.644ms 400
aten::conv1d 0.01% 2.619ms 0.70% 171.697ms 429.243us 1.639ms 0.01% 1.459s 3.648ms 400
aten::empty_strided 0.71% 175.406ms 0.71% 175.406ms 3.580us 0.000us 0.00% 0.000us 0.000us 49000
aten::unsqueeze 0.15% 36.940ms 0.18% 44.742ms 4.661us 0.000us 0.00% 0.000us 0.000us 9600
aten::as_strided 0.41% 102.215ms 0.41% 102.215ms 0.943us 0.000us 0.00% 0.000us 0.000us 108400
aten::empty 1.86% 458.533ms 1.86% 458.533ms 2.759us 0.000us 0.00% 0.000us 0.000us 166200
aten::resize_ 0.27% 66.656ms 0.27% 66.656ms 2.777us 0.000us 0.00% 0.000us 0.000us 24000
aten::view 1.44% 353.845ms 1.44% 353.845ms 2.760us 0.000us 0.00% 0.000us 0.000us 128200
aten::squeeze 0.01% 2.371ms 0.01% 2.764ms 6.909us 0.000us 0.00% 0.000us 0.000us 400
aten::transpose 0.31% 75.865ms 0.42% 103.958ms 3.635us 0.000us 0.00% 0.000us 0.000us 28600
aten::empty_like 0.44% 109.490ms 1.27% 314.236ms 5.004us 0.000us 0.00% 0.000us 0.000us 62800
aten::t 0.37% 90.159ms 0.59% 145.506ms 7.424us 0.000us 0.00% 0.000us 0.000us 19600
aten::_unsafe_view 1.21% 299.454ms 1.38% 340.307ms 10.375us 0.000us 0.00% 0.000us 0.000us 32800
aten::expand 0.49% 121.619ms 0.59% 145.175ms 5.499us 0.000us 0.00% 0.000us 0.000us 26400
aten::slice 0.21% 51.655ms 0.25% 61.250ms 5.280us 0.000us 0.00% 0.000us 0.000us 11600
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 24.653s
CUDA time total: 30.223s
[Average GPU forward times]
mean forward: 153.86097717285156, std forward: 16.334224700927734
heal_swin.run_configs.segmentation.swin_synwoodscape_large_plus_AD_train_run_config
[Profile config]
{
"batch_size": 1,
"n_warmup": 10,
"n_iter": 200
}
[Profile results]
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::mm 7.73% 1.225s 8.07% 1.279s 65.246us 4.295s 15.36% 4.295s 219.129us 19600
aten::copy_ 6.59% 1.044s 6.59% 1.044s 37.560us 3.744s 13.39% 3.744s 134.683us 27800
aten::native_layer_norm 5.72% 906.649ms 6.30% 998.784ms 94.225us 3.600s 12.87% 3.600s 339.587us 10600
aten::_fused_dropout 5.80% 918.781ms 6.94% 1.100s 61.800us 2.986s 10.68% 2.986s 167.753us 17800
aten::add_ 5.93% 939.478ms 5.93% 939.478ms 51.059us 2.968s 10.62% 2.968s 161.306us 18400
aten::add 5.75% 911.771ms 5.75% 911.771ms 38.965us 2.033s 7.27% 2.033s 86.898us 23400
aten::bmm 2.94% 466.879ms 3.14% 498.592ms 56.658us 1.267s 4.53% 1.267s 144.019us 8800
aten::gelu 0.91% 144.244ms 0.99% 156.364ms 35.537us 1.255s 4.49% 1.255s 285.160us 4400
aten::mul 3.07% 487.460ms 3.27% 518.049ms 41.778us 1.191s 4.26% 1.191s 96.031us 12400
aten::div 4.38% 694.355ms 4.65% 737.758ms 43.914us 1.184s 4.24% 1.184s 70.494us 16800
aten::roll 3.23% 511.622ms 3.96% 628.194ms 71.386us 659.417ms 2.36% 971.191ms 110.363us 8800
aten::_softmax 0.94% 149.399ms 1.06% 168.422ms 38.278us 649.807ms 2.32% 649.807ms 147.683us 4400
aten::norm 1.54% 244.814ms 1.70% 269.583ms 30.634us 601.240ms 2.15% 601.240ms 68.323us 8800
aten::cudnn_convolution 1.12% 177.122ms 1.14% 180.051ms 450.127us 286.558ms 1.02% 286.558ms 716.394us 400
aten::_cat 0.12% 19.696ms 0.16% 25.208ms 21.007us 194.396ms 0.70% 194.396ms 161.996us 1200
aten::matmul 2.93% 464.879ms 18.62% 2.952s 103.959us 158.453ms 0.57% 6.080s 214.073us 28400
aten::clamp_min 2.19% 347.732ms 3.13% 496.370ms 28.203us 98.016ms 0.35% 139.151ms 7.906us 17600
aten::permute 3.54% 561.576ms 3.69% 585.743ms 31.492us 90.040ms 0.32% 90.040ms 4.841us 18600
aten::contiguous 2.44% 387.266ms 9.66% 1.532s 57.162us 86.392ms 0.31% 3.627s 135.350us 26800
aten::select 1.46% 231.610ms 1.53% 242.951ms 18.405us 63.209ms 0.23% 63.209ms 4.789us 13200
aten::dropout 1.37% 217.781ms 8.31% 1.318s 74.035us 57.421ms 0.21% 3.043s 170.979us 17800
aten::index 1.57% 248.912ms 1.95% 309.002ms 70.228us 52.911ms 0.19% 59.935ms 13.622us 4400
aten::floor_ 2.14% 339.565ms 2.89% 457.598ms 57.200us 52.668ms 0.19% 81.577ms 10.197us 8000
aten::clamp 0.86% 136.163ms 2.69% 426.937ms 48.516us 43.161ms 0.15% 91.492ms 10.397us 8800
aten::layer_norm 2.55% 404.006ms 8.85% 1.403s 132.339us 34.961ms 0.13% 3.635s 342.886us 10600
aten::uniform_ 2.05% 324.684ms 2.05% 324.684ms 40.586us 32.755ms 0.12% 32.755ms 4.094us 8000
aten::exp 0.79% 125.301ms 0.85% 135.096ms 30.704us 31.603ms 0.11% 31.603ms 7.183us 4400
aten::floor 0.74% 118.033ms 0.74% 118.033ms 14.754us 28.909ms 0.10% 28.909ms 3.614us 8000
aten::reshape 0.76% 120.360ms 1.35% 214.568ms 14.116us 25.871ms 0.09% 226.891ms 14.927us 15200
aten::rand 1.72% 273.321ms 3.89% 617.065ms 77.133us 25.848ms 0.09% 58.603ms 7.325us 8000
aten::_local_scalar_dense 0.86% 136.188ms 0.86% 136.188ms 30.952us 21.324ms 0.08% 21.324ms 4.846us 4400
aten::log 1.05% 166.714ms 1.08% 171.020ms 38.868us 21.287ms 0.08% 21.287ms 4.838us 4400
aten::to 0.41% 64.393ms 0.45% 71.598ms 7.955us 19.394ms 0.07% 26.755ms 2.973us 9000
aten::clamp_max 0.70% 111.554ms 0.77% 122.257ms 27.786us 17.069ms 0.06% 17.069ms 3.879us 4400
aten::expand_as 0.27% 42.441ms 0.63% 100.087ms 11.373us 14.204ms 0.05% 14.204ms 1.614us 8800
aten::detach_ 0.21% 32.922ms 0.31% 49.911ms 11.343us 14.173ms 0.05% 21.167ms 4.811us 4400
aten::item 0.25% 39.297ms 1.11% 175.485ms 39.883us 14.137ms 0.05% 35.460ms 8.059us 4400
aten::softmax 0.24% 37.590ms 1.30% 206.012ms 46.821us 14.126ms 0.05% 663.933ms 150.894us 4400
aten::cat 0.18% 28.621ms 0.34% 53.829ms 44.858us 7.615ms 0.03% 202.010ms 168.342us 1200
detach_ 0.11% 16.988ms 0.11% 16.988ms 3.861us 6.993ms 0.03% 6.993ms 1.589us 4400
aten::clone 0.11% 16.735ms 0.21% 33.498ms 41.872us 5.187ms 0.02% 201.020ms 251.275us 800
aten::_convolution 0.04% 6.802ms 1.23% 195.146ms 487.866us 3.195ms 0.01% 324.564ms 811.411us 400
aten::convolution 0.02% 3.792ms 1.25% 198.939ms 497.347us 2.010ms 0.01% 326.575ms 816.437us 400
aten::conv2d 0.02% 3.537ms 1.28% 202.476ms 506.189us 1.920ms 0.01% 328.495ms 821.237us 400
aten::flatten 0.01% 1.655ms 0.02% 3.443ms 17.214us 677.236us 0.00% 992.409us 4.962us 200
aten::empty_strided 0.73% 114.985ms 0.73% 114.985ms 3.212us 0.000us 0.00% 0.000us 0.000us 35800
aten::empty 2.67% 422.990ms 2.67% 422.990ms 2.296us 0.000us 0.00% 0.000us 0.000us 184200
aten::resize_ 0.37% 59.314ms 0.37% 59.314ms 2.471us 0.000us 0.00% 0.000us 0.000us 24000
aten::view 4.13% 654.728ms 4.13% 654.728ms 4.268us 0.000us 0.00% 0.000us 0.000us 153400
aten::transpose 0.47% 74.445ms 0.66% 104.815ms 3.665us 0.000us 0.00% 0.000us 0.000us 28600
aten::as_strided 0.69% 109.455ms 0.69% 109.455ms 0.999us 0.000us 0.00% 0.000us 0.000us 109600
aten::empty_like 0.88% 139.315ms 2.25% 356.528ms 4.667us 0.000us 0.00% 0.000us 0.000us 76400
aten::t 0.56% 88.623ms 0.90% 142.659ms 7.279us 0.000us 0.00% 0.000us 0.000us 19600
aten::_unsafe_view 0.84% 133.269ms 1.59% 251.809ms 8.624us 0.000us 0.00% 0.000us 0.000us 29200
aten::expand 0.85% 134.940ms 1.00% 158.704ms 6.012us 0.000us 0.00% 0.000us 0.000us 26400
aten::unsqueeze 0.21% 32.807ms 0.26% 40.667ms 4.621us 0.000us 0.00% 0.000us 0.000us 8800
aten::slice 0.23% 36.605ms 0.27% 43.192ms 4.499us 0.000us 0.00% 0.000us 0.000us 9600
----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 15.854s
CUDA time total: 27.960s
[Average GPU forward times]
mean forward: 143.00738525390625, std forward: 13.651042938232422
Profiles model inference by timing measurements on the GPU. Uses both torch.autograd.profiler and explicit torch.cuda.Event measurements.
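For readers unfamiliar with the profiler side, the following is a minimal sketch of how a table like the ones above can be produced with torch.autograd.profiler. It is not the code in this PR: `profile_forward` is a hypothetical helper, wrapping the loop in `torch.no_grad()` is an assumption, and the CPU fallback exists only so the sketch runs without a GPU. Newer PyTorch releases may warn that `use_cuda` is deprecated.

```python
import torch
from torch.autograd import profiler


def profile_forward(model, batch, n_iter=3):
    """Profile repeated forward passes and return the aggregated op table.

    Hypothetical helper for illustration; not the implementation in this PR.
    """
    use_cuda = batch.is_cuda  # only record CUDA timings if the data is on a GPU
    with torch.no_grad(), profiler.profile(use_cuda=use_cuda) as prof:
        for _ in range(n_iter):
            model(batch)
    # Sort by self CUDA time when available, matching the tables above;
    # otherwise fall back to self CPU time.
    sort_key = "self_cuda_time_total" if use_cuda else "self_cpu_time_total"
    return prof.key_averages().table(sort_by=sort_key)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    toy_model = torch.nn.Linear(8, 4).to(device)  # stand-in for the real model
    toy_batch = torch.randn(1, 8, device=device)
    print(profile_forward(toy_model, toy_batch))
```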
Instantiates the model and dataset from the run_configs (currently swin_hp and swin for synthetic Woodscape), loads a single batch, runs a warmup phase, and then times repeated forward passes on that same batch.
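The warmup-then-measure procedure can be sketched with explicit torch.cuda.Event timing as below. This is an illustration under stated assumptions, not the PR's code: `time_forward` is a hypothetical name, the defaults mirror the `n_warmup` and `n_iter` values in the profile config, `torch.no_grad()` is an assumption, and the `time.perf_counter` branch is only a fallback so the sketch runs on machines without CUDA.

```python
import statistics
import time

import torch


def time_forward(model, batch, n_warmup=10, n_iter=200):
    """Return (mean, std) of forward-pass times in milliseconds on one batch.

    Hypothetical helper for illustration; n_iter must be at least 2.
    """
    use_cuda = batch.is_cuda
    times_ms = []
    with torch.no_grad():
        # Warmup: let cuDNN autotuning and lazy initialization settle first.
        for _ in range(n_warmup):
            model(batch)
        for _ in range(n_iter):
            if use_cuda:
                start = torch.cuda.Event(enable_timing=True)
                end = torch.cuda.Event(enable_timing=True)
                start.record()
                model(batch)
                end.record()
                # Wait for the GPU to finish before reading the elapsed time.
                torch.cuda.synchronize()
                times_ms.append(start.elapsed_time(end))
            else:
                # CPU fallback (assumption, only so the sketch runs anywhere).
                t0 = time.perf_counter()
                model(batch)
                times_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.mean(times_ms), statistics.stdev(times_ms)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    toy_model = torch.nn.Linear(8, 4).to(device)  # stand-in for the real model
    toy_batch = torch.randn(1, 8, device=device)
    mean_ms, std_ms = time_forward(toy_model, toy_batch, n_warmup=2, n_iter=5)
    print(f"mean forward: {mean_ms}, std forward: {std_ms}")
```

Recording events on the CUDA stream and synchronizing before `elapsed_time` avoids measuring only the asynchronous kernel launch, which is what a plain host-side timer would capture.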
Results are written to text files in the current working directory, one per run_config and named after it.
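A report with the same section layout as the output pasted above could be written along these lines. The section headers are taken from that output; the function name `write_profile_report`, the `.txt` suffix, and the exact file naming are assumptions for illustration only.

```python
import json
from pathlib import Path


def write_profile_report(run_config_name, config, table, mean, std, out_dir="."):
    """Write one text report per run_config and return its path.

    Hypothetical helper; the file name scheme is an assumption.
    """
    text = "\n".join(
        [
            run_config_name,
            "[Profile config]",
            json.dumps(config, indent=4),
            "[Profile results]",
            table,
            "[Average GPU forward times]",
            f"mean forward: {mean}, std forward: {std}",
        ]
    )
    path = Path(out_dir) / f"{run_config_name}.txt"
    path.write_text(text + "\n")
    return path


if __name__ == "__main__":
    report = write_profile_report(
        "example_run_config",  # placeholder name, not a real run_config
        {"batch_size": 1, "n_warmup": 10, "n_iter": 200},
        "<profiler table goes here>",
        0.0,
        0.0,
    )
    print(report.read_text())
```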
Example usage: ./run.py --env singularity bash python -m heal_swin.profile