Tencent / PhoenixGo

Go AI program which implements the AlphaGo Zero paper

Why tensorRT does not support batch size higher than 4 ? #75

Closed wonderingabout closed 5 years ago

wonderingabout commented 5 years ago

@wodesuck TensorRT 3.0.4 with CUDA 9.0 and cuDNN 7.1.4 works without problems with batch size 4 on these 2 machines:

machine 1 :

machine 2 :

But when I increase the batch size, for example to 8 (or any value of 5 or more), the engine produces a lot of errors and does not work; this is reproduced on both machines.

Example of errors, first: (screenshot)

then after a few seconds: (screenshot)

However, with batch size 4, TensorRT works fine: (screenshot)

Also, when enable tensorrt is set to 0 (off), batch sizes of 5 and more are supported again.

For example, here I set the batch size to 48 on a Tesla P100 (thinking time went down from 20 s with batch size 4 and TensorRT on, to 6.5 s with batch size 48 and TensorRT off, for 4000 simulations per move):

(screenshots)

And here with batch size 64 and TensorRT off on a Tesla P100:

(screenshots)

Note: without TensorRT, batch sizes are supported even up to 64 and more (tested successfully on Windows 10 with a Tesla V100). Increasing the batch size to 8 has the same effect on speed as enabling TensorRT at batch size 4, and of course batch sizes 16, 32, and 64 with TensorRT off compute games much faster than batch size 4 with TensorRT on.

So, any plans to improve this? Thanks.

wodesuck commented 5 years ago

Because the batch size is fixed when the TensorFlow checkpoint is converted to the TensorRT plan file.

wonderingabout commented 5 years ago

So what should I do if I want to use TensorRT with batch size 32, for example?

wodesuck commented 5 years ago

I have released some tools. `git pull`, then build the converter:

```
bazel build //model:build_tensorrt_model
```

then build the TensorRT model with:

```
scripts/build_tensorrt_model.sh $dir $ckpt $batch_size
```

e.g. `scripts/build_tensorrt_model.sh ckpt zero.ckpt-20b-v1 4`

Note that you need TensorRT installed (.so and python .whl module).
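Put together, the sequence for (for example) batch size 32 looks like this sketch; `ckpt` and `zero.ckpt-20b-v1` are the example directory and checkpoint names from this thread, so substitute your own:

```shell
# Rebuild the TensorRT plan file with a larger max batch size (here 32).
# Assumes the PhoenixGo repo is checked out and bazel plus TensorRT
# (.so and python .whl) are already installed.
git pull                                  # fetch the released model tools
bazel build //model:build_tensorrt_model  # build the converter
# usage: scripts/build_tensorrt_model.sh $dir $ckpt $batch_size
scripts/build_tensorrt_model.sh ckpt zero.ckpt-20b-v1 32
```

The max batch size chosen here is baked into the resulting plan file, which is why it cannot be raised at run time.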

wonderingabout commented 5 years ago

OK, great!

I will try this and let you know how it goes.

Edit: I will try it later today, busy at the moment.

wonderingabout commented 5 years ago

Update:

I did not have time to try it today; I hope to try it tomorrow.

wonderingabout commented 5 years ago

I will still be busy today (writing a long tutorial for gtp2ogs), and since I have only a little experience with python, I will likely run into some difficulties.

Can you give me the main steps, starting from a clean Ubuntu 16.04 install that has only CUDA 9.0, cuDNN 7.1.4, TensorRT 3.0.4, and bazel 0.11.1 installed (PhoenixGo tested to work successfully), and nothing more?

I can manage the details, I think, but I would like a few guidelines on the main steps.

Big thanks @wodesuck

Other question: any plans to support the latest versions of TensorRT? (mainly for best Tesla V100 and P100 support)

I haven't tried TensorRT 4.0 or 5.0 yet, but I assume 5.0 will not work because it requires a TensorFlow version newer than 1.8. It would be great if you could update for TensorRT 5.0.

wonderingabout commented 5 years ago

update :

The contribution I was working on for the OGS website is finished: it features PhoenixGo.

I will put the link here when I finish exporting it to GitHub.

It will also be available in a separate GitHub repository.

I hope I will have time to try these steps tomorrow.

wonderingabout commented 5 years ago

Update 2: the GitHub export, which took me a lot of time, is finally available here; it features PhoenixGo, @wodesuck, so you may consider linking it: https://github.com/wonderingabout/gtp2ogs-tutorial

I hope that tomorrow I will finally be able to try building batch size 32 with TensorRT on a Tesla P100 (I also want to try version 4.0 to see if it works).

wonderingabout commented 5 years ago

Update:

I still haven't forgotten about this, but I need time to try it.

Also, I will most likely need help with installing the prerequisite packages and settings.

wonderingabout commented 5 years ago

Update 2:

After doing some reading, if I understand correctly I need to install TensorRT from the tar package (not the deb), then export the python path and install the .whl.

I found helpful documentation here (it just needs slight modification for TensorRT 3.0.4): https://kezunlin.me/post/dacc4196/

Then I'll start with a fresh system with a tar install of CUDA as well as TensorRT, and rebuild with bazel too.

This is the issue I had on my current system after running `sudo pip2 install tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl` in /opt/tensorrt/python (the path where the TensorRT 3.0.4 tar is extracted):

(screenshot)
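For context, the tar-install setup being described boils down to putting the extracted libraries and python bindings on the search paths; a minimal sketch, assuming the tar was extracted to /opt/tensorrt as in this thread (adjust to your path):

```shell
# Point the dynamic loader and python at the extracted TensorRT tar.
# /opt/tensorrt is the extraction path used in this thread; adjust as needed.
TENSORRT_HOME=/opt/tensorrt
export LD_LIBRARY_PATH="$TENSORRT_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PYTHONPATH="$TENSORRT_HOME/python${PYTHONPATH:+:$PYTHONPATH}"
echo "$PYTHONPATH"
```

With these exported, the `.whl` install and the PhoenixGo binaries can find the TensorRT libraries without a deb install.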

i will keep you updated how it goes

wonderingabout commented 5 years ago

Update @wodesuck

Problem solved!

It is indeed much faster now!

Actually, a run-file install of CUDA was not needed (and is not recommended by nvidia); after `sudo apt-get install cuda`, CUDA itself was set up fine, and my mistake turned out to be a pycuda-specific issue.

Update: this is actually not a CUDA issue but a pycuda issue. I fixed it using https://codeyarns.com/tag/pycuda/ : first create the cuda.sh profile with the cuda-9.0 path, then REBOOT TO APPLY!

Then cd into /opt/tensorrt/python, run `sudo su -`, then `pip install pycuda`.

WORKS !!!

(screenshot)

And then: `pip install tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl`

(screenshot)

I had avoided installing this last package because some websites said it was not recommended, and PhoenixGo can run without it, so I didn't bother much with it; however, paths were still needed for pycuda.
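The cuda.sh profile script created above is just the standard CUDA path setup; a sketch (the real file goes in /etc/profile.d/cuda.sh, which needs root, so this sketch writes to /tmp instead):

```shell
# Sketch of the /etc/profile.d/cuda.sh profile described above.
# Written to /tmp here so it can be tried without root.
cat > /tmp/cuda.sh <<'EOF'
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF
. /tmp/cuda.sh   # a reboot (or re-login) applies the real profile.d version
echo "$PATH"
```

Once these paths are visible to root's shell, `pip install pycuda` can find nvcc and the CUDA libraries, which was the missing piece here.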

Summary: on a brand new Ubuntu 16.04 (do not install nvidia-384, it is included in cuda-9.0), all steps followed from here: https://medium.com/@mishra.thedeepak/cuda-and-cudnn-installation-for-tensorflow-gpu-79beebb356d2

Same as the screenshots below.

(screenshots)

(Screenshots will be replaced when I build with Intel Skylake on Google Cloud for AVX/AVX2/AVX512F support.)

(I will also increase the number of search threads to match the batch size.)

(screenshot)

Too bad that I forgot to set Intel Skylake on Google Cloud (AVX2 and AVX512F were disabled because of that; I need to rebuild).

Conclusion: it works!

wonderingabout commented 5 years ago

Now that this issue is solved, some extra comments:

Too bad that I forgot to set Intel Skylake on Google Cloud (AVX2 and AVX512F were disabled because of that; I need to rebuild and will update when I do).

I also updated the instructions in my personal repo: https://github.com/wonderingabout/nvidia-archives

I will update the pull request with this new information on TensorRT custom max batch size and how to set it.

Last question: does PhoenixGo support TensorRT 5.0? (I read in the nvidia docs that TensorFlow 1.9 is needed for exporting the model to TensorRT, and also cuDNN 7.3 for CUDA 9.0 or 10.0.)

I ask because of the Turing support since TensorRT 5.0 (RTX 2080 / RTX 2070 / RTX 2060), @wodesuck.

It is late; I will try this next time.

wonderingabout commented 5 years ago

Indeed, pycuda needs to be installed separately first, not following the nvidia instructions, and only after that do you follow the nvidia instructions,

as explained here: http://0561blue.tistory.com/m/13?category=627413

The steps:

1. Create the cuda.sh file in /etc/profile.d as explained there, and add the cuda-9.0 path to it.
2. Add the path for root in .bashrc and `source ~/.bashrc`.
3. `sudo su -`, then `cd /opt/tensorrt/python`, then `pip install pycuda` as root.
4. After pycuda is installed, follow the nvidia instructions to install the TensorRT .whl: `pip install --upgrade tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl`, then pip install the uff .whl.
5. Compile a sample and run it.

Result: (screenshot)

Since this issue is solved (it was exhausting), I'll close it now.

wonderingabout commented 5 years ago

(screenshots: TensorRT with batch sizes 4, 32, and 16 on a GTX 1060)