Because the batch size is fixed when the TensorFlow checkpoint is converted to a TensorRT plan file, what should I do if I want to use TensorRT with a batch size of 32, for example?
I have released some tools; git pull, then:
bazel build //model:build_tensorrt_model
Build the TensorRT model with:
scripts/build_tensorrt_model.sh $dir $ckpt $batch_size
e.g. scripts/build_tensorrt_model.sh ckpt zero.ckpt-20b-v1 4
Note that you need TensorRT installed (both the .so libraries and the Python .whl module).
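A minimal end-to-end sketch of those steps for a max batch size of 32, assuming the checkpoint directory and name from the example above (adjust them to your own checkpoint):

```bash
# pull the latest PhoenixGo sources, which include the new tool
git pull

# build the TensorRT model converter
bazel build //model:build_tensorrt_model

# convert the TensorFlow checkpoint to a TensorRT plan file
# usage: scripts/build_tensorrt_model.sh <checkpoint_dir> <checkpoint_name> <batch_size>
scripts/build_tensorrt_model.sh ckpt zero.ckpt-20b-v1 32
```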
OK, great!
I will try this and let you know how it goes.
edit: I will try it later today; I'm busy at the moment.
update:
I did not have time to try it today; I hope to try tomorrow.
I will still be busy today (writing a long tutorial for gtp2ogs), and since I have only a little experience with Python, I will likely run into some difficulties.
Could you give me the main steps, starting from a clean Ubuntu 16.04 install that has only CUDA 9.0, cuDNN 7.1.4, TensorRT 3.0.4, and bazel 0.11.1 (a setup where PhoenixGo is confirmed to work), and nothing more?
I think I can manage the details, but I would like a few guidelines on the main steps.
Big thanks @wodesuck
Other question: are there any plans to support the latest versions of TensorRT (mainly for better Tesla V100 and P100 support)?
I haven't tried TensorRT 4.0 or 5.0 yet, but I assume 5.0 will not work because it requires a TensorFlow version newer than 1.8. It would be great if you could add support for TensorRT 5.0.
update:
The contribution I was writing for the OGS website is finished, and it features PhoenixGo.
I will post the link here once I finish exporting it to GitHub; it will also be available as a separate GitHub repository.
I hope that tomorrow I will have time to try these steps.
update 2: the GitHub export, which took me a lot of time, is finally available here and features PhoenixGo @wodesuck, so you may consider linking it: https://github.com/wonderingabout/gtp2ogs-tutorial
I hope that tomorrow I will finally be able to try building with batch size 32 and TensorRT on a Tesla P100 (I also want to try TensorRT 4.0 to see if it works).
update:
I still haven't forgotten about this, but I need time to try it.
I will also most likely need help with installing the prerequisite packages and settings.
update 2:
After doing some reading, if I understand correctly, I need to install TensorRT from the tar archive (not the deb package), then export the Python path and install the .whl.
I found helpful documentation here (it just needs small adjustments for TensorRT 3.0.4): https://kezunlin.me/post/dacc4196/
I will then start with a fresh system using a tar install of CUDA as well as TensorRT, and rebuild bazel too.
This is the issue I had on my current system after running sudo pip2 install tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl from /opt/tensorrt/python (the path where the TensorRT 3.0.4 tar is extracted).
I will keep you updated on how it goes.
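For reference, a rough sketch of the tar-based install as I understand it; the archive name, the extracted directory name, and the /opt/tensorrt location are assumptions based on my setup, so adjust them to wherever you downloaded and extracted TensorRT:

```bash
# extract the TensorRT 3.0.4 tar archive (archive name assumed; use the one you downloaded)
sudo tar -xzf TensorRT-3.0.4.*.tar.gz -C /opt
sudo mv /opt/TensorRT-3.0.4 /opt/tensorrt

# make the TensorRT libraries visible to the dynamic loader
export LD_LIBRARY_PATH=/opt/tensorrt/lib:$LD_LIBRARY_PATH

# install the Python bindings shipped inside the tar
cd /opt/tensorrt/python
sudo pip2 install tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl
```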
update @wodesuck
Problem solved!! It is indeed much faster now!
Actually, the run-file install of CUDA was not needed (and is not recommended by NVIDIA); after sudo apt-get install cuda, my remaining mistake was a pycuda-specific issue (CUDA itself was set up fine).
update: this is actually not a CUDA issue but a pycuda issue. I fixed it using https://codeyarns.com/tag/pycuda/ : first create the cuda.sh profile script with the cuda-9.0 paths, then REBOOT TO APPLY!
Then cd into /opt/tensorrt/python, run sudo su -, then pip install pycuda.
WORKS!
And then pip install tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl **
** I had avoided installing this last package because some websites said it was not recommended, and PhoenixGo can run without it, so I didn't bother much with it; however, the CUDA paths were still needed for pycuda.
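For reference, this is roughly what the /etc/profile.d/cuda.sh profile script contains (a sketch assuming CUDA 9.0 lives in /usr/local/cuda-9.0, the default install location; adjust the paths if yours differ):

```bash
# /etc/profile.d/cuda.sh -- make CUDA 9.0 visible to every login shell
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```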
summary: on a brand new Ubuntu 16.04 (do not install nvidia-384 separately, it is included in cuda-9.0), I followed all the steps from here: https://medium.com/@mishra.thedeepak/cuda-and-cudnn-installation-for-tensorflow-gpu-79beebb356d2
then sudo su -
then pip install pycuda
and then pip install tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl
then sudo updatedb
then sudo pip2 install tensorflow-gpu==1.8
then reboot, same as in the screenshots below
(the screenshots will be replaced when I rebuild on a Google Cloud Intel Skylake instance for AVX/AVX2/AVX512F support)
(I will also increase the number of search threads to match the batch size)
conclusion: it works!!
Now that this issue is solved, some extra comments:
Too bad I forgot to select Intel Skylake on Google Cloud (AVX2 and AVX512F were disabled because of that); I need to rebuild, and I will update when I do.
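For the rebuild, a sketch of how I plan to enable those instruction sets when building the main PhoenixGo binary (the --copt flags are standard GCC codegen options, and //mcts:mcts_main is the build target from the PhoenixGo README; they only help if the VM was actually created on a Skylake or newer CPU platform):

```bash
# rebuild the MCTS engine with AVX/AVX2/AVX512F code generation enabled
bazel build -c opt \
    --copt=-mavx --copt=-mavx2 --copt=-mavx512f \
    //mcts:mcts_main
```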
I also updated the instructions in my personal repo: https://github.com/wonderingabout/nvidia-archives
I will update the pull request with this new information on how to build a TensorRT model with a custom max batch size.
Last question: does PhoenixGo support TensorRT 5.0? (I read in the NVIDIA docs that TensorFlow 1.9 is needed for exporting a model to TensorRT, and also cuDNN 7.3 with CUDA 9.0 or 10.0.)
I ask this because of the Turing support added in TensorRT 5.0 (RTX 2080 / RTX 2070 / RTX 2060) @wodesuck
It is late; I will try this next time.
Indeed, pycuda needs to be installed separately first (it is not covered by the NVIDIA instructions), and only afterwards should you follow the NVIDIA instructions, as explained here: http://0561blue.tistory.com/m/13?category=627413
First, create the cuda.sh file in /etc/profile.d as explained there and add the cuda-9.0 paths to it (see the condensed sketch after these steps),
then add the path for root in .bashrc and source ~/.bashrc,
then sudo su -
then cd /opt/tensorrt/python
then pip install pycuda (as root)
then, after pycuda is installed, follow the NVIDIA instructions to install the TensorRT .whl: pip install --upgrade tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl
then pip install the uff .whl
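Putting those steps together, a condensed sketch of the install order that worked for me (paths as above; the uff wheel's exact name and location inside the extracted tar may differ on your system):

```bash
# become root so pip installs into the system python, then run the rest in that shell
sudo su -
cd /opt/tensorrt/python

# pycuda first, outside of the NVIDIA instructions
pip install pycuda

# then the TensorRT python bindings, per the NVIDIA instructions
pip install --upgrade tensorrt-3.0.4-cp27-cp27mu-linux_x86_64.whl

# then the uff converter wheel shipped in the same tar (file name/location depends on your extract)
pip install /opt/tensorrt/uff/uff-*.whl
```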
then compile a sample and run it.
Result: (screenshot)
Since this issue is solved (and it was exhausting), I'll close it now.
(screenshots: TensorRT with batch sizes 4, 16, and 32; gtx1060)
@wodesuck TensorRT 3.0.4 with CUDA 9.0 and cuDNN 7.1.4 works with no problem at batch size 4 on these two machines:
machine 1: (screenshot)
machine 2: (screenshot)
But when I increase the batch size, for example to 8 (or any value of 5 or more), the engine throws a lot of errors and does not work; this is reproduced on both of these machines.
Example of the errors, first: (screenshot)
then after a few seconds: (screenshot)
However, with batch size 4, TensorRT works fine: (screenshot)
Also, when enable tensorrt is set to 0 (OFF), batch sizes of 5 and more are supported again. For example, here I set the batch size to 48 on a Tesla P100; thinking time went down from 20 s (batch size 4, TensorRT ON) to 6.5 s (batch size 48, TensorRT OFF) for 4000 simulations per move:
And here with batch size 64 and TensorRT OFF on a Tesla P100:
Note: without TensorRT, batch sizes even up to 64 and more are supported (tested successfully on Windows 10 with a Tesla V100). Increasing the batch size to 8 gives about the same speed improvement as enabling TensorRT at batch size 4. And of course, batch sizes of 16, 32, or 64 with TensorRT OFF compute games much faster than batch size 4 with TensorRT ON.
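For context, a hypothetical sketch of how I check and launch these settings (the field names eval_batch_size / enable_tensorrt and the etc/mcts_1gpu.conf file name are assumptions from my copy of the repo, and the mcts_main flags are the ones shown in the PhoenixGo README; double-check both against your checkout):

```bash
# inspect the batch size and TensorRT switch in the config (field names assumed)
grep -E 'eval_batch_size|enable_tensorrt' etc/mcts_1gpu.conf

# launch the engine with that config
bazel-bin/mcts/mcts_main --gtp --config_path=etc/mcts_1gpu.conf --logtostderr --v=1
```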
So, any plans to improve the software? Thanks.