ApolloAuto / apollo

An open autonomous driving platform

Where does Docker image get pytorch library from? #9085

Closed. daviduhm closed this issue 4 years ago.

daviduhm commented 5 years ago

We're trying to build the Docker image for Apollo 3.5 following this documentation: https://github.com/ApolloAuto/apollo/tree/master/docker/build.

However, there is no installer bash script for pytorch, and Apollo at some point updated its Docker image to include the pytorch library (https://github.com/ApolloAuto/apollo/pull/7101/files) without adding any installer. This blocks us from building a Docker image from this Dockerfile.

It should either include an installer bash script for the pytorch library or publish the pytorch source code that was used to build library files such as libtorch.so and libcaffe2_gpu.so, located in /usr/local/apollo/libtorch_gpu/lib. We've tried to build pytorch from the official source code (https://github.com/pytorch/pytorch), but the generated library files were different from the ones included in the Apollo Docker image.

christian-lanius commented 5 years ago

I have written a pytorch install script which seems to work. It assumes that you always want to use the GPU version (I compile the GPU version and link it into the CPU folder that Apollo expects). I wrote this script because I use CUDA 10 (and Ubuntu 18.04). If you use a different version, change the magma version in line 4. If your GPU does not support CUDA compute capability 6.0/6.1, change line 17. https://gist.github.com/christian-lanius/306afa00d94302b98bb27af98233a311
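For reference, here is a rough outline of what such a from-source libtorch GPU build looks like. This is only a sketch under my own assumptions (conda available for magma, CUDA 10, Pascal-class GPUs, and the /usr/local/apollo/libtorch* layout mentioned in this thread), not the gist itself:

set -e

# magma for CUDA 10; pick the magma-cudaXX package that matches your toolkit
conda install -y -c pytorch magma-cuda100

git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

# build only for the GPU architectures you actually have (Pascal here; adjust if needed)
export TORCH_CUDA_ARCH_LIST="6.0;6.1"
export USE_CUDA=1

# build the C++ distribution (libtorch); depending on the pytorch version the
# resulting libraries and headers end up under build/lib and/or torch/lib, torch/include
python tools/build_libtorch.py

# copy the GPU build to where Apollo looks for it (adjust the source paths to
# wherever your build actually put the artifacts) and point the CPU path at it
sudo mkdir -p /usr/local/apollo/libtorch_gpu
sudo cp -r torch/lib /usr/local/apollo/libtorch_gpu/lib
sudo cp -r torch/include /usr/local/apollo/libtorch_gpu/include
sudo rm -rf /usr/local/apollo/libtorch          # remove the old CPU copy first
sudo ln -s /usr/local/apollo/libtorch_gpu /usr/local/apollo/libtorch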

Also, big fan of the lgsvl simulator!

daviduhm commented 5 years ago

Thanks for your comment! I've tried to build from source, but the generated library files look different, and more importantly the build doesn't produce the libcaffe2_gpu.so file, which Apollo now preloads in apollo_base.sh. I also tried this commit with your bash script, but it still doesn't generate a libcaffe2_gpu.so file.

It would be very helpful to know which version or commit ID of pytorch (or the source code, if it's a modified version of pytorch) Apollo used to generate those pytorch libraries in the Docker image.

gengqx commented 5 years ago

The commit is 96faaa9d50fd4b5d96a9d400a544f8e229c52aa8. I got it from the Apollo team, but I have not installed or tested it. If you test it successfully and release the install scripts, it would be a great help.

christian-lanius commented 5 years ago

@daviduhm have you tried the prebuilt binary package from https://pytorch.org/? https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-latest.zip This prebuilt binary contains the libcaffe2_gpu.so file which is preloaded by Apollo, though this version is probably built from the 1.1 branch, which is a bit newer than the commit linked by @gengqx.
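In case someone wants to try that, a minimal sketch of unpacking the prebuilt binary into the path Apollo expects (the target directory comes from earlier in this thread; whether the prebuilt ABI actually matches what Apollo linked against is exactly the open question, so back up the shipped copy first):

wget https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-latest.zip
unzip -q libtorch-shared-with-deps-latest.zip -d /tmp

# keep the original libraries around in case the swap breaks the build
sudo mv /usr/local/apollo/libtorch_gpu /usr/local/apollo/libtorch_gpu.bak
sudo mkdir -p /usr/local/apollo/libtorch_gpu
sudo cp -r /tmp/libtorch/lib /usr/local/apollo/libtorch_gpu/lib
sudo cp -r /tmp/libtorch/include /usr/local/apollo/libtorch_gpu/include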

daviduhm commented 5 years ago

Unfortunately, neither the commit ID 96faaa9d50fd4b5d96a9d400a544f8e229c52aa8 nor the prebuilt binary from pytorch.org worked out for me. Hopefully we can get some help from the Apollo team on this, because otherwise this Dockerfile is not up to date and it is not possible to build the Docker image by following the README.md: @xiaoxq @wanglei828 @kechxu

xmyqsh commented 5 years ago

@daviduhm You could give my suggestion a try; I'm not sure whether it works for your environment or not.

bash docker/scripts/dev_start.sh
bash docker/scripts/dev_into.sh
tar -zcvf /apollo/docker/build/installers/libtorch.tar.gz /usr/local/apollo/libtorch
tar -zcvf /apollo/docker/build/installers/libtorch_gpu.tar.gz /usr/local/apollo/libtorch_gpu

cat > /apollo/docker/build/installers/install_pytorch.sh
tar -zxvf /tmp/installers/libtorch.tar.gz -C /
tar -zxvf /tmp/installers/libtorch_gpu.tar.gz -C /
^C

Add RUN bash /tmp/installers/install_pytorch.sh into dev.x86_64.dockerfile, then:

exit
bash docker/build/build_dev.sh dev.x86_64.dockerfile

daviduhm commented 5 years ago

@xmyqsh Thanks for your suggestion. But the reason I'm trying to build libtorch from scratch is that I'd like to upgrade the CUDA version inside the image, because the current one doesn't support the GPU Volta architecture, as here and here. The original Docker image you can pull from Apollo comes with CUDA 8 installed, and the pytorch libraries were built with CUDA 8 as well. We need to recompile the pytorch library with CUDA 9/sm_70. To do that, we need to know which version/commit of the pytorch source code and what parameters Apollo used when building this library.
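As a side note, a quick way to confirm what the shipped libraries were actually built against (a small sketch run inside the dev container; the library path is the one mentioned earlier in this thread, and cuobjdump has to come from an installed CUDA toolkit):

# which CUDA runtime the library links against (expect libcudart.so.8.0 in the stock image)
ldd /usr/local/apollo/libtorch_gpu/lib/libcaffe2_gpu.so | grep -i libcudart

# which SM architectures are embedded in the fat binary (no sm_70 means no Volta support)
cuobjdump --list-elf /usr/local/apollo/libtorch_gpu/lib/libcaffe2_gpu.so | grep -o 'sm_[0-9]*' | sort -u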

xmyqsh commented 5 years ago

@daviduhm With your CUDA 9 requirement, you should change FROM nvidia/cuda:8.0-cudnn7-devel-ubuntu14.04 to FROM nvidia/cuda:9.x-cudnn7-devel-ubuntu14.04, then work through whatever compatibility issues follow. Then add a pytorch installer with install instructions for the pytorch version you are familiar with, or the one provided by the Apollo team. Then update the corresponding pytorch API calls in the Apollo repo if needed.

Wait a minute. I think the biggest problem you will encounter in this process is solving the compatibility issues of moving from CUDA 8 to CUDA 9: you will have to recompile caffe, paddlepaddle, TensorRT and so on with CUDA 9 support. You should ask the Apollo team for that code at the specific versions they used.

But I think there is no need for the GPU Volta architecture or CUDA 9 in the Apollo repo, which is meant to run on the self-driving car, not on the server side. And pytorch is only used by the prediction evaluator in the Apollo repo.

daviduhm commented 5 years ago

@xmyqsh Yes. We've already updated the CUDA version from 8 to 9 inside the Docker image and recompiled caffe and TensorRT with CUDA 9, because Apollo open-sourced their modified version of Caffe. Every other module, including perception, is working well; only the prediction module fails because of this pytorch library issue. So we are almost there :)

xmyqsh commented 5 years ago

@daviduhm Waiting for the Apollo team's reply: @xiaoxq @wanglei828 @kechxu

Otherwise, you would have to compare /usr/local/apollo/libtorch_gpu/include against the pytorch repo commit by commit, and pray that the Apollo team made no custom modifications to pytorch.
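A rough sketch of that brute-force comparison, assuming the shipped include tree keeps pytorch's torch/csrc/api/include layout (it may not match one-to-one, so the diff counts are only a heuristic, and the v1.0.0 ref and commit count are placeholders):

cd pytorch
for c in $(git rev-list --max-count=50 v1.0.0); do
  git checkout -q "$c"
  # count files that differ between this commit's C++ API headers and the shipped ones
  n=$(diff -rq torch/csrc/api/include \
        /usr/local/apollo/libtorch_gpu/include/torch/csrc/api/include 2>/dev/null | wc -l)
  echo "$c: $n differences"
done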

christian-lanius commented 5 years ago

@daviduhm how did you determine that your prediction is not working correctly? I am facing the same problem as you (using an RTX 2080 Ti), so I built my own Docker image and, as far as I can tell, prediction works fine, but now I am scared that I'm overlooking something.

daviduhm commented 5 years ago

@christian-lanius When you run the prediction module either from Dreamview or from the command line (i.e., bash scripts/prediction.sh), it fails to load the prediction library file and then crashes. This issue only happens with the latest Apollo code, where they started to preload libcaffe2_gpu.so from apollo_base.sh. Prediction should work fine with CUDA 9 if you are using an older version of Apollo.
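For a quick sanity check (a sketch, assuming the standard Apollo dev container and the script path mentioned above), start the module on its own and see whether the process stays alive:

# run prediction by itself; if the process disappears right away, it crashed while loading its libraries
bash scripts/prediction.sh
sleep 5
pgrep -af prediction || echo "prediction is not running"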

Unfortunately, our team decided not to spend more time on this, because it's really difficult to solve without input from the Apollo team (their build instructions for the pytorch GPU library).