PRBonn / bonnet

Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics.
GNU General Public License v3.0

running docker image #36

Open daobilige-su opened 6 years ago

daobilige-su commented 6 years ago

Hi,

First, thanks for the software. Looks very cool.

I am using the docker image provided. But I have 2 questions about it.

After reading the Dockerfile in the image, if I understand correctly, all dependencies are built into the image, but bonnet itself is not installed inside it. Is that correct? I could not find any lines corresponding to the installation of the bonnet code. If so, do I have to install it myself on top of the image?

I ran helloworld.py under the \bonnet-docker folder of the image and got a Segmentation fault (core dumped) error. When I executed the code in helloworld.py line by line, I found that import tensorrt causes the error. Does it work fine on your machine?
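
One way to narrow down a crash like this is to run each import in its own python3 process and decode the exit status. A process killed by signal N shows up in the shell as status 128 + N, so 139 is SIGSEGV and 132 is SIGILL. A minimal sketch; the helper names are hypothetical, not part of bonnet:

```shell
#!/bin/sh
# Sketch: isolate each Python import in a separate process so one crashing
# module cannot hide behind another, then decode the exit status.
# A child killed by signal N is reported by the shell as status 128 + N.

classify_exit() {
    # $1: exit status of a finished process
    case "$1" in
        0)   echo "ok" ;;
        132) echo "illegal instruction (SIGILL)" ;;
        134) echo "aborted (SIGABRT)" ;;
        139) echo "segmentation fault (SIGSEGV)" ;;
        *)   echo "failed with status $1" ;;
    esac
}

check_import() {
    # $1: Python module name, e.g. tensorflow or tensorrt
    python3 -c "import $1" >/dev/null 2>&1
    status=$?
    printf '%s: %s\n' "$1" "$(classify_exit "$status")"
}
```

For example, running check_import tensorflow and then check_import tensorrt inside the container would report which of the two imports crashes and how.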

Thanks for any help or suggestion.

Cheers, Su

tano297 commented 6 years ago

Hello,

I am not having any problems running the hello world. Can you check that you are using the proper Docker version? You should be using nvidia-docker, not the vanilla one.

You can check whether your GPU functions are working inside the Docker container by running

$ nvidia-smi

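
To script the same check, a minimal sketch could count the GPUs the container can see. It assumes only that nvidia-smi -L prints one "GPU <n>: ..." line per device; the helper names are hypothetical. A container started with plain docker rather than nvidia-docker would typically lack nvidia-smi or report no GPUs:

```shell
#!/bin/sh
# Sketch: count GPUs visible inside the container.
# Assumption: `nvidia-smi -L` prints one "GPU <n>: ..." line per device.

count_gpus() {
    # Reads `nvidia-smi -L` output on stdin; prints the number of GPU lines.
    grep -c '^GPU [0-9]'
}

gpus_in_container() {
    # Prints 0 and fails if nvidia-smi is not even available.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo 0
        return 1
    fi
    nvidia-smi -L 2>/dev/null | count_gpus
}
```
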
daobilige-su commented 6 years ago

Hi @tano297 ,

Thanks for your reply.

I finally got everything running. It turns out that I needed to delete the /usr/local/cuda/lib64/stubs/libcuda.so.1 file to make TensorRT and TensorFlow work. I also needed to recompile the TensorFlow C++ API with CC_OPT_FLAGS="-march=native" set before compiling, to support my CPU.

It is really nice software, and I'm enjoying it now. Thanks.
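
A reversible variant of that workaround renames the stub instead of deleting it. This is a sketch under the assumption that CUDA lives under /usr/local/cuda (override with CUDA_HOME or pass the directory); the helper names are hypothetical. The stubs directory holds a placeholder driver library intended for linking on machines without a driver, so if the loader picks it up at run time, CUDA calls never reach the real driver. Running ldconfig -p | grep libcuda shows which libcuda copies the loader's cache knows about.

```shell
#!/bin/sh
# Sketch: rename the CUDA stub driver library instead of deleting it.
# Assumption: CUDA is under /usr/local/cuda unless CUDA_HOME says otherwise.

disable_stub_libcuda() {
    # $1 (optional): directory holding the stub libcuda.so.1
    stub_dir="${1:-${CUDA_HOME:-/usr/local/cuda}/lib64/stubs}"
    stub="$stub_dir/libcuda.so.1"
    if [ -e "$stub" ]; then
        mv "$stub" "$stub.disabled" && echo "renamed $stub"
    else
        echo "no stub at $stub"
    fi
}

restore_stub_libcuda() {
    # Undoes disable_stub_libcuda for the same directory.
    stub_dir="${1:-${CUDA_HOME:-/usr/local/cuda}/lib64/stubs}"
    if [ -e "$stub_dir/libcuda.so.1.disabled" ]; then
        mv "$stub_dir/libcuda.so.1.disabled" "$stub_dir/libcuda.so.1"
    fi
}
```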

Cheers, Su

tano297 commented 6 years ago

I'm glad to hear that! There are sometimes caveats specific to each architecture, which I try to minimize, but some slip through.

The /usr/local/cuda/lib64/stubs/libcuda.so.1 thing should definitely not be happening, so I will have a look into it. Leaving this issue open until I can reproduce it and fix it.

hyejun commented 6 years ago

Hello.

I have the same error in Docker.

In my case, the standalone examples don't work.

When I execute ./build/bonnet_standalone/session, I got Illegal instruction (core dumped).

I checked nvidia-smi:

$ nvidia-smi
Thu Oct 18 01:23:58 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48       Driver Version: 410.48                              |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN   Off  | 00000000:01:00.0  On |                  N/A |
| 30%   41C    P8    18W / 250W |    585MiB /  6075MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1558      G   /usr/lib/xorg/Xorg                           233MiB |
|    0      2304      G   /opt/teamviewer/tv_bin/TeamViewer             20MiB |
|    0      2578      G   compiz                                       209MiB |
|    0     23643      G   ...quest-channel-token=6920769117415252391    72MiB |
|    0     32701      G   ...-token=B9940EAD24EB7BFE7CB48B880BC0A2AE    43MiB |
+-----------------------------------------------------------------------------+

In python3, import tensorflow works fine.

helloworld.py under the \bonnet-docker folder also runs fine.

I think the problem is with TensorFlow's C++ API.

How do I rebuild the TensorFlow C++ API with the CC_OPT_FLAGS="-march=native" flag?

daobilige-su commented 6 years ago

Hi,

First, you need to make sure the problem comes from the TensorFlow C++ API. To do that, build and run its example program:

$ cd /tools/tensorflow_cc/example
$ mkdir build && cd build
$ cmake ..
$ make
$ ./example

If this example program gives you the same error, then the TensorFlow C++ API is surely the source of the error: your CPU is too old to be supported by its default configuration. To recompile the TensorFlow C++ API, do the following:

$ cd /tools/tensorflow_cc/tensorflow_cc
$ mkdir build
$ cd build
$ export CC_OPT_FLAGS="-march=native"
$ cmake -DTENSORFLOW_STATIC=OFF -DTENSORFLOW_SHARED=ON -DTENSORFLOW_TAG="v1.7.0" ..
$ make -j
$ make install
$ rm -rf ~/.cache && cd .. && rm -rf build
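
Since -march=native means "whatever the build machine supports", it can help to see what gcc actually enables with it. A minimal sketch that lists the SIMD macros gcc predefines for a given -march value; the helper names and the chosen macro list are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: show which SIMD extensions gcc turns on for a given -march value
# by filtering its predefined macros. Helper names are hypothetical.

enabled_simd() {
    # Reads a `gcc -dM -E` macro dump on stdin; prints matching SIMD macros.
    grep -oE '__(SSE4_2|AVX|AVX2|FMA|AVX512F)__' | sort -u
}

native_simd_macros() {
    # $1: -march value (default: native). Requires gcc on PATH.
    echo | gcc -march="${1:-native}" -dM -E - 2>/dev/null | enabled_simd
}
```

Comparing native_simd_macros native with, say, native_simd_macros haswell shows which extensions a build tuned for a newer CPU would assume that this machine lacks.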

After that, you might also need to reinstall the Python TensorFlow package: building the TensorFlow C++ API installs a different version of TensorFlow, so you have to uninstall it and install the correct version again:

$ pip3 uninstall numpy tensorflow-gpu tensorflow matplotlib -y && \
  pip3 install -U tensorflow-gpu==1.7.0 protobuf==3.5.1 matplotlib==2.2.2
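
To confirm the reinstall took effect, a small sketch can compare what pip3 show reports against the pinned versions. The helper names are hypothetical; it relies only on pip3 show printing a "Version:" line:

```shell
#!/bin/sh
# Sketch: verify installed package versions against expected pins.

pip_version() {
    # Reads `pip3 show <pkg>` output on stdin; prints the Version field.
    awk '/^Version:/ { print $2; exit }'
}

check_pin() {
    # $1: package name, $2: expected version
    actual=$(pip3 show "$1" 2>/dev/null | pip_version)
    if [ "$actual" = "$2" ]; then
        echo "$1 $2 ok"
    else
        echo "$1: expected $2, found ${actual:-nothing}"
        return 1
    fi
}
```

For example: check_pin tensorflow-gpu 1.7.0, check_pin protobuf 3.5.1, check_pin matplotlib 2.2.2.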

Hopefully that's it.

Cheers, Su

hyejun commented 6 years ago

I confirmed that the problem comes from the TensorFlow C++ API. Then I tried to reinstall TensorFlow, but I got some errors while building it.

So I tried installing the Docker image on another computer, and there it works. You said "Your CPU version is too old to be supported by the default configuration of tensorflow C++ API."; maybe that's right.

Thank you for answering.

blubbi321 commented 6 years ago

I can confirm the issue. There are no problems following the instructions on a more recent machine. However, I have not yet been able to resolve all the dependencies for the steps @daobilige-su mentioned above. (Apparently one also needs to install g++-7, which in turn is incompatible with the CUDA libs: "/usr/local/cuda-9.0/bin/../targets/x86_64-linux/include/crt/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 6 are not supported!")
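
The error above comes from a guard in CUDA 9.0's host_config.h that rejects gcc newer than 6. A minimal sketch to check a compiler against that limit before starting a long build; the helper names are hypothetical and the limit of 6 is taken from the error message itself. nvcc's -ccbin option selects which host compiler it uses, which is a common way to keep nvcc on an older gcc, but this thread does not establish whether the tensorflow_cc build can be configured that way.

```shell
#!/bin/sh
# Sketch: check a compiler's major version against the CUDA 9.0 limit.
# Assumption: the limit is gcc 6, as stated by the host_config.h error.

gcc_major() {
    # $1: compiler command (default gcc); prints its major version number.
    "${1:-gcc}" -dumpversion 2>/dev/null | cut -d. -f1
}

ok_for_cuda9() {
    # $1: major version number; succeeds if it is 6 or lower.
    [ -n "$1" ] && [ "$1" -le 6 ]
}
```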

The machine with the trouble is an Intel i7-2600K, in case that helps anybody.
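
For reference, the i7-2600K is a Sandy Bridge CPU: it supports SSE4.2 and AVX but not AVX2 or FMA. A minimal sketch (hypothetical helper name) that lists which of a chosen set of flags a CPU does not advertise in /proc/cpuinfo; whether the default tensorflow_cc build in the image actually assumes AVX2 or FMA is not confirmed in this thread.

```shell
#!/bin/sh
# Sketch: print the requested CPU flags that a cpuinfo-style file lacks.
# A binary that uses an instruction set the CPU lacks dies with
# "Illegal instruction". The flags to check are the caller's choice.

missing_cpu_flags() {
    # $1: cpuinfo-style file; remaining args: flag names to look for.
    file="$1"; shift
    flags=$(grep -m1 '^flags' "$file" | cut -d: -f2)
    for f in "$@"; do
        case " $flags " in
            *" $f "*) ;;          # flag present: print nothing
            *) echo "$f" ;;       # flag missing: report it
        esac
    done
}
```

On an i7-2600K, missing_cpu_flags /proc/cpuinfo sse4_2 avx avx2 fma would be expected to print avx2 and fma.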