HBPNeurorobotics / CDP4_experiment

This is the NRP experiment for CDP4
GNU General Public License v3.0

Protobuf version mismatch (3.5.0 required, 3.5.0 installed but 3.4.0 detected) #15

Open albornet opened 6 years ago

albornet commented 6 years ago

Hi! When I run the CDP4_experiment on the Neurorobotics Platform, I hit a protobuf version issue. When I launch the experiment, at some point the backend returns the following error (and the launch stays stuck forever at the step "Loading transfer function: image_to_saliency"):

[libprotobuf FATAL external/protobuf_archive/src/google/protobuf/stubs/common.cc:68] This program requires version 3.5.0 of the Protocol Buffer runtime library, but the installed version is 3.4.0. Please update your library. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "google/protobuf/descriptor.pb.cc".)

If needed, the whole backend output is available here.

When I comment out the image_to_saliency TF in the .bibi file, the experiment can be launched and played (although nothing happens when I press the play button).

I tried to look for google/protobuf in both my system (/usr/local/lib/python2.7/dist-packages) and my platform_venv ($HOME/.opt/platform_venv/lib/python2.7/site-packages). Both versions were 3.5.x, but to be sure I uninstalled them and reinstalled them with pip, so that both versions are now "3.5.0.post1". However, the same error arises. I cannot figure out where a 3.4.0 version of protobuf is installed.
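For reference, a quick, generic way to see which protobuf copy a given interpreter actually loads (plain Python attributes, nothing NRP-specific) is:

    # Run inside the same interpreter / virtualenv that launches the backend
    import google.protobuf
    print(google.protobuf.__version__)  # version the process really picks up
    print(google.protobuf.__file__)     # location it is imported from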

Have you ever run into the same error? I saw the same issue reported for a very similar experiment on the forum (here), but I could not understand how it was solved :s

Thank you for your help! Best, Alban

jackokaiser commented 6 years ago

Hey Alban,

That's a tricky one: both TensorFlow and Gazebo rely on protobuf, but of course they want different versions. This is the reason why, in image_to_saliency.py, we add a folder to the PYTHONPATH programmatically.
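Roughly, the idea is to prepend the folder holding the protobuf version that TensorFlow was built against, so it shadows the copy pulled in by Gazebo/ROS. A minimal sketch of the mechanism (the folder below is just a placeholder; the actual path is the one set in image_to_saliency.py):

    import os
    import sys

    # Placeholder path: point this at the site-packages containing the
    # protobuf version TensorFlow expects, before importing tensorflow.
    tf_protobuf_dir = os.path.expanduser('~/.opt/tensorflow_venv/lib/python2.7/site-packages')
    if tf_protobuf_dir not in sys.path:
        sys.path.insert(0, tf_protobuf_dir)

    import tensorflow as tf  # now resolves google.protobuf from the prepended folder

Note that if google.protobuf has already been imported by the time this runs (e.g. via something else on the PYTHONPATH), the prepend has no effect because the module is cached in sys.modules, which is why a stray PYTHONPATH entry can still trigger the mismatch.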

albornet commented 6 years ago

Hi! Indeed I had a small irregularity in my PYTHONPATH. After correcting it, the same error still occurred. The way I finally fixed it was to uninstall all my CUDA / cuDNN versions, reinstall CUDA 9.0 (+ the corresponding cuDNN) and then tensorflow-1.6.0. Now it works better (the libprotobuf version mismatch no longer arises!).
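As a sanity check after such a reinstall, plain TF 1.x calls are enough to confirm that the GPU build is the one being picked up:

    import tensorflow as tf
    print(tf.__version__)                # expect 1.6.0 here
    print(tf.test.is_built_with_cuda())  # True for a CUDA-enabled build
    print(tf.test.is_gpu_available())    # True if a GPU device can actually be created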

However, I now get an "out_of_memory" error when I start the experiment (the experiment launches, but the saliency network is never initialized). I noticed in the output of the nrp-backend terminal that the saliency network is loaded twice on my GPU. I find this strange because when I use the attention package on its own (outside the NRP), this never happens.

Here is the relevant part of the terminal output (or full output here):

"2018-07-02 11:34:09 GMT+0200 [REQUEST from ::ffff:127.0.0.1] GET /storage/CDP4_experiment_0/experiment_configuration.exc?byname=true Now using node v0.10.48 (npm v2.15.1) 2018-07-02 11:34:11.014513: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2018-07-02 11:34:11.108949: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2018-07-02 11:34:11.109649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.645 pciBusID: 0000:01:00.0 totalMemory: 7.93GiB freeMemory: 7.59GiB 2018-07-02 11:34:11.109683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0 2018-07-02 11:34:11.324549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7331 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1) Found /home/alban/Documents/NRP/gzweb/gzbridge/ws_server.js Starting node: OK.

/home/alban/Documents/NRP/CLE/hbp_nrp_cle/hbp_nrp_cle/robotsim/GazeboHelper.py:119: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead. 2018-07-02 11:34:11,843 [Thread-7 ] [hbp_nrp_cles] [INFO] RobotAbs: /tmp/nrp-simulation-dir/hollie.sdf Pose message filter parameters: minimum seconds between successive messages: 0.02 minimum XYZ distance squared between successive messages: 0.00001 minimum Quaternion distance squared between successive messages: 0.0001 Mon Jul 02 2018 11:34:11 GMT+0200 (CEST) Server is listening on port 7681 [ INFO] [1530524052.282033853]: Camera Plugin (robotNamespace = ), Info: the 'robotNamespace' param did not exit [ INFO] [1530524052.285858891]: Camera Plugin (ns = ) , set to "" [ INFO] [1530524052.286094616]: Camera Plugin (robotNamespace = ), Info: the 'robotNamespace' param did not exit [ INFO] [1530524052.288751442]: Camera Plugin (ns = ) , set to "" 2018-07-02 11:34:12,327 [Thread-7 ] [hbp_nrp_cles] [INFO] Preparing CLE Server 2018-07-02 11:34:12,337 [Thread-7 ] [hbp_nrp_cle.] [INFO] Robot control adapter initialized 2018-07-02 11:34:12,359 [Thread-7 ] [hbp_nrp_cle.] [INFO] neuronal simulator initialized 2018-07-02 11:34:12,359 [Thread-7 ] [BrainLoader ] [INFO] Loading brain model from python: /tmp/nrp-simulation-dir/idle_brain.py 2018-07-02 11:34:12,378 [Thread-7 ] [hbp_nrp_cle.] [INFO] Saving brain source 2018-07-02 11:34:12,378 [Thread-7 ] [hbp_nrp_cle.] [INFO] Initialize transfer functions node tfnode 2018-07-02 11:34:12,378 [Thread-7 ] [hbp_nrp_cle.] [INFO] PyNN communication adapter initialized 2018-07-02 11:34:12,379 [Thread-7 ] [hbp_nrp_cle.] [WARNING] ROS node already initialized with another name 2018-07-02 11:34:12,384 [Thread-7 ] [hbp_nrp_cles] [INFO] Registering ROS Service handlers 2018-07-02 11:34:12,386 [Thread-7 ] [hbp_nrp_cles] [INFO] Registering ROS Service handlers 2018-07-02 11:34:12,963 [Thread-3 ] [rospy.intern] [INFO] topic[/ros_cle_simulation/0/lifecycle] adding connection to [/nrp_backend], count 0 2018-07-02 11:34:13.280555: I tensorflow/core/platform/cpu_feature_guard.cc:140] 2018-07-02 11:34:13,282 [Thread-18 ] [rospy.intern] [INFO] topic[/clock] adding connection to [http://127.0.0.1:37719/], count 0 2018-07-02 11:34:13,282 [Thread-17 ] [rospy.intern] [INFO] topic[/ros_cle_simulation/0/lifecycle] adding connection to [http://127.0.0.1:35463/], count 0 2018-07-02 11:34:13,282 [Thread-15 ] [rospy.intern] [INFO] topic[/ros_cle_simulation/0/lifecycle] adding connection to [http://127.0.0.1:36595/], count 0 2018-07-02 11:34:13,284 [Thread-3 ] [rospy.intern] [INFO] topic[/ros_cle_simulation/0/lifecycle] adding connection to [/ros_cle_simulation_14404_1530524010422], count 1 2018-07-02 11:34:13.358846: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2018-07-02 11:34:13.359468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.645 pciBusID: 0000:01:00.0 totalMemory: 7.93GiB freeMemory: 275.06MiB 2018-07-02 11:34:13.359506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0 2018-07-02 11:34:13.621509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 219 MB memory) -> physical GPU (device: 0, 
name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1) 2018-07-02 11:34:13.622443: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 219.31M (229965824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY"
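
Looking at the log, the first TensorFlow session grabs almost the whole card (7331 MB), so when the saliency network is loaded a second time it only finds ~219 MB free and the allocation fails with CUDA_ERROR_OUT_OF_MEMORY. If the double load itself cannot be avoided, a common TF 1.x workaround (just a sketch, not the experiment's actual code) is to stop each session from pre-allocating the full GPU:

    import tensorflow as tf

    # Let the session allocate GPU memory on demand instead of reserving
    # (almost) the whole card up front.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    # Alternatively, cap the fraction of GPU memory per process:
    # config.gpu_options.per_process_gpu_memory_fraction = 0.4
    sess = tf.Session(config=config)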

Do you have any idea why this happens? I don't think it should, should it?

Thanks for the help!! Alban