ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

Unable to install Nvidia driver under docker environment #1750

Closed MEIXuYan closed 6 years ago

MEIXuYan commented 6 years ago

1 I'm currently setting up Apollo on my desktop (Ubuntu 16.04), which has a GTX 1050 Ti GPU. I had already installed NVIDIA driver 375.82 and CUDA 8.0 on the host.

2 When I follow the offline perception module guide (https://github.com/ApolloAuto/apollo/blob/master/docs/howto/how_to_run_perception_module_on_your_local_computer.md) and run these commands in the Docker dev environment:

sudo chmod +x NVIDIA-Linux-x86_64-375.82.run
./NVIDIA-Linux-x86_64-375.82.run --no-opengl-files -a -s

it shows the following:

Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.82...

ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.

ERROR: An error occurred while performing the step: "Checking to see whether the nvidia kernel module was successfully built".
       See /var/log/nvidia-installer.log for details.

ERROR: The nvidia kernel module was not created.

ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on
       fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

3 I then tried the offline perception visualizer (https://github.com/ApolloAuto/apollo/blob/master/docs/howto/how_to_run_offline_perception_visualizer.md), and it shows the following:

layer {
  name: "confidence_score"
  type: "Sigmoid"
  bottom: "confidence_pt"
  top: "confidence_score"
  propagate_down: false
}
layer {
  name: "class_score"
  type: "Sigmoid"
  bottom: "classify_pt"
  top: "class_score"
  propagate_down: false
}
I1213 17:14:50.243530  4305 layer_factory.hpp:79] Creating layer input
I1213 17:14:50.243937  4305 net.cpp:94] Creating Layer input
I1213 17:14:50.243968  4305 net.cpp:402] input -> data
F1213 17:14:50.244278  4305 syncedmem.hpp:18] Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
Aborted (core dumped)
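
Error 35 in the log above is `cudaErrorInsufficientDriver`: the CUDA runtime found a driver older than it requires. It is also commonly reported when the driver's user-space libraries are not visible inside the container at all. A minimal sketch of a version check follows. The `version_ge` helper and the `375.26` default minimum are illustrative assumptions, not values from this thread, so confirm the minimum for your CUDA release in NVIDIA's release notes.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum driver for the CUDA runtime in use (placeholder value).
MIN_DRIVER="${MIN_DRIVER:-375.26}"

# Run this inside the container. If nvidia-smi is missing there, the
# driver's user-space components are not installed in the container.
if command -v nvidia-smi >/dev/null 2>&1; then
    driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    if [ -z "$driver" ]; then
        echo "nvidia-smi is present but could not report a driver version"
    elif version_ge "$driver" "$MIN_DRIVER"; then
        echo "driver $driver satisfies assumed minimum $MIN_DRIVER"
    else
        echo "driver $driver is older than assumed minimum $MIN_DRIVER"
    fi
else
    echo "nvidia-smi not found in this environment"
fi
```

Running the same check on the host and in the container shows whether the two environments see the same driver version.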

4 By the way, after I installed the Apollo kernel, "uname -r" still shows "4.10.0-42-generic". I think my Apollo packages were installed under my original Ubuntu kernel. Could this cause the driver issue?
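
A container shares the host's kernel, so `uname -r` inside Docker reports whichever kernel the host booted, and by default the NVIDIA installer builds its module against the headers for that release. A rough pre-flight check is sketched below, assuming the usual `/lib/modules/<release>/build` layout. The `headers_present` helper and its optional root argument are hypothetical names added for illustration.

```shell
#!/bin/sh
# headers_present RELEASE [ROOT]: succeed when a kernel build tree for
# RELEASE exists under ROOT (ROOT defaults to the real filesystem root).
headers_present() {
    [ -d "${2:-}/lib/modules/$1/build" ]
}

release="$(uname -r)"
if headers_present "$release"; then
    echo "kernel build tree found for $release"
else
    echo "no build tree at /lib/modules/$release/build"
fi
```

If the check fails inside the container, the module build has no headers to compile against, regardless of the gcc version.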

5 More information about my GPU, from "nvidia-smi":

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.82                 Driver Version: 375.82                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 0000:01:00.0     Off |                  N/A |
| N/A   38C    P0    N/A /  N/A |      0MiB /  4041MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
hzdsdhr commented 6 years ago

I'm also struggling to get the GPU driver working. The "Building kernel modules" error is easy to solve. The error message tells you to check the log file /var/log/nvidia-installer.log. If you check it, you will find that your gcc version is too old to compile the GPU driver. The gcc in the original Docker image is 4.8.4, and you need gcc 4.9 or higher to compile the driver and finish the install. So install a newer gcc (4.9 or higher), switch the default gcc from 4.8 to 4.9, and run the driver installation again.
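
The diagnosis above can be checked before rerunning the installer. This is a minimal sketch, assuming GNU `sort -V` is available. `version_ge` and `installer_errors` are hypothetical helper names, the grep pattern is a guess at useful keywords rather than the installer's exact wording, and the 4.9 threshold is taken from the comment above rather than from NVIDIA documentation.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installer_errors LOG: print numbered lines that mention an error or the
# compiler, as a quick way to skim a long installer log.
installer_errors() {
    grep -n -i -E 'error|gcc|compiler' "$1"
}

REQUIRED_GCC="${REQUIRED_GCC:-4.9}"   # threshold quoted in this thread

if command -v gcc >/dev/null 2>&1; then
    current="$(gcc -dumpversion)"
    if version_ge "$current" "$REQUIRED_GCC"; then
        echo "gcc $current is new enough"
    else
        echo "gcc $current is older than $REQUIRED_GCC"
    fi
fi

log=/var/log/nvidia-installer.log
if [ -r "$log" ]; then
    installer_errors "$log" || echo "no matching lines in $log"
fi
```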

hzdsdhr commented 6 years ago

After you finish installing the GPU driver, you will need to compile the whole system using apollo.sh. Before you run apollo.sh build_gpu, you need to change the gcc version back to 4.8.
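
The thread does not say how the gcc version was switched. One common approach on Ubuntu is `update-alternatives`, sketched here under the assumption that both compilers are installed as `/usr/bin/gcc-4.8` and `/usr/bin/gcc-4.9`. The `register_gcc` and `use_gcc` helpers and the `APPLY` switch are hypothetical; by default they only print the command so it can be reviewed first.

```shell
#!/bin/sh
# run_or_print CMD...: execute CMD when APPLY=1, otherwise just print it.
run_or_print() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# register_gcc VERSION PRIORITY: register /usr/bin/gcc-VERSION as a
# candidate for the /usr/bin/gcc link (needs root when applied).
register_gcc() {
    run_or_print update-alternatives --install /usr/bin/gcc gcc "/usr/bin/gcc-$1" "$2"
}

# use_gcc VERSION: select a previously registered gcc as the default.
use_gcc() {
    run_or_print update-alternatives --set gcc "/usr/bin/gcc-$1"
}

# Order of operations described in this thread:
#   register_gcc 4.8 40; register_gcc 4.9 50
#   use_gcc 4.9    # then rerun the NVIDIA .run installer
#   use_gcc 4.8    # then run: apollo.sh build_gpu
```

Keeping both versions registered makes the switch back to 4.8 a single command before the Apollo build.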

MEIXuYan commented 6 years ago

@hzdsdhr Thank you for your advice. I have solved the driver issue, but another failure occurred when I tried the LiDAR offline visualizer (https://github.com/ApolloAuto/apollo/blob/master/docs/howto/how_to_run_offline_perception_visualizer.md):

/apollo/bazel-bin/modules/perception/tool/offline_visualizer_tool/offline_lidar_visualizer_tool

It shows:

I1218 20:32:56.624404   560 track_object_distance.cc:37] location distance weight of TrackObjectDistance is 0.6
I1218 20:32:56.624408   560 track_object_distance.cc:49] direction distance weight of TrackObjectDistance is 0.2
I1218 20:32:56.624411   560 track_object_distance.cc:61] bbox size distance weight of TrackObjectDistance is 0.1
I1218 20:32:56.624415   560 track_object_distance.cc:73] point num distance weight of TrackObjectDistance is 0.1
I1218 20:32:56.624418   560 track_object_distance.cc:85] histogram distance weight of TrackObjectDistance is 0.5
I1218 20:32:56.624424   560 hm_tracker.cc:348] histogram bin size of HmObjectTracker is 10
I1218 20:32:56.624429   560 kalman_filter.cc:38] use adaptive of KalmanFilter is 1
I1218 20:32:56.624433   560 kalman_filter.cc:45] association score maximum of KalmanFilter is 4
I1218 20:32:56.624438   560 kalman_filter.cc:90] measurment noise of KalmanFilter is 0.4
I1218 20:32:56.624440   560 kalman_filter.cc:91] initial velocity noise of KalmanFilter is 5
I1218 20:32:56.624444   560 kalman_filter.cc:93] propagation noise of KalmanFilter is
10 00 00
00 10 00
00 00 10
I1218 20:32:56.624469   560 kalman_filter.cc:57] breakdown threshold maximum of KalmanFilter is 10
I1218 20:32:56.624474   560 lidar_process.cc:284] Init algorithm plugin successfully, tracker: HmObjectTracker
I1218 20:32:56.624482   560 glfw_viewer.cc:64] GLFWViewer::initialize()
libGL error: pci id for fd 33: 8086:591b, driver (null)
libGL error: No driver found
libGL error: failed to load driver: (null)
libGL error: failed to open drm device: Permission denied
libGL error: failed to load driver: i965
I1218 20:32:56.877151   560 opengl_visualizer.cc:44] Initialize OpenglVisualizer successfully
I1218 20:32:56.877287   560 offline_lidar_visualizer_tool.cc:77] starting to run
I1218 20:32:57.184851   560 offline_lidar_visualizer_tool.cc:80]  pose size 3473
I1218 20:32:57.184867   560 offline_lidar_visualizer_tool.cc:81]  pcd size 3475
E1218 20:32:57.184890   560 offline_lidar_visualizer_tool.cc:83] pcd file number does not match pose file number

Do you know how to solve this? Thank you.
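
The final error line in the log reports 3475 pcd files against 3473 pose files. One way to find the extras is to compare file stems across the two input directories. This is a minimal sketch, assuming frames are stored as `<stem>.pcd` and `<stem>.pose`; the `stems` and `unmatched_stems` helpers and the paths in the usage line are hypothetical.

```shell
#!/bin/sh
# stems DIR EXT: print the sorted basenames (without .EXT) of DIR/*.EXT.
stems() {
    for f in "$1"/*."$2"; do
        [ -e "$f" ] || continue      # skip the literal pattern if no match
        basename "$f" ".$2"
    done | sort
}

# unmatched_stems PCD_DIR POSE_DIR: print stems that exist on only one
# side, prefixed with the side that has them.
unmatched_stems() {
    a="$(mktemp)"; b="$(mktemp)"
    stems "$1" pcd > "$a"
    stems "$2" pose > "$b"
    comm -23 "$a" "$b" | sed 's/^/pcd only: /'
    comm -13 "$a" "$b" | sed 's/^/pose only: /'
    rm -f "$a" "$b"
}

# Usage (hypothetical paths):
#   unmatched_stems /path/to/pcd /path/to/pose
```

Moving the unmatched files aside so both directories hold the same set of stems should make the two counts agree.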

natashadsouza commented 6 years ago

Currently closing this issue. Please refer to updated documents in the master branch to resolve the Nvidia driver installation error.