cndaqiang / E5-PC-daily

Problems encountered in server cluster management, and a summary

cuda #21

Open cndaqiang opened 4 years ago

cndaqiang commented 4 years ago

If the CUDA installation failed, or you want to reinstall, uninstall first:

 /usr/bin/nvidia-uninstall

2022-04-29: installed on Mint

cndaqiang commented 4 years ago

Add the driver to the blacklist: at the end of /etc/modprobe.d/blacklist.conf, append: blacklist nouveau

mv /boot/initrd.img-$(uname -r) /boot/initrd.img-$(uname -r).bak

apt install dracut-core

dracut -v /boot/initrd.img-$(uname -r) $(uname -r)

Or perhaps this instead? On Ubuntu 16.04 LTS, execute sudo update-initramfs -u and reboot the computer.
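
The blacklist edit above can be made repeatable with a small shell helper. This is a minimal sketch and not part of the original notes: the function name `add_line_once` is hypothetical, and it appends the line only when an identical line is not already in the file. Regenerating the initramfs and rebooting are still needed afterwards.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (the file is created if it does not exist).
add_line_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (as root), assuming the same file as in the note above:
# add_line_once 'blacklist nouveau' /etc/modprobe.d/blacklist.conf
```

Running it twice leaves a single `blacklist nouveau` entry, so the step is safe to repeat on a reinstall.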

cndaqiang commented 4 years ago

apt install dracut-core

dracut -v /boot/initrd.img-$(uname -r) $(uname -r)

cndaqiang commented 4 years ago
sudo apt-get remove --purge '*nvidia*'
sudo apt-get autoremove
cndaqiang commented 4 years ago

https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07

cndaqiang commented 4 years ago

sudo apt-get install build-essential gcc-multilib dkms xorg xorg-dev

cndaqiang commented 4 years ago

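In the end, inexplicably, the driver installed by the system worked. I think I changed the setting here once: selected Intel and rebooted, then selected NVIDIA and rebooted again.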

prime profiles

cndaqiang commented 4 years ago
sudo apt install nvidia-cuda-toolkit 
cndaqiang@girl:~/code/cuda$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
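
To compare the installed toolkit against what a driver or framework expects, the release number can be pulled out of the output above. A minimal sketch; the function name `nvcc_release` is hypothetical, and it assumes the `release X.Y,` wording shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number from the "release X.Y," field.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage: nvcc --version | nvcc_release
```

For the output shown above this prints 9.1.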
cndaqiang commented 4 years ago

https://askubuntu.com/questions/1181959/installing-cuda-10-and-tensorflow-2-0-ubuntu-19-10

sudo apt install nvidia-cuda-toolkit 
cndaqiang commented 4 years ago

Install the PGI compiler

sudo apt-get install lsb

Download it from the homepage and install: https://www.pgroup.com/index.htm

Add the environment variables

export PGI=/opt/pgi/linux86-64/19.10
export MANPATH=$MANPATH:$PGI/man
export LM_LICENSE_FILE=/opt/pgi/license.dat
export PATH=$PATH:$PGI/bin

MPIDIR=$PGI/mpi/openmpi-3.1.3
export PATH=$MPIDIR/bin:$PATH
export LD_LIBRARY_PATH=$MPIDIR/lib:$LD_LIBRARY_PATH
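
If these lines live in a shell startup file, each new shell adds the same directories again, and an initially empty LD_LIBRARY_PATH ends up with a trailing colon. A minimal sketch of a guard, assuming a POSIX shell; the helper name `prepend_once` is hypothetical and not part of the PGI installation.

```shell
#!/bin/sh
# Hypothetical helper: print "dir:list", unless dir is already an entry
# of the colon-separated list; an empty list yields just "dir".
prepend_once() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*)
            # Already present: leave the list unchanged.
            printf '%s\n' "$list"
            ;;
        *)
            if [ -n "$list" ]; then
                printf '%s\n' "$dir:$list"
            else
                printf '%s\n' "$dir"
            fi
            ;;
    esac
}

# Usage, with MPIDIR set as in the lines above:
# PATH=$(prepend_once "$MPIDIR/bin" "$PATH"); export PATH
# LD_LIBRARY_PATH=$(prepend_once "$MPIDIR/lib" "$LD_LIBRARY_PATH"); export LD_LIBRARY_PATH
```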
cndaqiang commented 4 years ago

PGI compilation example

cndaqiang@girl:~/code/cuda$ ls
cuda.cuf
cndaqiang@girl:~/code/cuda$ pgf90 cuda.cuf 
cndaqiang@girl:~/code/cuda$ ./a.out 
 Passed

Contents of cuda.cuf

module simpleOps_m
contains
 attributes(global)subroutine increment(a,b)
  implicit none
  integer,intent(inout)::a(:)
  integer,value::b
  integer::i
  i = threadIdx%x
  a(i)=a(i)+b
 end subroutine increment
end module simpleOps_m

program incrementTestGPU
  use cudafor
  use simpleOps_m
  implicit none
  integer ,parameter :: n =256
  integer :: a(n),b
  integer, device :: ad(n)

  a=1
  b=3
  ad=a
  call increment<<<1,n>>>(ad,b)
  a=ad
  if(any(a /=4)) then
    write(*,*) "Failed"
  else
    write(*,*)  "Passed"
  endif
end program incrementTestGPU
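!------------------------------------------------
! Copyright notice: this is an original article by CSDN blogger 「八家铺子」, licensed under CC 4.0 BY-SA; when reposting, include a link to the original source and this notice.
! Original link: https://blog.csdn.net/Slow_Jiulong/article/details/53105223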
cndaqiang commented 4 years ago

Mint19 install cuda

sudo apt-get install build-essential gcc-multilib dkms xorg xorg-dev


blacklist nouveau

oem@cndaqiangTest:~$ sudo vi /etc/modprobe.d/blacklist.conf 
oem@cndaqiangTest:~$  sudo update-initramfs -u
reboot
cndaqiang commented 4 years ago
sudo apt-get remove --purge '*nvidia*'
sudo apt-get autoremove

Install the driver: select the recommended driver.

reboot

cndaqiang commented 4 years ago
nvidia-smi


[wulina] How did it just start working? Wasn't there supposed to be a problem?

cndaqiang commented 4 years ago

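nvidia-smi reports that it cannot communicate with the driver, possibly because the X server's default configuration now uses the Intel integrated GPU. (Screenshot created 2020-03-06 18-55-29)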
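
One way to narrow this down is to check which GPU kernel modules are actually loaded. A minimal sketch; the function name `gpu_modules` is hypothetical, and it only filters the output of `lsmod` for the module names nvidia, nouveau and i915.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and print which of
# the GPU kernel modules nvidia, nouveau and i915 are listed, one per line.
gpu_modules() {
    awk '$1 == "nvidia" || $1 == "nouveau" || $1 == "i915" { print $1 }'
}

# Usage: lsmod | gpu_modules
```

If nvidia does not appear in the result, nvidia-smi has no loaded driver to talk to. On systems with nvidia-prime installed, `prime-select query` shows which profile is currently selected.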

cndaqiang commented 2 years ago

Check hardware information

root@mommint:/home/cndaqiang# inxi -Fxz
System:    Host: mommint Kernel: 4.15.0-20-generic x86_64 bits: 64 gcc: 7.3.0 Console: tty 0
           Distro: Linux Mint 19
Machine:   Device: server System: ZTSYSTEM product: CYPRESS11 v: 1.0 serial: <filter>
           Mobo: Intel model: S2600CP serial: N/A
           UEFI: Intel v: SE5C600.86B.02.06.0006.032420170950 date: 03/24/2017
CPU:       10 core Intel Xeon E5-2680 v2 (-MT-MCP-) arch: Ivy Bridge rev.4 cache: 25600 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 55865
           clock speeds: max: 3600 MHz 1: 3093 MHz 2: 3097 MHz 3: 3092 MHz 4: 3092 MHz 5: 3092 MHz 6: 3092 MHz
           7: 3092 MHz 8: 3092 MHz 9: 3092 MHz 10: 3108 MHz 11: 3093 MHz 12: 3092 MHz 13: 3092 MHz 14: 3092 MHz
           15: 3123 MHz 16: 3092 MHz 17: 3092 MHz 18: 3092 MHz 19: 3092 MHz 20: 3092 MHz
Graphics:  Card-1: NVIDIA GK208 [GeForce GT 730] bus-ID: 04:00.0
           Card-2: Matrox Systems MGA G200e [Pilot] ServerEngines (SEP1) bus-ID: 09:00.0
           Display Server: X.org 1.19.6 drivers: (unloaded: modesetting) FAILED: fbdev,vesa,nouveau
           tty size: 208x26 Advanced Data: N/A for root out of X
Audio:     Card NVIDIA GK208 HDMI/DP Audio Controller driver: snd_hda_intel bus-ID: 04:00.1
           Sound: Advanced Linux Sound Architecture v: k4.15.0-20-generic
Network:   Card-1: Intel I350 Gigabit Network Connection driver: igb v: 5.4.0-k port: 1020 bus-ID: 07:00.0
           IF: enp7s0f0 state: up speed: 1000 Mbps duplex: full mac: <filter>
           Card-2: Intel I350 Gigabit Network Connection driver: igb v: 5.4.0-k port: 1000 bus-ID: 07:00.1
           IF: enp7s0f1 state: down mac: <filter>
Drives:    HDD Total Size: 7001.4GB (19.5% used)
           ID-1: /dev/sda model: TOSHIBA_DT01ACA3 size: 3000.6GB temp: 39C
           ID-2: /dev/sdb model: WDC_WD40PURX size: 4000.8GB temp: 34C
Partition: ID-1: / size: 98G used: 56G (61%) fs: ext4 dev: /dev/sda3
           ID-2: /home size: 2.5T used: 1.2T (49%) fs: ext4 dev: /dev/sda4
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   System Temperatures: cpu: 33.0C mobo: N/A gpu: 0.0:
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 356 Uptime: 10 min Memory: 1625.6/64320.1MB Init: systemd runlevel: 5 Gcc sys: 7.5.0
           Client: Shell (bash 4.4.191) inxi: 2.3.56