Closed: Sheeepthief closed this issue 2 years ago.
Oh, that could be a bit tricky. Could you please run ./gputest.sh and upload the resulting gputestresult.txt file? A debug file would also be pretty neat. Just open http://localhost:4000 and click "Debug file" on the left-hand side. It will create a zip file with all sensitive data x-ed out.
I had to delete three (irrelevant) Cpuminer_Jayddee logs, which made the debug zip too large to upload in a comment.
Thank you. It all looks fine: the OpenCL sorting is correctly aligned with the PCIe bus IDs, and the miners are started with the correct GPUs. So we have to take a closer look at the nvidia-settings tool.
Could you please run nvidia-settings --query all and upload the result here? Is it possible that nvidia-settings doesn't detect the P106?
I put the query output into a .txt file for easier perusing: nvidia-settings -q all.txt
Querying the simpler nvidia-settings -q GPUS, and also being able to manually adjust its fan speed and OC in NVIDIA X Server Settings, would indicate that the P106 is detected just fine, as far as I can tell.
Thank you!
Got it. Yes, both GPUs are detected fine. But the GPUs are not sorted by their PCIe bus ID on your system. It's possible that you added the second GPU to a slot with a lower PCIe bus ID. That would mess up the xorg.conf file.
You will have to edit the xorg.conf file (/etc/X11/xorg.conf) and re-sort the GPUs according to their PCIe bus IDs.
Either edit it directly and change the BusID values:
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BusID "PCI:5:0:0"
EndSection
Section "Device"
    Identifier "Device1"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BusID "PCI:6:0:0"
EndSection
Or you can try to recreate the xorg.conf file with:
nvidia-xconfig --enable-all-gpus --separate-x-screens
There is a good possibility that this command re-sorts the GPUs in the xorg.conf.
Either way, could you please upload the /etc/X11/xorg.conf file here? To be able to upload it, you might need to add .txt as the extension. I'll then make the changes for you.
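Whichever route is taken, note that the BusID values in xorg.conf are written in decimal ("PCI:5:0:0"), while nvidia-smi reports bus IDs in hexadecimal (for example 00000000:05:00.0). Below is a minimal sketch for reading off the right values. It assumes nvidia-smi is installed, and to_xorg_busid is a hypothetical helper, not part of RainbowMiner or the NVIDIA tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an nvidia-smi PCI bus id such as
# "00000000:05:00.0" (hexadecimal domain:bus:device.function) into the
# decimal "PCI:bus:device:function" form used by xorg.conf.
to_xorg_busid() {
  local id=${1// /}          # strip padding spaces left by CSV output
  local bus dev fn
  id=${id#*:}                # drop the domain      -> "05:00.0"
  bus=${id%%:*}              # bus                  -> "05"
  dev=${id#*:}               #                      -> "00.0"
  dev=${dev%%.*}             # device               -> "00"
  fn=${id##*.}               # function             -> "0"
  printf 'PCI:%d:%d:%d\n' "$((16#$bus))" "$((16#$dev))" "$((16#$fn))"
}

# List every GPU in nvidia-smi order with its xorg.conf-style BusID.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader |
    while IFS=, read -r idx name busid; do
      printf '%s %s -> BusID "%s"\n' "$idx" "${name# }" "$(to_xorg_busid "$busid")"
    done
fi
```

For bus numbers up to 9 the two notations look the same. The conversion only matters from bus 0a (decimal 10) upwards.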
On the motherboard, the GTX1060 populates the first/topmost PCIe x16 slot and the P106 populates the second PCIe x16 slot. I am as confused as you are about why the BusIDs seem to count "bottom up", where the P106 is ID 5 and the 1060 is ID 6.
Here is the original xorg.conf: xorg.txt
For fun, I ran nvidia-xconfig --enable-all-gpus --separate-x-screens and found that neither the Device# nor the BusIDs were re-sorted.
xorg_resort_attempt.txt
Do you have any intuition as to why Device# wouldn't automatically increase with BusID? Like, how is Device# "chosen"? Is it simply motherboard PCI slot ID weirdness?
Ok, try this xorg.conf:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 470.86
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0"
    Screen 1 "Screen1" RightOf "Screen0"
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
EndSection
Section "Files"
EndSection
Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/psaux"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection
Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Unknown"
    Option "DPMS"
EndSection
Section "Monitor"
    Identifier "Monitor1"
    VendorName "Unknown"
    ModelName "Unknown"
    Option "DPMS"
EndSection
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "NVIDIA P106-090"
    BusID "PCI:5:0:0"
EndSection
Section "Device"
    Identifier "Device1"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "NVIDIA GeForce GTX 1060 6GB"
    BusID "PCI:6:0:0"
EndSection
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "AllowEmptyInitialConfiguration" "True"
    Option "Coolbits" "31"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
Section "Screen"
    Identifier "Screen1"
    Device "Device1"
    Monitor "Monitor1"
    DefaultDepth 24
    Option "AllowEmptyInitialConfiguration" "True"
    Option "Coolbits" "31"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
Here is the file to download: xorg.txt
You might have to reboot your machine after changing the xorg.conf.
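The diagnosis above is that the Device sections should follow ascending PCIe bus order. A quick check before rebooting could look like the following sketch. It assumes BusID lines of the form BusID "PCI:5:0:0", as in the file above, and check_busid_order is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the BusID of every Device section in file
# order and exit non-zero if the bus numbers are not ascending.
# Assumes lines like:  BusID "PCI:5:0:0"  (decimal bus:device:function)
check_busid_order() {
  local conf=${1:-/etc/X11/xorg.conf}
  awk '
    /^[ \t]*BusID[ \t]/ {
      gsub(/"/, "", $2)        # "PCI:5:0:0" -> PCI:5:0:0
      split($2, p, ":")        # p[2] is the bus number
      print $2
      if (seen++ && p[2] + 0 < last) bad = 1
      last = p[2] + 0
    }
    END { exit bad + 0 }
  ' "$conf"
}

if [ -r /etc/X11/xorg.conf ]; then
  if check_busid_order /etc/X11/xorg.conf; then
    echo "Device sections are in ascending bus order"
  else
    echo "Device sections are NOT in ascending bus order"
  fi
fi
```

This only inspects the file. Whether the order is actually picked up still depends on X being restarted (or the machine rebooted).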
Regarding "Is it simply motherboard PCI slot ID weirdness?": Yes, most probably.
Side note: at what point does RBM read all the configs and create/update the startoc.sh files for each miner? Do I need to fully restart RBM for changes to ocprofiles.config.txt to take effect?
The startoc.sh file should be created/updated at each miner start. This means you only have to restart the currently running miner (just kill it).
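After killing the miner and letting RainbowMiner restart it, one way to confirm which GPU the regenerated startoc script addresses is to list the [gpu:N] targets it contains. This is a minimal sketch. startoc_targets is a hypothetical helper, and the script location (the Bin/NVIDIA-Trex folder mentioned in the original question) depends on the miner and the installation path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each startoc script passed in, print the
# distinct nvidia-settings targets ([gpu:N]) it addresses.
startoc_targets() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue
    printf '%s: ' "$f"
    grep -o '\[gpu:[0-9]*]' "$f" | sort -u | tr '\n' ' '
    echo
  done
}

# Example (path is an assumption; adjust to your RainbowMiner install):
# startoc_targets ~/RainbowMiner/Bin/NVIDIA-Trex/startoc*.sh
```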
So as far as I can tell, swapping the device numbers in xorg.conf fixed the problem entirely. Thank you! I hope anyone who has the same weird problem I did can find this and resolve the problem themselves.
Hello,
I am using a RainbowMiner installation on Ubuntu 20.04 LTS with multiple GPUs. I have been unable to use ocprofiles.config.txt and miners.config.txt to overclock a GTX1060 while mining Trex-Ethash. I have triple-checked that I changed the "OCprofile" value under the Ethash section of "Trex-GTX10606GB" to match the name of my desired OC profile.
I have narrowed down the issue to the startoc_gpu_ethash01.sh file in /RainbowMiner/Bin/NVIDIA-Trex/. This file contains nvidia-settings -a commands that match the core clock and memory clock values from my ocprofiles.config.txt, but target the wrong GPU according to what nvidia-settings (or NVIDIA X Server Settings?) expects. The command in that file:
nvidia-settings -a '[gpu:1]/GPUPowerMizerMode=1' -a '[gpu:1]/GPUGraphicsClockOffset[3]=150' -a '[gpu:1]/GPUMemoryTransferRateOffset[3]=300'
The command that actually works to overclock the GTX1060:
nvidia-settings -a '[gpu:0]/GPUPowerMizerMode=1' -a '[gpu:0]/GPUGraphicsClockOffset[3]=150' -a '[gpu:0]/GPUMemoryTransferRateOffset[3]=300'
I believe RBM obtains the (incorrect) device value from nvidia-smi, which declares the GTX1060 as GPU:1, while nvidia-settings recognizes the GTX1060 as GPU:0.
How can I get RBM/nvidia-smi and nvidia-settings to agree on device values/GPU numbers?
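One way to see the mismatch directly is to match the two tools on the PCI bus number. nvidia-smi reports each GPU's bus ID in hexadecimal, while nvidia-settings exposes a PCIBus attribute in decimal for each [gpu:N] target. That attribute should be listed in the nvidia-settings -q all output. This is a minimal diagnostic sketch under several assumptions: both tools are installed, an X server is running so nvidia-settings can query it, and both tools see the same number of GPUs. settings_index_for_bus is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: show which nvidia-settings [gpu:N] target
# belongs to each nvidia-smi index by matching on the PCI bus number.
# nvidia-settings needs a running X server (DISPLAY set) to answer.

# settings_index_for_bus BUS PAIR...: each PAIR is "settings_index:decimal_bus".
settings_index_for_bus() {
  local want=$1 pair
  shift
  for pair in "$@"; do
    if [ "${pair#*:}" = "$want" ]; then
      echo "${pair%%:*}"
      return 0
    fi
  done
  return 1
}

if command -v nvidia-smi >/dev/null 2>&1 && command -v nvidia-settings >/dev/null 2>&1; then
  pairs=()
  count=$(nvidia-smi -L | wc -l)
  for ((i = 0; i < count; i++)); do
    # PCIBus is reported in decimal, e.g. "5"
    bus=$(nvidia-settings -t -q "[gpu:$i]/PCIBus" 2>/dev/null)
    [ -n "$bus" ] && pairs+=("$i:$bus")
  done
  nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader |
    while IFS=, read -r idx busid; do
      busid=${busid// /}
      hex=${busid#*:}        # "05:00.0"
      hex=${hex%%:*}         # "05"
      gpu=$(settings_index_for_bus "$((16#$hex))" "${pairs[@]}") || gpu='?'
      echo "nvidia-smi $idx -> nvidia-settings [gpu:$gpu]"
    done
fi
```

Going by the indices described above, the unfixed setup would be expected to show the GTX1060 as nvidia-smi 1 mapping to nvidia-settings [gpu:0]. Once the two tools agree, each line should map an index to the same number.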