@KeithMyers Thanks! Exactly what I needed. Looks like pcie ID is stored differently between NV and AMD. Should be an easy fix. I will let you know when I push an update with the changes.
I would expect that to be the case, since two different APIs are being used. BOINC relies on each vendor's API to pull the identifying information and capabilities of the detected cards, and AMD's API uses a different format than Nvidia's to store the PCIe ID.
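Because each vendor's API returns the PCIe location in its own format, one way to compare against lspci is to normalize everything to lspci's short bus:device.function form. This is a minimal sketch. It assumes the input is a single text line that contains the address somewhere, optionally prefixed with a 4-digit PCI domain such as 0000:. The function name normalize_pcie_id is hypothetical and not part of benchMT.

```python
import re

# Optional 4-hex-digit PCI domain, then bus:device.function,
# e.g. "0000:0a:00.0" or "0a:00.0".
_BDF_RE = re.compile(
    r"(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def normalize_pcie_id(text):
    """Return the first PCIe address found in text as lowercase 'bb:dd.f', or None."""
    match = _BDF_RE.search(text)
    if match is None:
        return None
    bus, device, function = match.groups()
    return f"{bus.lower()}:{device.lower()}.{function}"


if __name__ == "__main__":
    # Hypothetical input lines; the real per-vendor formats should be
    # checked against actual clinfo output.
    print(normalize_pcie_id("PCI-E, 0000:0A:00.0"))
    print(normalize_pcie_id("0b:00.0 VGA compatible controller: NVIDIA"))
```

Lowercasing matters because lspci prints lowercase hex, while other tools may print uppercase.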
// Detection of AMD/ATI GPUs
//
// Docs:
// http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_CAL_Programming_Guide_v2.0%5B1%5D.pdf
// ?? why don't they have HTML docs??

// NvAPI now provides an API for getting #cores :-)
// But not FLOPs per clock cycle :-(
// Anyway, don't use this for now because server code estimates FLOPS
// based on compute capability, so we may as well do the same
// See http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/
//

// OpenCL interfaces are documented here:
// http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/ and
// http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/
@KeithMyers I just posted a fix, but not able to test. Can you give it a try?
@JStateson In this latest release, I added the OpenCL Device Index, which may help in solving the mapping problem. I noticed that coproc_info.xml has both device number and opencl_device_index. This looks promising.
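If coproc_info.xml carries both numbers per device, reading them side by side is a short ElementTree job. This is a minimal sketch. The element names (nvidia_opencl, name, device_num, opencl_device_index) and the sample values are assumptions for illustration; they have not been checked against a real BOINC coproc_info.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; element names and values are assumptions.
SAMPLE = """
<coprocs>
  <nvidia_opencl>
    <name>GeForce RTX 2080</name>
    <device_num>0</device_num>
    <opencl_device_index>0</opencl_device_index>
  </nvidia_opencl>
  <nvidia_opencl>
    <name>GeForce GTX 1080</name>
    <device_num>2</device_num>
    <opencl_device_index>2</opencl_device_index>
  </nvidia_opencl>
</coprocs>
"""


def opencl_index_map(xml_text, tag="nvidia_opencl"):
    """Map opencl_device_index -> (device_num, name) for each matching element."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for dev in root.iter(tag):
        index = int(dev.findtext("opencl_device_index"))
        device_num = int(dev.findtext("device_num"))
        result[index] = (device_num, dev.findtext("name"))
    return result


if __name__ == "__main__":
    print(opencl_index_map(SAMPLE))
```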
I just pushed another update. clinfo for NV reports back a Bus ID and a Slot ID, but the pcie ID format is Bus:Device:Function, so I assumed Slot maps to Device. The only way to know for sure is to see whether it matches the pcie ID found by lspci.
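The Slot-to-Device assumption can be captured in one small function. This is a sketch, assuming clinfo --raw gives the NV Bus ID and Slot ID as integers and that the PCIe function number is 0. The name nv_pcie_id is hypothetical. If the assumption turns out wrong when compared against lspci, only this function needs to change.

```python
def nv_pcie_id(bus_id, slot_id, function=0):
    """Build an lspci-style 'bb:dd.f' string from NV Bus ID and Slot ID.

    Assumes, as in the thread, that Slot ID maps directly to the PCIe
    device number and that the function number is 0. Verify against lspci.
    """
    return f"{bus_id:02x}:{slot_id:02x}.{function}"


if __name__ == "__main__":
    # Bus IDs 8, 10 and 11 would correspond to the 08:00.0, 0a:00.0 and
    # 0b:00.0 addresses lspci reports on the three-GPU system below.
    for bus in (8, 10, 11):
        print(nv_pcie_id(bus, 0))
```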
It seems from @JStateson's post above that clinfo without --raw gives Device Topology. It would be useful to see both the --raw and the default output for Nvidia.
Doesn't seem to be picking up the OpenCL device number.
keith@Serenity:~/Downloads/benchMT-master$ ./benchMT --lsgpu
benchMT workdir Path [ /home/keith/Downloads/benchMT-master/workdir/ ] does not exist, making...
TestData Path [ /home/keith/Downloads/benchMT-master/testData/ ] does not exist, making...
GPU_ITEM: uuid: 67ad16af443d45ee9c9b63240f90137e
pcie_id: 08:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 0
BOINC Device number: None
card path: /sys/class/drm/card0/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 6174520b81954271bbfd17e7958237e2
pcie_id: 0a:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 1
BOINC Device number: None
card path: /sys/class/drm/card1/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 966bcc7e65ce44fdbcf2c0b1868e5cd1
pcie_id: 0b:00.0
model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 2
BOINC Device number: None
card path: /sys/class/drm/card2/device
hwmon path: None
Compute compatible: True
Energy compatible: True
keith@Serenity:~/Downloads/benchMT-master$ ./benchMT --devmap 0:0,1:1,2:2
devmap: {0: 0, 1: 1, 2: 2}
Set specified gpu_devices: [0, 1, 2]
BOINC Home Path [ /home/boinc/BOINC/ ] does not exist
Please set the correct BOINC Home Path with the --boinc_home command line option
boinccmd [ /home/boinc/BOINC/boinccmd ] does not exist
Error in BOINC environment. Exiting...
./benchMT ./benchMT --boinc_home /home/keith/Desktop/BOINC/ --devmap 0:0,1:1,2:2
usage: benchMT [-h] [--about] [-y] [--cfg_file CFG_FILE] [--run_name RUN_NAME]
[--boinc_home BOINC_HOME] [--noBS] [--display_compact]
[--display_slots] [--num_repetitions NUM_REPETITIONS]
[--max_threads MAX_THREADS] [--max_gpus MAX_GPUS]
[--gpu_devices GPU_DEVICES] [--devmap DEVMAP] [--energy]
[--astropulse] [--std_signals] [--no_ref] [--force_ref]
[--purge_kernels] [--lsgpu] [--admin_mkdirs] [-d]
benchMT: error: unrecognized arguments: ./benchMT
So why is setting my boinc_home an invalid argument now?
It’s complaining about the second ./benchMT. Can you run with lsgpu and debug options?
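Python's argparse rejects any leftover token it cannot assign, and that is why the stray ./benchMT produced "unrecognized arguments". This sketch is a cut-down, hypothetical parser that shows only two of benchMT's options. It also shows one way a --devmap string like 0:0,1:1,2:2 can become the dict the debug log prints. The parse_devmap helper is an illustration, not benchMT's code.

```python
import argparse


def parse_devmap(text):
    """Turn '0:0,1:1,2:2' into {0: 0, 1: 1, 2: 2}."""
    mapping = {}
    for pair in text.split(","):
        left, right = pair.split(":")
        mapping[int(left)] = int(right)
    return mapping


def build_parser():
    # Hypothetical, reduced parser: only two of benchMT's options.
    parser = argparse.ArgumentParser(prog="benchMT")
    parser.add_argument("--boinc_home")
    parser.add_argument("--devmap", type=parse_devmap)
    return parser


if __name__ == "__main__":
    argv = ["./benchMT", "--boinc_home", "/home/keith/Desktop/BOINC/",
            "--devmap", "0:0,1:1,2:2"]
    # parse_known_args() returns the stray token instead of exiting;
    # parse_args() would stop with "unrecognized arguments: ./benchMT".
    args, extras = build_parser().parse_known_args(argv)
    print(args.devmap)
    print(extras)
```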
Ah, silly me. Didn't see the extra one.
./benchMT --boinc_home /home/keith/Desktop/BOINC/ --devmap 0:0,1:1,2:2 --debug --lsgpu
Using python 3.6.9
mb_const.boinc_home: [/home/boinc/BOINC/]
mb_const.cpu_app_subdir: [APPS_CPU/]
mb_const.gpu_app_subdir: [APPS_GPU/]
mb_const.ref_app_subdir: [APPS_REF/]
mb_const.ref_results_subdir: [REF_RESULTS/]
mb_const.wu_subdir: [WU_test/]
mb_const.std_signal_subdir: [WU_std_signal/]
mb_const.testdata_subdir: [testData/]
mb_const.workdir_subdir: [workdir/]
mb_const.slots_subdir: [Slots/]
mb_const.command_line_filename: [BenchCFG]
mb_const.boinccmd: [boinccmd]
mb_const.template_file: [init_data.xml.template]
mb_const.wu_cmp: [rescmpv5_l]
mb_const.suspend_args: [['boinccmd --set_gpu_mode never 172800', 'boinccmd --set_run_mode never 172800']]
mb_const.resume_args: [['boinccmd --set_gpu_mode never 1', 'boinccmd --set_run_mode never 1']]
mb_const.activeWU: [work_unit.sah]
mb_const.activeAPWU: [in.dat]
mb_const.DEBUG: [True]
mb_const.noBS: [False]
mb_const.env: [<__main__.BENCH_ENV object at 0x7f4ed229b588>]
mb_const.card_root: [/sys/class/drm/]
mb_const.hwmon_sub: [hwmon/hwmon]
mb_const.cmd_lspci: [/usr/bin/lspci]
mb_const.cmd_lshw: [/usr/bin/lshw]
mb_const.cmd_lscpu: [/usr/bin/lscpu]
mb_const.cmd_clinfo: [/usr/bin/clinfo]
mb_const.cmd_time: [/usr/bin/time]
mb_const.cmd_lsb_release: [/usr/bin/lsb_release]
mb_const.cmd_nvidia_smi: [/usr/bin/nvidia-smi]
Initial app list
┌────┬────┬───┬────────────────────────────────────────────────────────────┬────────┬────────┬───────────┬────────┐
│Job#│Slot│xPU│app_name │ start │ finish │tot_time │ state │
│ │ │ │app_args │wu_name │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│0 │ NA │GPU│MBv8_8.22r3584_sse2_clAMD_HD5_x86_64-pc-linux-gnu │ NA │ NA │ NA │PENDING │
│ │ │ │-v 1 -instances_per_device 1 -sbs 2048 -period_iterations_nu│not assigned │
└────┴────┴───┴────────────────────────────────────────────────────────────┴──────────────────────────────────────┘
CFG_mode: yes None
CFG_mode: run_name None
CFG_mode: boinc_home None
CFG_mode: noBS None
CFG_mode: display_compact None
CFG_mode: display_slots None
CFG_mode: num_repetitions None
CFG_mode: max_threads None
CFG_mode: max_gpus None
CFG_mode: gpu_devices None
CFG_mode: devmap None
CFG_mode: std_signals None
CFG_mode: no_ref None
CFG_mode: force_ref None
CFG_mode: energy None
CFG_mode: astropulse None
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '0']
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '1']
ocl_device_name [GeForce GTX 1080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
cl_index: ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']
{'': ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']}
Found 3 GPUs
GPU: 08:00.0
['08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search: []
Power reading for 08:00.0: 93.96 W
GPU: 0a:00.0
['0a:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search: []
Power reading for 0a:00.0: 166.11 W
GPU: 0b:00.0
['0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)', '\tSubsystem: eVga.com. Corp. GP104 [GeForce GTX 1080]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search: []
Power reading for 0b:00.0: 103.40 W
GPU_ITEM: uuid: 965baa7754644ac79453a94ba22d5b57
pcie_id: 08:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 0
BOINC Device number: None
card path: /sys/class/drm/card0/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 728b2213f4b94b3293da860cd7b60774
pcie_id: 0a:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 1
BOINC Device number: None
card path: /sys/class/drm/card1/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 70f2a1a30d124f2fba9ec5413f5dbbe0
pcie_id: 0b:00.0
model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 2
BOINC Device number: None
card path: /sys/class/drm/card2/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 965baa7754644ac79453a94ba22d5b57
pcie_id: 08:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 0
BOINC Device number: None
card path: /sys/class/drm/card0/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 728b2213f4b94b3293da860cd7b60774
pcie_id: 0a:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 1
BOINC Device number: None
card path: /sys/class/drm/card1/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 70f2a1a30d124f2fba9ec5413f5dbbe0
pcie_id: 0b:00.0
model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 2
BOINC Device number: None
card path: /sys/class/drm/card2/device
hwmon path: None
Compute compatible: True
Energy compatible: True
devmap: {0: 0, 1: 1, 2: 2}
GPU_ITEM: uuid: 965baa7754644ac79453a94ba22d5b57
pcie_id: 08:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 0
BOINC Device number: 0
card path: /sys/class/drm/card0/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 728b2213f4b94b3293da860cd7b60774
pcie_id: 0a:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 1
BOINC Device number: 1
card path: /sys/class/drm/card1/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 70f2a1a30d124f2fba9ec5413f5dbbe0
pcie_id: 0b:00.0
model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: None
openCL Version: None
openCL Index: None
card number: 2
BOINC Device number: 2
card path: /sys/class/drm/card2/device
hwmon path: None
Compute compatible: True
Energy compatible: True
I see power readings on the cards now. I'm currently running a mix of Einstein and MilkyWay tasks, since SETI is still fubared.
Not sure why it's not setting the clinfo PCIe id, but I thought of a new way to test it myself. I will put your clinfo output into a file and cat it instead of executing clinfo. I will try this in the morning.
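Replaying saved clinfo output instead of running clinfo can be done by letting the reader accept an optional file path. This is a minimal sketch. The function name read_clinfo is hypothetical. The /usr/bin/clinfo default comes from mb_const.cmd_clinfo in the debug log. The subprocess arguments are chosen to work on Python 3.6.9, the version reported above.

```python
import subprocess


def read_clinfo(saved_output=None, clinfo_cmd="/usr/bin/clinfo"):
    """Return 'clinfo --raw' output as a list of lines.

    When saved_output is a file path (for example clinfo output posted in
    an issue), read that file instead of running clinfo, so the parsing
    code can be exercised on a machine without the reporter's GPUs.
    """
    if saved_output is not None:
        with open(saved_output, encoding="utf-8") as handle:
            return handle.read().splitlines()
    # stdout=PIPE and universal_newlines keep this working on Python 3.6,
    # which predates the capture_output and text arguments.
    completed = subprocess.run(
        [clinfo_cmd, "--raw"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout.splitlines()
```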
@KeithMyers I just found and fixed what I think the problem is. Can you try the latest on master with the lsgpu and debug options again?
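The first debug log shows why the mapping came out empty. Every ocl_pcie_id was parsed as an empty string, so all three cards were stored under the same '' key and the dict ended up as {'': ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']}: each card overwrote the previous one. The actual fix on master isn't described in this thread. This is only a sketch of one way to make that failure loud instead of silent. The name index_by_pcie_id is hypothetical.

```python
def index_by_pcie_id(entries):
    """Build {pcie_id: [name, version, index]} from (pcie_id, info) pairs.

    Raises ValueError on a missing or duplicate pcie_id, which would
    otherwise let later cards silently overwrite earlier ones.
    """
    table = {}
    for pcie_id, info in entries:
        if not pcie_id:
            raise ValueError(f"no PCIe id parsed for {info}")
        if pcie_id in table:
            raise ValueError(f"duplicate PCIe id {pcie_id}")
        table[pcie_id] = info
    return table
```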
keith@Serenity:~/Downloads/benchMT-master$ ./benchMT --boinc_home /home/keith/Desktop/BOINC/ --devmap 0:0,1:1,2:2 --debug --lsgpu
Using python 3.6.9
mb_const.boinc_home: [/home/boinc/BOINC/]
mb_const.cpu_app_subdir: [APPS_CPU/]
mb_const.gpu_app_subdir: [APPS_GPU/]
mb_const.ref_app_subdir: [APPS_REF/]
mb_const.ref_results_subdir: [REF_RESULTS/]
mb_const.wu_subdir: [WU_test/]
mb_const.std_signal_subdir: [WU_std_signal/]
mb_const.testdata_subdir: [testData/]
mb_const.workdir_subdir: [workdir/]
mb_const.slots_subdir: [Slots/]
mb_const.command_line_filename: [BenchCFG]
mb_const.boinccmd: [boinccmd]
mb_const.template_file: [init_data.xml.template]
mb_const.wu_cmp: [rescmpv5_l]
mb_const.suspend_args: [['boinccmd --set_gpu_mode never 172800', 'boinccmd --set_run_mode never 172800']]
mb_const.resume_args: [['boinccmd --set_gpu_mode never 1', 'boinccmd --set_run_mode never 1']]
mb_const.activeWU: [work_unit.sah]
mb_const.activeAPWU: [in.dat]
mb_const.DEBUG: [True]
mb_const.noBS: [False]
mb_const.env: [<__main__.BENCH_ENV object at 0x7f4f3286c550>]
mb_const.card_root: [/sys/class/drm/]
mb_const.hwmon_sub: [hwmon/hwmon]
mb_const.cmd_lspci: [/usr/bin/lspci]
mb_const.cmd_lshw: [/usr/bin/lshw]
mb_const.cmd_lscpu: [/usr/bin/lscpu]
mb_const.cmd_clinfo: [/usr/bin/clinfo]
mb_const.cmd_time: [/usr/bin/time]
mb_const.cmd_lsb_release: [/usr/bin/lsb_release]
mb_const.cmd_nvidia_smi: [/usr/bin/nvidia-smi]
benchMT workdir Path [ /home/keith/Downloads/benchMT-master/workdir/ ] does not exist, making...
TestData Path [ /home/keith/Downloads/benchMT-master/testData/ ] does not exist, making...
Initial app list
┌────┬────┬───┬────────────────────────────────────────────────────────────┬────────┬────────┬───────────┬────────┐
│Job#│Slot│xPU│app_name │ start │ finish │tot_time │ state │
│ │ │ │app_args │wu_name │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│0 │ NA │GPU│MBv8_8.22r3584_sse2_clAMD_HD5_x86_64-pc-linux-gnu │ NA │ NA │ NA │PENDING │
│ │ │ │-v 1 -instances_per_device 1 -sbs 2048 -period_iterations_nu│not assigned │
└────┴────┴───┴────────────────────────────────────────────────────────────┴──────────────────────────────────────┘
CFG_mode: yes None
CFG_mode: run_name None
CFG_mode: boinc_home None
CFG_mode: noBS None
CFG_mode: display_compact None
CFG_mode: display_slots None
CFG_mode: num_repetitions None
CFG_mode: max_threads None
CFG_mode: max_gpus None
CFG_mode: gpu_devices None
CFG_mode: devmap None
CFG_mode: std_signals None
CFG_mode: no_ref None
CFG_mode: force_ref None
CFG_mode: energy None
CFG_mode: astropulse None
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
ocl_pcie_id [08:00.0]
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '0']
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
ocl_pcie_id [0a:00.0]
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '1']
ocl_device_name [GeForce GTX 1080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
ocl_pcie_id [0b:00.0]
cl_index: ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']
{'08:00.0': ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '0'], '0a:00.0': ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '1'], '0b:00.0': ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']}
Found 3 GPUs
GPU: 08:00.0
['08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search: []
Power reading for 08:00.0: 100.63 W
GPU: 0a:00.0
['0a:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search: []
Power reading for 0a:00.0: 107.63 W
GPU: 0b:00.0
['0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)', '\tSubsystem: eVga.com. Corp. GP104 [GeForce GTX 1080]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search: []
Power reading for 0b:00.0: 140.23 W
GPU_ITEM: uuid: 8ee7c608aabc41239c1094bd6569e3ec
pcie_id: 08:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce RTX 2080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 0
card number: 0
BOINC Device number: None
card path: /sys/class/drm/card0/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: bf24108f54a946f080bb4fda43ddd5c6
pcie_id: 0a:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce RTX 2080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 1
card number: 1
BOINC Device number: None
card path: /sys/class/drm/card1/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: cb6508fee15e4bcab3c6596008c884e9
pcie_id: 0b:00.0
model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce GTX 1080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 2
card number: 2
BOINC Device number: None
card path: /sys/class/drm/card2/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: 8ee7c608aabc41239c1094bd6569e3ec
pcie_id: 08:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce RTX 2080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 0
card number: 0
BOINC Device number: None
card path: /sys/class/drm/card0/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: bf24108f54a946f080bb4fda43ddd5c6
pcie_id: 0a:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce RTX 2080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 1
card number: 1
BOINC Device number: None
card path: /sys/class/drm/card1/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: cb6508fee15e4bcab3c6596008c884e9
pcie_id: 0b:00.0
model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce GTX 1080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 2
card number: 2
BOINC Device number: None
card path: /sys/class/drm/card2/device
hwmon path: None
Compute compatible: True
Energy compatible: True
devmap: {0: 0, 1: 1, 2: 2}
GPU_ITEM: uuid: 8ee7c608aabc41239c1094bd6569e3ec
pcie_id: 08:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce RTX 2080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 0
card number: 0
BOINC Device number: 0
card path: /sys/class/drm/card0/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: bf24108f54a946f080bb4fda43ddd5c6
pcie_id: 0a:00.0
model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce RTX 2080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 1
card number: 1
BOINC Device number: 1
card path: /sys/class/drm/card1/device
hwmon path: None
Compute compatible: True
Energy compatible: True
GPU_ITEM: uuid: cb6508fee15e4bcab3c6596008c884e9
pcie_id: 0b:00.0
model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
vendor: NVIDIA
driver: nvidiafb, nouveau, nvidia_drm, nvidia
openCL Device: GeForce GTX 1080
openCL Version: OpenCL 1.2 CUDA
openCL Index: 2
card number: 2
BOINC Device number: 2
card path: /sys/class/drm/card2/device
hwmon path: None
Compute compatible: True
Energy compatible: True
keith@Serenity:~/Downloads/benchMT-master$
Sorry, I just got back to this thread. I've been putting out a fire: unaccountably, I cannot run clinfo too many times on a Linux system that has an AMD RX-570 crunching Einstein@Home. https://askubuntu.com/questions/1205335/where-to-report-bugs-involving-opencl-in-ubuntu
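If repeated clinfo runs are a problem on some systems, a tool can limit its exposure by running clinfo at most once per process and reusing the output. This is a minimal sketch using functools.lru_cache. The function name clinfo_raw_output is hypothetical. The cache key is the command path, so each distinct path runs once.

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def clinfo_raw_output(clinfo_cmd="/usr/bin/clinfo"):
    """Run '<clinfo_cmd> --raw' once per distinct clinfo_cmd and cache stdout."""
    completed = subprocess.run(
        [clinfo_cmd, "--raw"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout
```

Later callers get the cached string without launching clinfo again. The trade-off is that the output will not reflect any device changes made while the process is running.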
I am working on getting my MSboinc program to write out the correlation with the bus IDs. On booting, it is supposed to read in the following (created by a Python script) and replace the "-1" with the IDs it is using:
<devmap>
<Num_GPUs>9</Num_GPUs>
<1>0 -1 01:00.0 NV GTX-1060-6GB</1>
<2>1 -1 02:00.0 NV GTX-1060-3GB</2>
<3>2 -1 03:00.0 NV GTX-1060-3GB</3>
<4>3 -1 04:00.0 NV P106-100</4>
<5>4 -1 05:00.0 NV GTX-1070</5>
<6>5 -1 08:00.0 NV P106-090</6>
<7>6 -1 0A:00.0 NV GTX-1060-3GB</7>
<8>7 -1 0B:00.0 NV GTX-1060-3GB</8>
<9>8 -1 0E:00.0 NV GTX-1060-3GB</9>
</devmap>
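Entries such as <1>...</1> use tag names that start with a digit. That is not a valid XML element name, so a strict XML parser like ElementTree will reject this file. A line-oriented regular expression handles it instead. This is a sketch. The field meanings (sensor/nvidia-smi id, BOINC id with -1 as a placeholder, PCIe bus id, vendor, card name) follow the description in this thread. The function names and the {pcie_id: boinc_id} lookup passed to fill_boinc_ids are hypothetical.

```python
import re

# One entry per line, e.g. "<6>5 -1 08:00.0 NV P106-090</6>".
# Groups: tag number, sensor id, BOINC id (-1 = unknown), PCIe id, vendor, name.
_ENTRY_RE = re.compile(r"<(\d+)>(\d+)\s+(-?\d+)\s+(\S+)\s+(\S+)\s+(.+?)</\1>")


def parse_devmap_lines(text):
    """Return one dict per <N>...</N> entry; <Num_GPUs> is skipped."""
    entries = []
    for match in _ENTRY_RE.finditer(text):
        _, sensor, boinc, pcie, vendor, name = match.groups()
        entries.append({
            "sensor_id": int(sensor),
            "boinc_id": int(boinc),
            "pcie_id": pcie.lower(),  # "0A:00.0" -> "0a:00.0", as lspci prints it
            "vendor": vendor,
            "name": name,
        })
    return entries


def fill_boinc_ids(entries, boinc_by_pcie):
    """Replace -1 placeholders using a hypothetical {pcie_id: boinc_id} lookup."""
    for entry in entries:
        if entry["boinc_id"] == -1 and entry["pcie_id"] in boinc_by_pcie:
            entry["boinc_id"] = boinc_by_pcie[entry["pcie_id"]]
    return entries
```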
Is there anything I can do here on this thread?
@JStateson I definitely ran with your original issue report and did a whole lot more! It probably resulted in more email than is desirable... If you could just test the latest on master on your Nvidia system with ./benchMT --lsgpu --debug and post the results here, that would help. I just want to make sure it works fine. Then I will open a new help-wanted issue for beta testing and close this one. Thanks!
Ran OK; the results are here: http://stateson.net/images/h110btc_benchMT_2.txt It does not look as good as Keith's, and I'm not sure how to format it. From a PC, I used FTP to retrieve it from the Linux box, then used FileZilla to put it on my "junk" site.
Thanks! I was still able to see what I needed to in the attached file. Looks like the compute and energy capability determination is working correctly for Nvidia.
I noticed that clinfo indicates that the Intel 530 has compute capability. Does BOINC also see it as compute capable? It seems the pcie_id is stored differently for Intel. Can you provide the clinfo --raw output for this card?
Let me know if you are ok with me adding you to the benchMT credits for lspci/clinfo debug.
Yes, BOINC detects the Intel iGPUs as capable of computing. There is a specific module in /client for detecting Intel GPUs: extern vector intel_gpus; extern vector intel_gpu_opencls;
Got the Python script "MakeTable.py" working for Nvidia. The second column (1, 3, 4, 2, etc.) is the BOINC ID, which I correlated from coproc_info.xml. The first column (0, 1, 2, ...) is the sensor ID from lm-sensors, or the ID in nvidia-smi. This still does not identify the exact card when the names are identical, but I am working on that.
<devmap>
<Num_GPUs>9</Num_GPUs>
<1>0 1 01:00.0 NV GTX-1060-6GB</1>
<2>1 3 02:00.0 NV GTX-1060-3GB</2>
<3>2 4 03:00.0 NV GTX-1060-3GB</3>
<4>3 2 04:00.0 NV P106-100</4>
<5>4 0 05:00.0 NV GTX-1070</5>
<6>5 8 08:00.0 NV P106-090</6>
<7>6 5 0A:00.0 NV GTX-1060-3GB</7>
<8>7 6 0B:00.0 NV GTX-1060-3GB</8>
<9>8 7 0E:00.0 NV GTX-1060-3GB</9>
</devmap>
I still need to add the AMD cards, but my AMD system is offline. The above "xml" file is to be read into my MSboinc program and then written to the event log so that it shows up in BoincTasks for every remote system.
Note that BOINC ID "0" goes to the GTX 1070 and "8" goes to the slowest board, the P106-090.
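Matching on card name alone cannot tell the GTX-1060-3GB boards apart, because several share the same name. If both sides of the correlation can be keyed by PCIe bus id, the join is unambiguous. This sketch assumes a {pcie_id: boinc_id} table is available from the BOINC side. Whether coproc_info.xml exposes bus ids for every vendor is not established in this thread, so that input is hypothetical. The sample values are copied from the devmap table above, and the function names are illustrative.

```python
from collections import Counter


def ambiguous_names(cards):
    """Return card names that occur more than once (name matching fails for these)."""
    counts = Counter(name for _, name in cards.values())
    return sorted(name for name, n in counts.items() if n > 1)


def join_on_pcie(cards, boinc_by_pcie):
    """Map sensor id -> BOINC id by joining both tables on the PCIe bus id.

    cards: {sensor_id: (pcie_id, name)}
    boinc_by_pcie: {pcie_id: boinc_id}
    """
    return {
        sensor: boinc_by_pcie[pcie]
        for sensor, (pcie, _) in cards.items()
        if pcie in boinc_by_pcie
    }


if __name__ == "__main__":
    cards = {
        1: ("02:00.0", "GTX-1060-3GB"),
        2: ("03:00.0", "GTX-1060-3GB"),
        4: ("05:00.0", "GTX-1070"),
    }
    boinc_by_pcie = {"02:00.0": 3, "03:00.0": 4, "05:00.0": 0}
    print(ambiguous_names(cards))              # ['GTX-1060-3GB']
    print(join_on_pcie(cards, boinc_by_pcie))  # {1: 3, 2: 4, 4: 0}
```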
Updated on GitHub:
https://github.com/JStateson/BoincTasks/tree/master/SystemdService
It seems the pcie_id is represented differently for Intel than for AMD or NV. Do you have an Intel card you can get clinfo --raw output for?
Sorry, I hit the wrong button. The Intel raw data is at https://stateson.net/images/intel_raw.txt and it seems I cannot undo the closing. Access should have been denied, but ACTUALLY I AM THE AUTHOR OF THE ISSUE!! This thread is getting too long; maybe it is time to start another?
I agree, it is getting confusing to follow the multiple lines of thought. I will raise issues for the separate items and leave this one closed.
Ubuntu 18.04, lspci version unknown (no -version argument), from
to
Removing the ATI and AMD fixed the problem of the first grep feeding "null" into the second.
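A chain of two greps silently passes nothing downstream when the first one matches nothing. Filtering lspci's output in Python makes that empty case explicit. This is a sketch, assuming lspci's default one-line-per-device output. The class strings below are common lspci device classes, but the exact patterns benchMT uses are not shown in this thread. The function names are hypothetical, and the /usr/bin/lspci default comes from mb_const.cmd_lspci in the debug log.

```python
import subprocess

GPU_CLASSES = ("VGA compatible controller", "3D controller", "Display controller")


def gpu_lines(lspci_output, vendors):
    """Return lspci lines for display-class devices from any of the given vendors."""
    matches = []
    for line in lspci_output.splitlines():
        is_gpu = any(cls in line for cls in GPU_CLASSES)
        if is_gpu and any(vendor in line for vendor in vendors):
            matches.append(line)
    return matches  # an empty list, never "null", when nothing matches


def list_gpus(vendors, lspci_cmd="/usr/bin/lspci"):
    """Run lspci and return only the GPU lines for the requested vendors."""
    completed = subprocess.run(
        [lspci_cmd], stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return gpu_lines(completed.stdout, vendors)
```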