Ricks-Lab / benchMT

SETI multi-threaded MB/AP Benchmark Tool
GNU General Public License v3.0

Something changed in lspci and the grep is failing is my guess #5

Closed JStateson closed 4 years ago

JStateson commented 4 years ago

Ubuntu 18.04; lspci version unknown (no -version argument). I changed the command from

lspci | grep -E \"^.*(VGA|Display).*\[AMD\/ATI\].*$\" | grep -Eo \"^([0-9a-fA-F]+:[0-9a-fA-F]+.[0-9a-fA-F])\"

to


lspci | grep -E \"^.*(VGA|Display).*$\" | grep -Eo \"^([0-9a-fA-F]+:[0-9a-fA-F]+.[0-9a-fA-F])\"

Removing the AMD/ATI vendor match fixed the problem of the first grep feeding an empty result ("null") into the second.
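The effect of the change can be sketched in Python as a rough equivalent of the two-stage grep pipeline. This is only an illustration, not benchMT's code: the function name `gpu_pcie_ids` is hypothetical, the audio-device line is made up, and the NVIDIA lines are copied from lspci output posted later in this thread. The dot before the function digit is escaped here, which is slightly stricter than the original expression.

```python
import re

# Stage 1: keep only VGA/Display controller lines (no vendor filter).
DISPLAY_RE = re.compile(r"^.*(VGA|Display).*$")
# Stage 1 with the original AMD/ATI vendor filter, kept for comparison.
AMD_ONLY_RE = re.compile(r"^.*(VGA|Display).*\[AMD/ATI\].*$")
# Stage 2: pull the leading bus:device.function ID, like `grep -Eo`.
PCIE_ID_RE = re.compile(r"^([0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9a-fA-F])")


def gpu_pcie_ids(lspci_lines, line_filter=DISPLAY_RE):
    """Return the PCIe IDs of all lines accepted by line_filter."""
    ids = []
    for line in lspci_lines:
        if not line_filter.search(line):
            continue
        match = PCIE_ID_RE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


# Sample input: the first line is invented, the NVIDIA lines mirror this thread.
SAMPLE = [
    "00:1f.3 Audio device: Intel Corporation Device a2f0",
    "08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)",
    "0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)",
]

print(gpu_pcie_ids(SAMPLE))               # vendor-neutral filter finds both cards
print(gpu_pcie_ids(SAMPLE, AMD_ONLY_RE))  # AMD/ATI-only filter finds nothing
```

On an NVIDIA-only system the AMD/ATI filter passes nothing to the second stage, which matches the empty result described above.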

Ricks-Lab commented 4 years ago

@KeithMyers Thanks! Exactly what I needed. Looks like pcie ID is stored differently between NV and AMD. Should be an easy fix. I will let you know when I push an update with the changes.

KeithMyers commented 4 years ago

I would expect that to be the case since two different APIs are being used. BOINC depends on the vendor API to pull the identifying information and capabilities of the detected cards. So AMD's API uses a different format from Nvidia's API to store the PCIe ID.

KeithMyers commented 4 years ago

// Detection of AMD/ATI GPUs
//
// Docs:
// http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_CAL_Programming_Guide_v2.0%5B1%5D.pdf
// ?? why don't they have HTML docs??

// NvAPI now provides an API for getting #cores :-)
// But not FLOPs per clock cycle :-(
// Anyway, don't use this for now because server code estimates FLOPS
// based on compute capability, so we may as well do the same
// See http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/
//

// OpenCL interfaces are documented here:
// http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/ and
// http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/

Ricks-Lab commented 4 years ago

@KeithMyers I just posted a fix, but I am not able to test it. Can you give it a try?

@JStateson In this latest release, I added the OpenCL device index, which may help in solving the mapping problem. I noticed that coproc_info.xml has both device number and opencl_device_index. This looks promising.

Ricks-Lab commented 4 years ago

I just pushed another update. clinfo for NV reports back Bus ID and Slot ID, but the format of the PCIe ID is Bus:Device.Function, so I assumed Slot maps to Device. The only way to know for sure is to see if it matches the PCIe ID found by lspci.
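A minimal sketch of that assumption: format a bus number and slot number as an lspci-style ID and check it against the IDs lspci reports. The helper name `pcie_id_from_bus_slot` and the (bus, slot) pairs are hypothetical, and treating Slot as Device with function 0 is exactly the unverified assumption described above. The lspci IDs are the ones posted in this thread.

```python
def pcie_id_from_bus_slot(bus, slot, function=0):
    """Format bus/slot numbers as an lspci-style 'bus:device.function' ID.

    Assumes clinfo's Slot ID is the PCI device number and that the GPU is
    function 0; both are assumptions, not confirmed behavior.
    """
    return "{:02x}:{:02x}.{:x}".format(bus, slot, function)


# PCIe IDs reported by lspci for the three cards in this thread.
lspci_ids = {"08:00.0", "0a:00.0", "0b:00.0"}

# Hypothetical (bus, slot) pairs such as clinfo might report for those cards.
for bus, slot in [(8, 0), (10, 0), (11, 0)]:
    candidate = pcie_id_from_bus_slot(bus, slot)
    status = "matches lspci" if candidate in lspci_ids else "no match"
    print(candidate, status)
```

If every candidate matches an lspci ID, the Slot-to-Device assumption holds for that system; a mismatch would show that the mapping needs another look.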

Judging from @JStateson's post above, clinfo without --raw reports Device Topology. It would be useful to see both the --raw and the default output for Nvidia.

KeithMyers commented 4 years ago

Doesn't seem to be picking up the OpenCL device number.

keith@Serenity:~/Downloads/benchMT-master$ ./benchMT --lsgpu
benchMT workdir Path [ /home/keith/Downloads/benchMT-master/workdir/ ] does not exist, making...
TestData Path [ /home/keith/Downloads/benchMT-master/testData/ ] does not exist, making...
GPU_ITEM: uuid: 67ad16af443d45ee9c9b63240f90137e
    pcie_id: 08:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 0
    BOINC Device number: None
    card path: /sys/class/drm/card0/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 6174520b81954271bbfd17e7958237e2
    pcie_id: 0a:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 1
    BOINC Device number: None
    card path: /sys/class/drm/card1/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 966bcc7e65ce44fdbcf2c0b1868e5cd1
    pcie_id: 0b:00.0
    model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 2
    BOINC Device number: None
    card path: /sys/class/drm/card2/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
keith@Serenity:~/Downloads/benchMT-master$

KeithMyers commented 4 years ago

./benchMT --devmap 0:0,1:1,2:2
devmap:  {0: 0, 1: 1, 2: 2}
Set specified gpu_devices:  [0, 1, 2]
BOINC Home Path [ /home/boinc/BOINC/ ] does not exist
Please set the correct BOINC Home Path with the --boinc_home command line option
boinccmd [ /home/boinc/BOINC/boinccmd ] does not exist
Error in BOINC environment.  Exiting...
./benchMT ./benchMT --boinc_home /home/keith/Desktop/BOINC/ --devmap 0:0,1:1,2:2
usage: benchMT [-h] [--about] [-y] [--cfg_file CFG_FILE] [--run_name RUN_NAME]
               [--boinc_home BOINC_HOME] [--noBS] [--display_compact]
               [--display_slots] [--num_repetitions NUM_REPETITIONS]
               [--max_threads MAX_THREADS] [--max_gpus MAX_GPUS]
               [--gpu_devices GPU_DEVICES] [--devmap DEVMAP] [--energy]
               [--astropulse] [--std_signals] [--no_ref] [--force_ref]
               [--purge_kernels] [--lsgpu] [--admin_mkdirs] [-d]
benchMT: error: unrecognized arguments: ./benchMT

So why is setting my boinc_home an invalid argument now?

Ricks-Lab commented 4 years ago

It’s complaining about the second ./benchMT. Can you run with lsgpu and debug options?
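The error can be reproduced with a small argparse sketch. The usage message above looks like argparse output, but that is an assumption; the parser below is a cut-down stand-in with only two of the options, and `parse_devmap` is a hypothetical helper, not benchMT's code.

```python
import argparse


def parse_devmap(text):
    """Turn '0:0,1:1,2:2' into {0: 0, 1: 1, 2: 2}."""
    pairs = (item.split(":") for item in text.split(","))
    return {int(key): int(value) for key, value in pairs}


parser = argparse.ArgumentParser(prog="benchMT")
parser.add_argument("--boinc_home")
parser.add_argument("--devmap", type=parse_devmap)

# The shell consumes the first ./benchMT as the program name, so the
# repeated one arrives as an ordinary argument.
argv = ["./benchMT", "--boinc_home", "/home/keith/Desktop/BOINC/",
        "--devmap", "0:0,1:1,2:2"]

# parse_known_args() returns leftovers instead of exiting, which makes the
# stray token easy to see; parse_args() would report it as an error.
args, extras = parser.parse_known_args(argv)
print(args.boinc_home)
print(args.devmap)
print(extras)  # ['./benchMT']
```

The --boinc_home value is parsed correctly; only the leftover token is rejected, which is why the message names ./benchMT and not the path.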

KeithMyers commented 4 years ago

Ah, silly me. Didn't see the extra one.

./benchMT --boinc_home /home/keith/Desktop/BOINC/ --devmap 0:0,1:1,2:2 --debug --lsgpu
Using python 3.6.9
mb_const.boinc_home: [/home/boinc/BOINC/]
mb_const.cpu_app_subdir: [APPS_CPU/]
mb_const.gpu_app_subdir: [APPS_GPU/]
mb_const.ref_app_subdir: [APPS_REF/]
mb_const.ref_results_subdir: [REF_RESULTS/]
mb_const.wu_subdir: [WU_test/]
mb_const.std_signal_subdir: [WU_std_signal/]
mb_const.testdata_subdir: [testData/]
mb_const.workdir_subdir: [workdir/]
mb_const.slots_subdir: [Slots/]
mb_const.command_line_filename: [BenchCFG]
mb_const.boinccmd: [boinccmd]
mb_const.template_file: [init_data.xml.template]
mb_const.wu_cmp: [rescmpv5_l]
mb_const.suspend_args: [['boinccmd --set_gpu_mode never 172800', 'boinccmd --set_run_mode never 172800']]
mb_const.resume_args: [['boinccmd --set_gpu_mode never 1', 'boinccmd --set_run_mode never 1']]
mb_const.activeWU: [work_unit.sah]
mb_const.activeAPWU: [in.dat]
mb_const.DEBUG: [True]
mb_const.noBS: [False]
mb_const.env: [<__main__.BENCH_ENV object at 0x7f4ed229b588>]
mb_const.card_root: [/sys/class/drm/]
mb_const.hwmon_sub: [hwmon/hwmon]
mb_const.cmd_lspci: [/usr/bin/lspci]
mb_const.cmd_lshw: [/usr/bin/lshw]
mb_const.cmd_lscpu: [/usr/bin/lscpu]
mb_const.cmd_clinfo: [/usr/bin/clinfo]
mb_const.cmd_time: [/usr/bin/time]
mb_const.cmd_lsb_release: [/usr/bin/lsb_release]
mb_const.cmd_nvidia_smi: [/usr/bin/nvidia-smi]

   Initial app list
┌────┬────┬───┬────────────────────────────────────────────────────────────┬────────┬────────┬───────────┬────────┐
│Job#│Slot│xPU│app_name                                                    │  start │ finish │tot_time   │ state  │
│    │    │   │app_args                                                    │wu_name                               │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│0   │ NA │GPU│MBv8_8.22r3584_sse2_clAMD_HD5_x86_64-pc-linux-gnu           │  NA    │  NA    │  NA       │PENDING │
│    │    │   │-v 1 -instances_per_device 1 -sbs 2048 -period_iterations_nu│not assigned                          │
└────┴────┴───┴────────────────────────────────────────────────────────────┴──────────────────────────────────────┘
CFG_mode: yes None
CFG_mode: run_name None
CFG_mode: boinc_home None
CFG_mode: noBS None
CFG_mode: display_compact None
CFG_mode: display_slots None
CFG_mode: num_repetitions None
CFG_mode: max_threads None
CFG_mode: max_gpus None
CFG_mode: gpu_devices None
CFG_mode: devmap None
CFG_mode: std_signals None
CFG_mode: no_ref None
CFG_mode: force_ref None
CFG_mode: energy None
CFG_mode: astropulse None
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '0']
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '1']
ocl_device_name [GeForce GTX 1080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
cl_index: ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']
{'': ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']}
Found 3 GPUs
GPU:  08:00.0
['08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search:  []
Power reading for 08:00.0: 93.96 W
GPU:  0a:00.0
['0a:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search:  []
Power reading for 0a:00.0: 166.11 W
GPU:  0b:00.0
['0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)', '\tSubsystem: eVga.com. Corp. GP104 [GeForce GTX 1080]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search:  []
Power reading for 0b:00.0: 103.40 W
GPU_ITEM: uuid: 965baa7754644ac79453a94ba22d5b57
    pcie_id: 08:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 0
    BOINC Device number: None
    card path: /sys/class/drm/card0/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 728b2213f4b94b3293da860cd7b60774
    pcie_id: 0a:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 1
    BOINC Device number: None
    card path: /sys/class/drm/card1/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 70f2a1a30d124f2fba9ec5413f5dbbe0
    pcie_id: 0b:00.0
    model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 2
    BOINC Device number: None
    card path: /sys/class/drm/card2/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 965baa7754644ac79453a94ba22d5b57
    pcie_id: 08:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 0
    BOINC Device number: None
    card path: /sys/class/drm/card0/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 728b2213f4b94b3293da860cd7b60774
    pcie_id: 0a:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 1
    BOINC Device number: None
    card path: /sys/class/drm/card1/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 70f2a1a30d124f2fba9ec5413f5dbbe0
    pcie_id: 0b:00.0
    model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 2
    BOINC Device number: None
    card path: /sys/class/drm/card2/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
devmap:  {0: 0, 1: 1, 2: 2}
GPU_ITEM: uuid: 965baa7754644ac79453a94ba22d5b57
    pcie_id: 08:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 0
    BOINC Device number: 0
    card path: /sys/class/drm/card0/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 728b2213f4b94b3293da860cd7b60774
    pcie_id: 0a:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 1
    BOINC Device number: 1
    card path: /sys/class/drm/card1/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 70f2a1a30d124f2fba9ec5413f5dbbe0
    pcie_id: 0b:00.0
    model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: None
    openCL Version: None
    openCL Index: None
    card number: 2
    BOINC Device number: 2
    card path: /sys/class/drm/card2/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True

KeithMyers commented 4 years ago

I see power readings on the cards now. Currently running a mix of Einstein and Milkyway tasks since SETI is still fubared.

Ricks-Lab commented 4 years ago

Not sure why it’s not setting the clinfo PCIe ID, but I thought of a new way to test it myself. I will put your clinfo output into a file and cat it instead of executing clinfo. I will try this in the morning.
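One way to sketch that test approach: a helper that returns a command's output lines, or replays them from a saved file when one is supplied. The name `command_output` and the `replay_file` parameter are hypothetical, and the file contents are placeholder text, not real clinfo output.

```python
import os
import subprocess
import tempfile


def command_output(cmd, replay_file=None):
    """Return stdout lines of cmd, or the lines of replay_file if given."""
    if replay_file is not None:
        # Replay mode: read previously captured output instead of running cmd.
        with open(replay_file, "r") as handle:
            return handle.read().splitlines()
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout.splitlines()


# Usage: save placeholder text to a temporary file and replay it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first captured line\nsecond captured line\n")

lines = command_output(["clinfo", "--raw"], replay_file=tmp.name)
print(lines)
os.remove(tmp.name)
```

Because the command is never executed in replay mode, output captured on one machine can be parsed on a machine that lacks the same hardware.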

Ricks-Lab commented 4 years ago

@KeithMyers I just found and fixed what I think the problem is. Can you try the latest on master with the lsgpu and debug options again?

KeithMyers commented 4 years ago

keith@Serenity:~/Downloads/benchMT-master$ ./benchMT --boinc_home /home/keith/Desktop/BOINC/ --devmap 0:0,1:1,2:2 --debug --lsgpu
Using python 3.6.9
mb_const.boinc_home: [/home/boinc/BOINC/]
mb_const.cpu_app_subdir: [APPS_CPU/]
mb_const.gpu_app_subdir: [APPS_GPU/]
mb_const.ref_app_subdir: [APPS_REF/]
mb_const.ref_results_subdir: [REF_RESULTS/]
mb_const.wu_subdir: [WU_test/]
mb_const.std_signal_subdir: [WU_std_signal/]
mb_const.testdata_subdir: [testData/]
mb_const.workdir_subdir: [workdir/]
mb_const.slots_subdir: [Slots/]
mb_const.command_line_filename: [BenchCFG]
mb_const.boinccmd: [boinccmd]
mb_const.template_file: [init_data.xml.template]
mb_const.wu_cmp: [rescmpv5_l]
mb_const.suspend_args: [['boinccmd --set_gpu_mode never 172800', 'boinccmd --set_run_mode never 172800']]
mb_const.resume_args: [['boinccmd --set_gpu_mode never 1', 'boinccmd --set_run_mode never 1']]
mb_const.activeWU: [work_unit.sah]
mb_const.activeAPWU: [in.dat]
mb_const.DEBUG: [True]
mb_const.noBS: [False]
mb_const.env: [<__main__.BENCH_ENV object at 0x7f4f3286c550>]
mb_const.card_root: [/sys/class/drm/]
mb_const.hwmon_sub: [hwmon/hwmon]
mb_const.cmd_lspci: [/usr/bin/lspci]
mb_const.cmd_lshw: [/usr/bin/lshw]
mb_const.cmd_lscpu: [/usr/bin/lscpu]
mb_const.cmd_clinfo: [/usr/bin/clinfo]
mb_const.cmd_time: [/usr/bin/time]
mb_const.cmd_lsb_release: [/usr/bin/lsb_release]
mb_const.cmd_nvidia_smi: [/usr/bin/nvidia-smi]
benchMT workdir Path [ /home/keith/Downloads/benchMT-master/workdir/ ] does not exist, making...
TestData Path [ /home/keith/Downloads/benchMT-master/testData/ ] does not exist, making...

   Initial app list
┌────┬────┬───┬────────────────────────────────────────────────────────────┬────────┬────────┬───────────┬────────┐
│Job#│Slot│xPU│app_name                                                    │  start │ finish │tot_time   │ state  │
│    │    │   │app_args                                                    │wu_name                               │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│0   │ NA │GPU│MBv8_8.22r3584_sse2_clAMD_HD5_x86_64-pc-linux-gnu           │  NA    │  NA    │  NA       │PENDING │
│    │    │   │-v 1 -instances_per_device 1 -sbs 2048 -period_iterations_nu│not assigned                          │
└────┴────┴───┴────────────────────────────────────────────────────────────┴──────────────────────────────────────┘
CFG_mode: yes None
CFG_mode: run_name None
CFG_mode: boinc_home None
CFG_mode: noBS None
CFG_mode: display_compact None
CFG_mode: display_slots None
CFG_mode: num_repetitions None
CFG_mode: max_threads None
CFG_mode: max_gpus None
CFG_mode: gpu_devices None
CFG_mode: devmap None
CFG_mode: std_signals None
CFG_mode: no_ref None
CFG_mode: force_ref None
CFG_mode: energy None
CFG_mode: astropulse None
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
ocl_pcie_id [08:00.0]
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '0']
ocl_device_name [GeForce RTX 2080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
ocl_pcie_id [0a:00.0]
cl_index: ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '1']
ocl_device_name [GeForce GTX 1080]
ocl_device_version [OpenCL 1.2 CUDA]
ocl_pcie_id []
ocl_pcie_id [0b:00.0]
cl_index: ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']
{'08:00.0': ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '0'], '0a:00.0': ['GeForce RTX 2080', 'OpenCL 1.2 CUDA', '1'], '0b:00.0': ['GeForce GTX 1080', 'OpenCL 1.2 CUDA', '2']}
Found 3 GPUs
GPU:  08:00.0
['08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search:  []
Power reading for 08:00.0: 100.63 W
GPU:  0a:00.0
['0a:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)', '\tSubsystem: eVga.com. Corp. Device 2184', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search:  []
Power reading for 0a:00.0: 107.63 W
GPU:  0b:00.0
['0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)', '\tSubsystem: eVga.com. Corp. GP104 [GeForce GTX 1080]', '\tKernel driver in use: nvidia', '\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia', '']
hw_file_search:  []
Power reading for 0b:00.0: 140.23 W
GPU_ITEM: uuid: 8ee7c608aabc41239c1094bd6569e3ec
    pcie_id: 08:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce RTX 2080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 0
    card number: 0
    BOINC Device number: None
    card path: /sys/class/drm/card0/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: bf24108f54a946f080bb4fda43ddd5c6
    pcie_id: 0a:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce RTX 2080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 1
    card number: 1
    BOINC Device number: None
    card path: /sys/class/drm/card1/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: cb6508fee15e4bcab3c6596008c884e9
    pcie_id: 0b:00.0
    model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce GTX 1080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 2
    card number: 2
    BOINC Device number: None
    card path: /sys/class/drm/card2/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: 8ee7c608aabc41239c1094bd6569e3ec
    pcie_id: 08:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce RTX 2080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 0
    card number: 0
    BOINC Device number: None
    card path: /sys/class/drm/card0/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: bf24108f54a946f080bb4fda43ddd5c6
    pcie_id: 0a:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce RTX 2080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 1
    card number: 1
    BOINC Device number: None
    card path: /sys/class/drm/card1/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: cb6508fee15e4bcab3c6596008c884e9
    pcie_id: 0b:00.0
    model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce GTX 1080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 2
    card number: 2
    BOINC Device number: None
    card path: /sys/class/drm/card2/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
devmap:  {0: 0, 1: 1, 2: 2}
GPU_ITEM: uuid: 8ee7c608aabc41239c1094bd6569e3ec
    pcie_id: 08:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce RTX 2080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 0
    card number: 0
    BOINC Device number: 0
    card path: /sys/class/drm/card0/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: bf24108f54a946f080bb4fda43ddd5c6
    pcie_id: 0a:00.0
    model: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce RTX 2080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 1
    card number: 1
    BOINC Device number: 1
    card path: /sys/class/drm/card1/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
GPU_ITEM: uuid: cb6508fee15e4bcab3c6596008c884e9
    pcie_id: 0b:00.0
    model: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
    vendor: NVIDIA
    driver: nvidiafb, nouveau, nvidia_drm, nvidia
    openCL Device: GeForce GTX 1080
    openCL Version: OpenCL 1.2 CUDA
    openCL Index: 2
    card number: 2
    BOINC Device number: 2
    card path: /sys/class/drm/card2/device
    hwmon path: None
    Compute compatible: True
    Energy compatible: True
keith@Serenity:~/Downloads/benchMT-master$

JStateson commented 4 years ago

Sorry, just got back on this thread. I have been putting out a fire: for reasons I cannot explain, I cannot run clinfo too many times on a Linux system that has an AMD RX-570 with Einstein-at-home crunching. https://askubuntu.com/questions/1205335/where-to-report-bugs-involving-opencl-in-ubuntu

I am working on getting my MSboinc program to write out the correlation with the bus IDs. On booting, it is supposed to read in the following (created by a python script) and replace the "-1" with the IDs it is using

<devmap>
<Num_GPUs>9</Num_GPUs>
<1>0 -1 01:00.0 NV GTX-1060-6GB</1>
<2>1 -1 02:00.0 NV GTX-1060-3GB</2>
<3>2 -1 03:00.0 NV GTX-1060-3GB</3>
<4>3 -1 04:00.0 NV P106-100</4>
<5>4 -1 05:00.0 NV GTX-1070</5>
<6>5 -1 08:00.0 NV P106-090</6>
<7>6 -1 0A:00.0 NV GTX-1060-3GB</7>
<8>7 -1 0B:00.0 NV GTX-1060-3GB</8>
<9>8 -1 0E:00.0 NV GTX-1060-3GB</9>
</devmap>

Is there anything I can do here on this thread?

Ricks-Lab commented 4 years ago

@JStateson I definitely ran with your original issue report and did a whole lot more! It probably resulted in more email than is desirable... Could you test the latest on master on your Nvidia system with ./benchMT --lsgpu --debug and post the results here? I just want to make sure it works fine. Then I will open a new help wanted issue for Beta testing and close this one. Thanks!

JStateson commented 4 years ago

Ran ok, results here: http://stateson.net/images/h110btc_benchMT_2.txt It does not look as good as Keith's. Not sure how to format it. From a PC I used ftp to retrieve it from Linux, and then from the PC I used FileZilla to put it on my "junk" site.

Ricks-Lab commented 4 years ago

> ran ok, results here http://stateson.net/images/h110btc_benchMT_2.txt Does not look as good as Keiths. Not sure how to format it. From a PC I used ftp to retrieve it from linux and then from the PC, I used FileZilla to put it my "junk" site.

Thanks! I was still able to see what I needed to in the attached file. Looks like the compute and energy capability determination is working correctly for Nvidia.

I noticed that clinfo indicates that the Intel 530 has compute capability. Does BOINC also see it as compute capable? It seems the pcie_id is stored differently for Intel. Can you provide the clinfo --raw output for this card?

Let me know if you are ok with me adding you to the benchMT credits for lspci/clinfo debug.

KeithMyers commented 4 years ago

Yes, BOINC detects the Intel iGPUs as capable of computing. There is a specific module in /client for detecting Intel GPUs: extern vector intel_gpus; extern vector intel_gpu_opencls;

JStateson commented 4 years ago

Got the python script "MakeTable.py" working for Nvidia. The column starting with 1, 3, 4, 2, etc. is the BOINC ID. I correlated this from coproc_info.xml. The first column (0, 1, 2, ...) represents the sensor ID of lm-sensors or the ID in nvidia-smi. This still does not identify the exact card when the names are identical, but I am working on that.

<devmap>
<Num_GPUs>9</Num_GPUs>
<1>0 1 01:00.0 NV GTX-1060-6GB</1>
<2>1 3 02:00.0 NV GTX-1060-3GB</2>
<3>2 4 03:00.0 NV GTX-1060-3GB</3>
<4>3 2 04:00.0 NV P106-100</4>
<5>4 0 05:00.0 NV GTX-1070</5>
<6>5 8 08:00.0 NV P106-090</6>
<7>6 5 0A:00.0 NV GTX-1060-3GB</7>
<8>7 6 0B:00.0 NV GTX-1060-3GB</8>
<9>8 7 0E:00.0 NV GTX-1060-3GB</9>
</devmap>

I need to add the AMD ones, but my AMD system is offline. The above "xml" file is to be read into my MSboinc program and then written to the event log so that it shows up on BoincTasks for every remote system.
Note that the "0" goes to the GTX-1070 and "8" goes to the slowest board, the P106-090. Update: GitHub https://github.com/JStateson/BoincTasks/tree/master/SystemdService
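Reading such a table back could look like the sketch below. It is not the MSboinc or MakeTable.py code: the regular expression, the field names, and the function `parse_devmap_entries` are assumptions, and the input is a four-line excerpt of the table above. A regular expression is used because tag names that start with a digit, such as `<1>`, are not valid XML element names, so a standard XML parser would reject the file.

```python
import re

# Excerpt of the devmap table posted above.
DEVMAP_TEXT = """\
<devmap>
<Num_GPUs>9</Num_GPUs>
<1>0 1 01:00.0 NV GTX-1060-6GB</1>
<4>3 2 04:00.0 NV P106-100</4>
<5>4 0 05:00.0 NV GTX-1070</5>
<6>5 8 08:00.0 NV P106-090</6>
</devmap>
"""

# <row>sensor_id boinc_id pcie_id vendor model</row>
ENTRY_RE = re.compile(
    r"^<(\d+)>(\d+) (-?\d+) (\S+) (\S+) (\S+)</\1>$", re.MULTILINE)


def parse_devmap_entries(text):
    """Return one dict per numbered row; other lines are ignored."""
    entries = []
    for match in ENTRY_RE.finditer(text):
        entries.append({
            "sensor_id": int(match.group(2)),
            "boinc_id": int(match.group(3)),  # -1 means not yet filled in
            "pcie_id": match.group(4),
            "vendor": match.group(5),
            "model": match.group(6),
        })
    return entries


entries = parse_devmap_entries(DEVMAP_TEXT)
model_by_boinc_id = {entry["boinc_id"]: entry["model"] for entry in entries}
print(model_by_boinc_id[0], model_by_boinc_id[8])
```

Indexing by BOINC ID gives the same pairing as noted above: 0 is the GTX-1070 and 8 is the P106-090.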

Ricks-Lab commented 4 years ago

> Yes, BOINC detects the Intel iGPUs as capable of computing. There is a specific module in /client for detecting Intel gpus. extern vector intel_gpus; extern vector intel_gpu_opencls;

It seems the pcie_id is represented differently than for AMD or NV. Do you have an Intel card that you can get clinfo --raw output for?

JStateson commented 4 years ago

Sorry, I hit the wrong button. The Intel raw data is here: https://stateson.net/images/intel_raw.txt It seems I cannot undo the closing. Access should have been denied... ACTUALLY I AM THE AUTHOR OF THE ISSUE!! This thread is getting too long, maybe time to start another?

Ricks-Lab commented 4 years ago

> sorry, hit wrong button the intel raw data is here https://stateson.net/images/intel_raw.txt seems I cannot delete the closing. Access should have been denied ACTUALLY I AM THE AUTHOR OF THE ISSUE!! This thread is getting too long, maybe time to start another?

I agree, it is getting confusing to follow the multiple lines of thought. I will raise issues for the separate items and leave this one closed.