FoldingAtHome / fah-issues


Incorrect values assigned for opencl-index #1244

Open bb30994 opened 6 years ago

bb30994 commented 6 years ago

https://foldingforum.org/viewtopic.php?f=61&t=30898

In a rather complicated situation with one unsupported GPU and two supported GPUs, the V7.5.1 client gets confused and tries to use the unsupported device rather than the two supported devices.

bb30994 commented 6 years ago

This seems to be a problem when there are two platforms installed: Intel and NVidia. (I have not seen examples where ATI is added to the mix.)

The simplest solution seems to be disallowing access to the Intel OpenCL driver -- which often is not installed anyway -- but I'm not sure we want to do that, since we may decide to support selected Intel GPUs at some future date.

bb30994 commented 6 years ago

As a convenience, it looks like we need to distribute 'clinfo' since it's often not included with the OS.

stoperro commented 4 years ago

I think I have a similar issue. I would try to implement the fix myself, but the main issue is likely in the closed-source part...

17:58:54:          GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 980] 4612
17:58:54:          GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 980] 4612
17:58:54:  CUDA Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:5.2 Driver:10.2
17:58:54:OpenCL Device 0: Platform:0 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:442.74
17:58:54:OpenCL Device 1: Platform:1 Device:0 Bus:2 Slot:0 Compute:1.2 Driver:442.74
17:58:54:OpenCL Device 2: Platform:2 Device:0 Bus:NA Slot:NA Compute:1.2 Driver:20.19
17:58:54:  Win32 Service: false

Note: checking clinfo, I have 3 OpenCL platforms:

  1. NVIDIA CUDA (my first dedicated GPU from MSI)
  2. NVIDIA CUDA (my second dedicated GPU very similar to above, but Founders Edition, different but similar HW/driver)
  3. Intel(R) OpenCL (has 2 devices: CPU and GPU)

For a long time I thought OpenCL had a problem, since my two almost identical cards report as 2 platforms. After debugging OpenCL-ICD-Loader, it turns out this is because the cards internally use different drivers (nvdisp.inf and nv_lei.inf), likely due to coming from different manufacturers.

So the setup is correct, yet I was not able to get the second card working with FAHControl. After installation the software detected 2 cards but misconfigured them, so the second was failing with "bad workunits". I tried setting opencl-index, but this is not enough: it is 1 value, and at least 2 are needed:

  1. Platform index
  2. Device index

Setting opencl-index changed the device index, which is wrong: based on the System Info output, the device index should always be 0 in my setup.
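
To illustrate why a single index is ambiguous here, a minimal Python sketch follows. It is not the client's actual logic (that part is closed-source); it only models the platform layout reported in the System Info output, with one GPU device per platform. The names `PLATFORMS`, `flatten` and `as_device_on_platform0` are hypothetical.

```python
# Hypothetical model of the System Info output above:
# each entry is (platform name, number of GPU devices on that platform).
PLATFORMS = [
    ("NVIDIA CUDA", 1),      # first GTX 980
    ("NVIDIA CUDA", 1),      # second GTX 980, different driver package
    ("Intel(R) OpenCL", 1),  # Intel GPU
]


def flatten(platforms):
    """List every (platform_index, device_index) pair in platform order."""
    pairs = []
    for p_idx, (_name, n_devices) in enumerate(platforms):
        for d_idx in range(n_devices):
            pairs.append((p_idx, d_idx))
    return pairs


def as_device_on_platform0(index, platforms):
    """Assumed misreading: treat one index as a device index on platform 0.

    Returns None when platform 0 has no such device.
    """
    n_devices = platforms[0][1]
    return (0, index) if index < n_devices else None


for flat_idx, (p_idx, d_idx) in enumerate(flatten(PLATFORMS)):
    print(f"entry {flat_idx}: platform {p_idx}, device {d_idx}")

# The second card lives at (platform 1, device 0); a lone device index
# of 1 on platform 0 points at nothing.
print(as_device_on_platform0(1, PLATFORMS))
```

Under these assumptions every usable device has device index 0, so only the platform index distinguishes the cards, which is why a single value cannot select the second one.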

My guess is that to make it work, either fah-client should be smarter when enumerating platforms for slots, or it should allow manually selecting the OpenCL platform in fah-control/config.xml.

shorttack commented 4 years ago

Presumed defect in FAHControl adding slots.

bb30994 commented 4 years ago

This is a similar case of a problem with the enumeration of GPUs. Starting from a simple CPU/GPU client config, add the configuration for an Intel iGPU:

<slot id='0' type=CPU>
</slot>
<slot id='1' type=GPU>  # Was NV with an active WU
  <Client-type=
  <other-PARM ....
</slot>

When the client is restarted, the new iGPU becomes the first GPU, moving the operational GPU to the end. The config becomes something like this:

<slot id='0' type=CPU>
</slot>
<slot id='1' type=GPU>  # becomes Intel but inherits the settings for the nVidia.
  <Client-type=
  <other-PARM ....
</slot>
<slot id='2' type=GPU>  # now NV no longer retains its settings.
</slot>
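
The effect described above can be reproduced with a small Python sketch. It assumes, purely for illustration, that slot settings are matched to GPUs by their position in the detection order; the client's real matching logic is not shown in this thread. The function names, the GPU labels and the `example-value` setting are all hypothetical.

```python
def assign_by_position(saved_settings, detected_gpus):
    """Pair saved slot settings with detected GPUs purely by list order."""
    result = {}
    for i, gpu in enumerate(detected_gpus):
        result[gpu] = saved_settings[i] if i < len(saved_settings) else {}
    return result


def assign_by_key(saved_by_gpu, detected_gpus):
    """Pair saved slot settings with GPUs by a stable identifier instead."""
    return {gpu: saved_by_gpu.get(gpu, {}) for gpu in detected_gpus}


# Placeholder for the settings the NVIDIA slot had before the restart.
nv_settings = {"client-type": "example-value"}

# After the restart the iGPU is enumerated first and the NVIDIA GPU last.
detected = ["intel-igpu", "nvidia"]

positional = assign_by_position([nv_settings], detected)
keyed = assign_by_key({"nvidia": nv_settings}, detected)

print("by position:", positional)  # iGPU inherits the NVIDIA settings
print("by key:     ", keyed)       # NVIDIA keeps its own settings
```

Matching by a stable identifier would keep the settings attached to the original GPU regardless of enumeration order; whether the client can do this depends on what identifier it has available.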

PantherX commented 4 years ago

With the new method to identify GPUs, will the Slot settings be sticky to the new identification or not?

stoperro commented 4 years ago

Just checked out beta 7.6.17 and now both of my GPUs work (GTX 980 from two different vendors), so whatever changes were made, they helped for my setup :) Looking at the forum, people have issues with the new betas where their Intel GPU is detected (yet it won't work, as there is no support for it yet), but not in my setup: while I have an Intel GPU, it's (correctly) not detected.

PantherX commented 4 years ago

Hiya @stoperro Please note that the build you're using (V7.6.17) is not a Beta client; it is a developmental build and has some issues which need to be addressed before a public Beta is announced. The last stable version is V7.6.13 and it should be able to work correctly with your dual GPU setup 😄

stoperro commented 4 years ago

It didn't work on V7.6.13 as described in my previous post. I left it on for 3 days on V7.6.17 and everything works fine so far for me...

PantherX commented 4 years ago

@stoperro Humm... would it be possible for you to post your log files and experience on our Official Support site (https://foldingforum.org/index.php)? There are plenty of experienced members there to help you out. The reason is that V7.6.13 should be able to support all valid GPU configurations, and moving to the new GPU detection should support the same use cases. The change in detection is to expand back-end functionality and make future development easier.