Eraden / amdgpud


Which temperature to read #22

Closed Eraden closed 3 years ago

Eraden commented 3 years ago

@BoostCookie please elaborate

BoostCookie commented 3 years ago

Let the user specify in the config file which temperature should be used. For example some cards have temp1_input, temp2_input, and temp3_input with temp1_label=edge, temp2_label=junction, temp3_label=mem. The user should be able to specify which of those should be used.

Optionally if people (and @Eraden) think it makes sense: Let the user specify multiple temp inputs and the temperature being used for the fan curve is then the max value of all specified temperatures.
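As a rough illustration of the "max of all specified temperatures" idea, here is a minimal sketch. The function names `read_millidegrees` and `max_selected_temp` are hypothetical and not taken from amdfand; the only assumption about the system is that hwmon `tempN_input` files report millidegrees Celsius.

```rust
use std::fs;
use std::path::Path;

/// Reads one hwmon temperature file such as `temp2_input`.
/// hwmon reports the value in millidegrees Celsius.
fn read_millidegrees(hwmon_dir: &Path, file_name: &str) -> Option<i64> {
    let raw = fs::read_to_string(hwmon_dir.join(file_name)).ok()?;
    raw.trim().parse::<i64>().ok()
}

/// Returns the hottest reading among the configured inputs, in degrees Celsius.
/// Inputs that are missing or unreadable are skipped (hypothetical policy).
fn max_selected_temp(hwmon_dir: &Path, inputs: &[&str]) -> Option<f64> {
    inputs
        .iter()
        .filter_map(|name| read_millidegrees(hwmon_dir, name))
        .max()
        .map(|milli| milli as f64 / 1000.0)
}
```

With a single entry in `inputs` this degrades to "use exactly the temperature the user picked", so one config option could cover both proposals.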

Eraden commented 3 years ago

This sounds good to me. Want to do it? :)

BoostCookie commented 3 years ago

Can we please first talk about whether or not the config file should support different fan curves for different devices? As I've elaborated here, right now the program only really works if there is exactly one AMD graphics card.

  1. We can make it very easy for ourselves and throw away the whole cards = ["card0"] argument and just always use the first hwmon we find with name=amdgpu. Users with multiple GPUs are rare and the name of the program suggests that it is only meant for AMD cards.
  2. We can put a lot of effort into making this a general fan control program that supports any hwmon device with a pwm_input and allow multiple fancurves.

r15ch13 commented 3 years ago

Just yesterday I tweaked the fan curve and wondered why amdfand reported another temp than the KDE Widget I set up. (max_gpu_temp() returned temp2_input [junction]) :smile:

The junction temperature (temp2) is the hottest spot on the die and is sometimes around 10-20°C higher than temp1, which is measured at the edge of the die (that's what Google says). It also changes more quickly than temp1, which could make the fans ramp up and down faster. But I have to test that a bit more.

Making it configurable would be a good idea.

Eraden commented 3 years ago

My idea for how we should approach the problem:

Step 1.

Step 2.

This will be considered as version 1.1.0

BoostCookie commented 3 years ago

I agree with step 1.

Maybe instead of having temp_input as i8 we should have input_temp_file as String, but it probably doesn't matter.
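To make the two options concrete, here is a small sketch of how both forms could resolve to the same file name. The `TempInput` type and its variants are hypothetical and not part of amdfand, and the sketch uses `u8` for the index since a negative sensor index has no meaning.

```rust
/// Hypothetical config value naming the temperature input that drives the fan curve.
enum TempInput {
    /// Numeric form, e.g. 2 stands for "temp2_input".
    Index(u8),
    /// File-name form, e.g. "temp2_input".
    File(String),
}

impl TempInput {
    /// Resolves either form to the hwmon file name to read.
    fn file_name(&self) -> String {
        match self {
            TempInput::Index(n) => format!("temp{}_input", n),
            TempInput::File(name) => name.clone(),
        }
    }
}
```

Either way the daemon ends up reading the same file, which is why the choice probably doesn't matter much.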

Regarding step 2: I don't think a separate binary is necessary. Let's just add something like amdfand setup that helps you set up the config.

And I don't think we should ever support multiple AMD cards since the OS just scrambles them on every boot and we can't rely that next time the card with number 0 will still be the same one.

Eraden commented 3 years ago

> And I don't think we should ever support multiple AMD cards since the OS just scrambles them on every boot and we can't rely that next time the card with number 0 will still be the same one.

This is why I want to use device/device

BoostCookie commented 3 years ago

> This is why I want use device/device

What do you mean? We don't need to use anything. Just look through /sys/class/drm/card[0-∞] and take the first one with name=amdgpu. You only had a problem because we hardcoded the card number in the config and on the next boot your NVIDIA card had that number. If we just always take the first AMD card, it works for everybody who has exactly one AMD card.
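A minimal sketch of that lookup, assuming the usual sysfs layout /sys/class/drm/cardN/device/hwmon/hwmonM/name. The helper names `is_card_dir` and `find_first_amdgpu_hwmon` are hypothetical and not taken from amdfand.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// True for entries named like `card0` or `card1`; skips connector
/// entries such as `card0-DP-1`.
fn is_card_dir(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .and_then(|name| name.strip_prefix("card"))
        .map_or(false, |rest| !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit()))
}

/// Scans `drm_root` (normally /sys/class/drm) and returns the hwmon directory
/// of the first card whose hwmon `name` file reads "amdgpu".
/// Cards are visited in lexicographic path order.
fn find_first_amdgpu_hwmon(drm_root: &Path) -> Option<PathBuf> {
    let mut cards: Vec<PathBuf> = fs::read_dir(drm_root)
        .ok()?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| is_card_dir(path))
        .collect();
    cards.sort();

    for card in cards {
        // Cards without a hwmon directory are skipped.
        let entries = match fs::read_dir(card.join("device").join("hwmon")) {
            Ok(entries) => entries,
            Err(_) => continue,
        };
        for entry in entries.filter_map(|entry| entry.ok()) {
            let hwmon = entry.path();
            let is_amdgpu = fs::read_to_string(hwmon.join("name"))
                .map(|name| name.trim() == "amdgpu")
                .unwrap_or(false);
            if is_amdgpu {
                return Some(hwmon);
            }
        }
    }
    None
}
```

Taking the root directory as a parameter keeps the function testable against a fake directory tree; in the daemon it would be called with `Path::new("/sys/class/drm")`.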

Eraden commented 3 years ago

If the user has only one card, then we don't need to check anything.

If the user has 2 AMD cards, then I can tell which card is which by checking the value in these files:

/sys/class/drm/card0/device/hwmon/hwmon1/device/device
/sys/class/drm/card1/device/hwmon/hwmon1/device/device

And if the user doesn't want to use the service for one of them, or wants a different fan speed matrix for each, then we can recognize the card by this value.

Then /etc will look like this:

/etc/amdfand/fan.toml
/etc/amdfand/cards/card0.toml
/etc/amdfand/cards/card1.toml

Which card is which can be checked with following command:

glxinfo -B | grep Device

For example in my setup for 0x731f it's Device: AMD Radeon RX 5700 XT (NAVI10, DRM 3.40.0, 5.10.55-1-lts, LLVM 12.0.1) (0x731f)
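Reading that value could look roughly like the sketch below. It assumes the `device` file holds a hex string such as `0x731f` followed by a newline, and the function names are hypothetical rather than amdfand's own.

```rust
use std::fs;
use std::path::Path;

/// Parses the contents of a sysfs `device` file, e.g. "0x731f\n".
fn parse_pci_device_id(raw: &str) -> Option<u16> {
    let hex = raw.trim().strip_prefix("0x")?;
    u16::from_str_radix(hex, 16).ok()
}

/// Reads the PCI device ID of a card directory such as /sys/class/drm/card0.
fn read_device_id(card_dir: &Path) -> Option<u16> {
    let raw = fs::read_to_string(card_dir.join("device").join("device")).ok()?;
    parse_pci_device_id(&raw)
}
```

Note that this ID names the GPU model, so by itself it says nothing about whether two cards of the same model can be told apart.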

BoostCookie commented 3 years ago

This would make it all much more complicated just to support the tiny percentage of people with multiple AMD cards. Also naming the configs card0.toml and card1.toml isn't a good idea because the numbering changes from boot to boot. And are the 4 digit hex codes in /sys/class/drm/card0/device/device actually unique? Maybe they are identical if a user has two identical cards.

Eraden commented 3 years ago

I asked people on Reddit, and asked GamingOnLinux whether they have, and could share, statistics on how common multi-AMD-GPU setups are.

Eraden commented 3 years ago

To avoid leaving a cliffhanger, I'll write it here.

I'm strongly against dropping multi card support.

Now, sorry for the huge delay. I was thinking a lot about this and struggled internally over whether to keep it or not.

I launched a poll on Reddit and asked GamingOnLinux and Steam for information about the hardware in use.

The Reddit poll shows about 9% so far, but it's not reliable due to the small number of participants.

GamingOnLinux does not gather such information.

Steam refused to share such information.

I was considering asking Canonical, but I would need to become a partner, and I don't know how long that would take or whether it's worth the effort.

BoostCookie commented 3 years ago

> I'm strongly against dropping multi card support.

Then we need to find somebody who does have two identical cards so that they can tell us if /sys/class/drm/card0/device/device is identical for both cards or not. My guess is that it is.

If that is the case, I suggest we use the PCI path instead.
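One way to get at the PCI path, sketched under the assumption that /sys/class/drm/cardN/device is a symlink to the PCI device directory (whose last component is the bus address, for example 0000:03:00.0). The function name `pci_address` is hypothetical.

```rust
use std::fs;
use std::path::Path;

/// Resolves the `device` symlink of a card directory and returns the last
/// path component, which on a typical system is the PCI bus address.
fn pci_address(card_dir: &Path) -> Option<String> {
    let resolved = fs::canonicalize(card_dir.join("device")).ok()?;
    resolved.file_name()?.to_str().map(str::to_owned)
}
```

Unlike the device ID, the bus address differs between two identical cards, and it usually stays the same across boots as long as the cards stay in the same slots.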

Eraden commented 3 years ago

For now, only a single card will be supported.

If no one with 2 identical cards joins, then I'll most likely buy such a setup myself.