[FEATURE] Force 100% fan speed at high temperatures

MithicSpirit commented 2 months ago

It would be nice if there was a "maximum acceptable temperature", and if it ever detects that the temperature is past that, it forcefully sets the fans to 100% until it goes back down, independent of the moving average. This would allow for having a relatively wide interval without fear of severe overheating if the CPU starts getting too hot.

Would you like to be involved in the development? Yes, though I'd need some pointers on how this should be configured (per-profile? global? naming?) if I'm going to be the one programming.

MithicSpirit commented 2 months ago

Here's a quick, untested implementation that uses a global maximum temperature.

diff --git a/fanctrl.py b/fanctrl.py
index 703e465..b7f8e0d 100644
--- a/fanctrl.py
+++ b/fanctrl.py
@@ -313,12 +313,15 @@ class Configuration:
     def getDefaultStrategy(self):
         return self.getStrategy("defaultStrategy")

     def getDischargingStrategy(self):
         return self.getStrategy("strategyOnDischarging")

+    def getMaxTemp(self):
+        return self.data["maxTemp"]
+

 class SocketController(ABC):
     @abstractmethod
     def startServerSocket(self, commandCallback=None):
         raise UnimplementedException()

@@ -613,13 +616,15 @@ class FanController:
     def run(self, debug=True):
         try:
             while True:
                 if self.active:
                     temp = self.getActualTemperature()
                     # update fan speed every "fanSpeedUpdateFrequency" seconds
-                    if self.timecount % self.getCurrentStrategy().fanSpeedUpdateFrequency == 0:
+                    if temp >= self.configuration.getMaxTemp():
+                        self.setSpeed(100)
+                    elif self.timecount % self.getCurrentStrategy().fanSpeedUpdateFrequency == 0:
                         self.adaptSpeed(temp)
                         self.timecount = 0

                     self.tempHistory.append(temp)

                     if debug:

leopoldhub commented 2 months ago

Hello,

Thanks for your feedback!

I believe that the feature you are asking for can be achieved with temperature curve configuration (e.g. temp >= 90, fans = 100). If the temperature ever goes above 90, the fans will be set to max after a maximum of [fanSpeedUpdateFrequency] seconds before going to 100%, and wait for the temperature to go below 90 before slowing down.

If this is not the case, could you give more explanation and use case examples?

MithicSpirit commented 2 months ago

If the temperature ever goes above 90, the fans will be set to max after a maximum of [fanSpeedUpdateFrequency] seconds before going to 100%, and wait for the temperature to go below 90 before slowing down.

From my experience, that doesn't seem to be the case. Even if the temperature goes higher than the maximum temperature, I can still hear the fan ramping up every couple seconds (fanSpeedUpdateFrequency) until it gets to the 100%. I believe this is because it doesn't just consider the current temperature, but rather the average of the temperature over the last several seconds (movingAverageInterval).

I have taken a closer look at the code, and there are a couple lines of code that are relevant that stand out to me. Here, it's taking the min of the current temperature and the average temperature, which means that it's slow on the rising edge and fast on the falling edge. However, the behavior I want is sort of the opposite—if it gets hotter than a certain point, it should ignore the average temperature and just go straight to 100% fans. The comment also does not seem to be accurately describing the behavior of the function.

https://github.com/TamtamHero/fw-fanctrl/blob/2a9828f91620c0e8746ecb6875887b34da7888e3/fanctrl.py#L580-L582

Svenum commented 2 months ago

You mean that you want to be able to set a "critical temperature" so that if one number is above this, the average is ignored?

MithicSpirit commented 2 months ago

You mean that you want to be able to set a "critical temperature" so that if one number is above this, the average is ignored?

Yeah, exactly! "Critical temperature" is a much better description, thanks.

leopoldhub commented 2 months ago

Hi, thanks for the clarification.

I am against having a global "critical temperature threshold", as this would apply to all strategies and could lead to unexpected behaviour when switching them (e.g. a critical temp of 50 for high performance strategy would remain when switching to a quieter one).

Also, the reactivity you want will come with the fans jumping from their planned speed to the max assigned speed, which can be surprising at times.

Having said that, I can suggest a "per strategy" implementation as follows:

In the speedCurve of a strategy, we could add an optional "critical" or "absolute" (name to be defined) field that does this for a specific temperature step and higher ones.

{
    ...
    "strategies": {
        "xxx": {
            "fanSpeedUpdateFrequency": 5,
            "movingAverageInterval": 40,
            "speedCurve": [
                { "temp": 0, "speed": 0 },
                ...
                { "temp": 85, "speed": 100, "critical": true }
            ]
        },
    ...
}

We could also consider modularity and make fanSpeedUpdateFrequecy/movingAverageInterval variable depending on the temperature steps.

{
    ...
    "strategies": {
        "xxx": {
            "speedCurve": [
                { "temp": 0, "speed": 0, "fanSpeedUpdateFrequency": 5, "movingAverageInterval": 40 },
                { "temp": 45, "speed": 0 },
                { "temp": 65, "speed": 25 },
                { "temp": 70, "speed": 35 },
                { "temp": 75, "speed": 80, "fanSpeedUpdateFrequency": 1, "movingAverageInterval": 1 },
                { "temp": 85, "speed": 100 }
            ]
        },
    ...
}

Would this meet your needs?

MithicSpirit commented 2 months ago

These could work, but they are a lot more complicated and especially the last one is not backwards compatible. I could try to work on them, but I'm not sure if there's demand for stuff like this. I do agree that it's probably better to not have it be global.

Maybe a cleaner solution that I believe is equivalent to the first one would be to have a (per-profile) option for a criticalTemp, and if the real temperature is greater than that, it behaves as if the movingAverageInterval is 1 (and maybe the fanSpeedUpdateFrequency as well?). This would effectively be the same as the first example, with a "critical": true for that particular temperature (which would also affect all higher temperatures).

leopoldhub commented 2 months ago

To do this we would need a 'critical' and a 'criticalTemp' field.

I think it would make more sense to have them in the existing curve, as in solution 1, to keep the temperature curve configuration consistent and centralized, as well as having a single additional field.

MithicSpirit commented 2 months ago

To do this we would need a 'critical' and a 'criticalTemp' field.

Why would we need two additional fields? I think just the critical temperature suffices, and the appropriate fan speed can be obtained from the speed curve as normal (though ignoring the moving average).

I think it would make more sense to have them in the existing curve, as in solution 1, to keep the temperature curve configuration consistent and centralized, as well as having a single additional field.

My issue with solution 1 is that it doesn't feel as intuitive. In particular, the behavior of setting "critical": true for multiple different points feels like it should do something (not sure what), even though, from your explanation, only the one with the lowest temperature matters.

Also, I think it makes more sense for the critical temperature to be stored together with the moving average interval rather than in the curve, as it affects the former rather than the latter.

leopoldhub commented 2 months ago

Yes, what you say makes sense.

The refacto and integration of these in the speed curve should be a separate topic indeed.

I'm still not quite convinced on the fact that it should be a separate feature, or that this is the way to do it, but go for it now and if we find a better way in the future, we might refactor it.

MithicSpirit commented 2 months ago

Here's another quick and dirty implementation. I'll try to do some testing and write docs over the next couple of days, but it might have to wait until the weekend.

diff --git a/fanctrl.py b/fanctrl.py
index 703e465..8c2798e 100644
--- a/fanctrl.py
+++ b/fanctrl.py
@@ -264,22 +264,24 @@ class SocketAlreadyRunningException(Exception):

 class Strategy:
     name = None
     fanSpeedUpdateFrequency = None
     movingAverageInterval = None
+    criticalTemp = None
     speedCurve = None

     def __init__(self, name, parameters):
         self.name = name
         self.fanSpeedUpdateFrequency = parameters["fanSpeedUpdateFrequency"]
         if self.fanSpeedUpdateFrequency is None or self.fanSpeedUpdateFrequency == "":
             self.fanSpeedUpdateFrequency = 5
         self.movingAverageInterval = parameters["movingAverageInterval"]
         if self.movingAverageInterval is None or self.movingAverageInterval == "":
             self.movingAverageInterval = 20
+        self.criticalTemp = parameters.get("criticalTemp")
         self.speedCurve = parameters["speedCurve"]

 class Configuration:
     path = None
     data = None
@@ -580,13 +582,15 @@ class FanController:
     def getEffectiveTemperature(self, currentTemp, timeInterval):
         # the moving average temperature count for 2/3 of the effective temperature
         return round(min(self.getMovingAverageTemperature(timeInterval), currentTemp), 1)

     def adaptSpeed(self, currentTemp):
         currentStrategy = self.getCurrentStrategy()
-        currentTemp = self.getEffectiveTemperature(currentTemp, currentStrategy.movingAverageInterval)
+        criticalTemp = currentStrategy.criticalTemp
+        if criticalTemp is not None and currentTemp < criticalTemp:
+            currentTemp = self.getEffectiveTemperature(currentTemp, currentStrategy.movingAverageInterval)
         minPoint = currentStrategy.speedCurve[0]
         maxPoint = currentStrategy.speedCurve[-1]
         for e in currentStrategy.speedCurve:
             if currentTemp > e["temp"]:
                 minPoint = e
             else:

TamtamHero commented 2 months ago

This would allow for having a relatively wide interval without fear of severe overheating if the CPU starts getting too hot.

I think this initial assumption is wrong. There is no "severe overheating" possible with modern CPUs, they simply shutdown before they are in danger. For Intel CPUs, the automatic cutoff is usually around 100-110C°. AMD does it too (https://superuser.com/a/129913). So unless I'm missing something, there's no reason to worry, and no reason to add this feature. For aggressive fan management, the current strategy system offers quite some possibilities already.

MithicSpirit commented 2 months ago

I think this initial assumption is wrong. There is no "severe overheating" possible with modern CPUs, they simply shutdown before they are in danger.

Hard shutdowns aren't really acceptable behavior for me. Obviously better than damaged hardware, but I'd really prefer it if my fans just kicked in instead.

TamtamHero commented 2 months ago

Ok, but my other point was that with an aggressive strategy using the current system, you will never reach such temperatures. I fail to see a scenario where your CPU could reach 100-110°C with fw-fanctrl running any decent strategy in the background, even the lazyest strategy. Even more, before kicking in the preventive shutdown, your kernel will also severely underclock the CPU until it goes back to a normal temperature, and there's no way it will reach 100-110°C running at, let's say, 400mHz. You can do the test yourself: disable fans by switching to the sleep strategy: fw-fanctrl sleep, then run stress-ng --cpu YOUR_NUMBER_OF_CPU_CORE --cpu-method matrixprod and see what happens. I just did that, and the CPU quickly got to 95°C then got underclocked to 2.2GHz which instantly cooled it to 75°C where it hoovered for minutes until I stopped the stressing tool. Without fan !

I hope I convinced you. Otherwise, let me know a scenario where I could fry my CPU (with the lazyest strategy running in the background, this time )

MithicSpirit commented 2 months ago

I'd also prefer the fans kicked in rather than thermal throttling that much, but feel free to close this issue and #83 if you're not interested. I've already patched this in locally.

TamtamHero / fw-fanctrl

[FEATURE] Force 100% fan speed at high temperatures #82