White-Raven / PowerEdge-shutup

shell ballgag for Dell servers, tested working with G11 and G12, G13 and G14 too but with conditions**.

PowerEdge T640 PSU fan speed abnormal #3

Closed by senpng 2 years ago

senpng commented 2 years ago

Hello, I'm glad to see a glimmer of hope through this project.

I tried downgrading iDRAC and controlling the fans through ipmitool, but the PSU fan speed is still at 100%!

Have you ever encountered such a problem?

White-Raven commented 2 years ago

Hi! Well, to give you a bit of context, I'm not an expert or anything, and I have been working on this just 'for fun' by myself.

Sadly, I don't have any 11th, 13th or 14th gen servers myself to run tests on. I pretty much rely on the feedback of people using my script on reddit, and on the tests I can do on my own 12th gen servers... and afaik, you're the first one on a TXXX (tower) series; so far it's only been RXXX (rack) series servers.

Concerning your problem: you downgraded iDRAC, but did you also downgrade your BIOS to a version contemporary with the iDRAC version you went for? For example, for a T640, the last iDRAC9 with IPMI raw fan control is [3.30.30.30], and the 'corresponding' BIOS, date-wise, would be [2.2.9], released 2 months prior... as version 2.2.11, one month later, was released in conjunction with the dreaded iDRAC9 3.34.34.34. I'll even quote Dell's official statement: "be informed that BIOS has to be in tandem applied before for iDRAC updates to have appropriate compatibility." If you didn't, we might be onto something.

Word of safety though: any firmware or BIOS update should (must) be done while connected to a UPS unit, as a wall power dip or brown-out in the middle of such an operation can completely brick your hardware (even bad luck can do that, even on a UPS).
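To make the version cutoff above concrete, here is a minimal sketch of a check that tells whether a given iDRAC9 version is at or below 3.30.30.30. The function names are hypothetical, the version string is something you read off the iDRAC overview page and pass in by hand, and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/bin/sh
# Last iDRAC9 release with IPMI raw fan control, as stated in this thread.
LAST_GOOD="3.30.30.30"

# version_le A B: succeeds when dotted version A <= dotted version B.
# Assumption: `sort -V` (version sort) is available, e.g. GNU coreutils.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# supports_raw_fan_control VERSION (hypothetical helper name)
# Succeeds when VERSION is not newer than LAST_GOOD.
supports_raw_fan_control() {
    version_le "$1" "$LAST_GOOD"
}
```

Usage would look like `supports_raw_fan_control 3.30.30.30 && echo "raw fan control should still work"`. It only compares version numbers; it does not check that the BIOS matches.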

The issue might be that the lifecycle inventory fails to do the proper checkups because of discrepancies with the bios, or even with the PSU module(s) itself(themselves). "Over-cooling" is a common "error out" safety behavior.

PS: Sorry for the spam under, I saw your issue waking up, and the lack of coffee in my system is obvious.

senpng commented 2 years ago

Thank you for your reply.

I indeed didn't downgrade the BIOS. I'll try downgrading it later to see if that solves the PSU fan problem.

White-Raven commented 2 years ago

No problem!

If the iDRAC version you downgraded to is 3.30.30.30, then you can find the 'corresponding' BIOS here: https://www.dell.com/support/home/en-my/drivers/driversdetails?driverid=vm460&oscode=wst14&productcode=poweredge-t640

senpng commented 2 years ago

I've downgraded both iDRAC and the BIOS. The PSU fan is still at full speed 😂

White-Raven commented 2 years ago

It just occurred to me though: was the PSU fan already at full blast before you first tried to downgrade (as in an error-out safety measure because something went wrong), or is it actually that you wanted to use the script to control the fan speed of the PSU?

Because the script is meant to manage only the speed of the system fans. Afaik these Dell PSU modules auto-manage themselves depending on their load/temp, and you can't control their own fan speed. It just happens that on a fair number of servers these PSUs are also relatively quiet, and barely noticeable over the sound of system fans above 5% speed.
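For reference, here is a rough sketch of what controlling the system fans looks like. The raw bytes are the ones commonly cited for Dell manual fan control (they are not quoted from this thread, so treat them as an assumption), and the function names and the `IPMI` variable are hypothetical. Nothing here touches the PSU fan.

```shell
#!/bin/sh
# Command used to reach the BMC; override to add "-I lanplus -H ... -U ..." options.
IPMI=${IPMI:-ipmitool}

# pct_to_hex PERCENT: print PERCENT (0-100) as a hex byte, e.g. 20 -> 0x14.
pct_to_hex() {
    case $1 in
        ''|*[!0-9]*|0[0-9]*) return 1 ;;   # reject empty, non-numeric, leading zeros
    esac
    [ "$1" -le 100 ] || return 1
    printf '0x%02x\n' "$1"
}

# set_system_fans PERCENT: switch to manual fan control, then set all system fans.
# Assumed raw commands: 0x30 0x30 0x01 0x00 (manual mode), 0x30 0x30 0x02 0xff <speed>.
set_system_fans() {
    hex=$(pct_to_hex "$1") || { echo "speed must be an integer from 0 to 100" >&2; return 1; }
    $IPMI raw 0x30 0x30 0x01 0x00 && $IPMI raw 0x30 0x30 0x02 0xff "$hex"
}
```

Running it with `IPMI=echo set_system_fans 20` prints the two commands instead of sending them, which is a cheap way to double-check the bytes before pointing it at a real server.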

If your PSU is indeed abnormally stuck at full jet engine speed, you can check whether you have errors in the iDRAC logs (like issues with Life Cycle Inventory or something). You can also try resetting the BMC through IPMI with this: ipmitool mc reset cold, and then review the power and cooling related settings in the BIOS and iDRAC. The actual 100% PSU fan speed issue is a bit of a nightmare, because many people have faced that issue on many SKUs across many generations of hardware, and every freaking fucking thing under the sun seems to have been a solution at some point. Some people just needed to change a random BIOS setting that doesn't seem to have anything to do with it, others replaced their motherboard or PSU modules.
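A minimal triage sketch for the steps above, using standard ipmitool subcommands (`sel elist` for the event log, `sdr type Fan` for fan sensors). The function names, the `IPMI` variable and the keyword list are assumptions of mine, and whether the PSU fan shows up as a sensor at all depends on the platform.

```shell
#!/bin/sh
# Command used to reach the BMC; override to add remote options if needed.
IPMI=${IPMI:-ipmitool}

# filter_cooling_events: keep only log lines that mention fans, PSUs or cooling.
# The keyword list is a guess, widen it if your logs use other wording.
filter_cooling_events() {
    grep -iE 'fan|power supply|psu|cooling'
}

# psu_fan_triage: show cooling-related SEL entries, then the fan sensor readings.
psu_fan_triage() {
    echo "== Cooling/power events in the System Event Log =="
    $IPMI sel elist | filter_cooling_events
    echo "== Fan sensors =="
    $IPMI sdr type Fan
}
```

If `psu_fan_triage` shows nothing useful, `ipmitool mc reset cold` is the next thing to try; expect the BMC to be unreachable for a few minutes while it restarts.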

White-Raven commented 2 years ago

Feel free to reopen if needed.