facebook / fboss

Facebook Open Switching System Software for controlling network switches.

fan_service: Adapt to one fan rotor fail case #244

Open QiuyunXie opened 3 days ago

QiuyunXie commented 3 days ago

Description

Per the thermal design, when exactly one fan rotor fails, the inlet sensor PWM should be calculated from "failUpTable" or "failDownTable". However, when one fan rotor is set to failed (not present), the code always uses "normalUpTable" and "normalDownTable" instead. The cause is that "numFanFailed_" is always 0 while the sensor PWM is being calculated. This change moves the "STEP 1" and "STEP 2" code sections to after "STEP 3", so that fans are processed before sensors and optics (compare the order of the "Processing ..." lines in the logs under "Test log"). Accordingly, "pwmBoostOnNumDeadFan" in fan_service.json is changed to 2, so that fan fail boost mode is enabled only when the number of failed fan rotors is >= 2.
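The ordering dependency described above can be illustrated with a minimal, self-contained C++ sketch. It is not the fan_service implementation: the types, function names, and table values are hypothetical and chosen only to resemble the logs (PWM 25 from the normal table, PWM 30 from the fail table at about 29.8 degrees). Only the "up" tables are modeled; the "down" tables used for hysteresis are omitted.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical step table: (temperature threshold, PWM), sorted ascending.
using PwmTable = std::vector<std::pair<float, int>>;

// Hypothetical stand-in for one sensor's table configuration.
struct SensorTables {
  PwmTable normalUpTable;
  PwmTable failUpTable;
};

// Illustrative values only; the real tables live in fan_service.json.
SensorTables exampleTables() {
  return SensorTables{{{0.0f, 25}, {35.0f, 40}}, {{0.0f, 30}, {35.0f, 50}}};
}

// Count fans whose RPM is below the minimum (an absent fan reads as 0).
int countFailedFans(const std::vector<int>& rpms, int minRpm) {
  int failed = 0;
  for (int rpm : rpms) {
    if (rpm < minRpm) {
      ++failed;
    }
  }
  return failed;
}

// Return the PWM of the highest threshold the temperature has reached,
// or 0 if the temperature is below every threshold.
int lookupPwm(const PwmTable& table, float temp) {
  int pwm = 0;
  for (const auto& entry : table) {
    if (temp >= entry.first) {
      pwm = entry.second;
    }
  }
  return pwm;
}

// The fail table is selected only if the failed-fan count is already known.
// If sensors are processed before fans, numFanFailed is still 0 here and the
// normal table is always used, which is the behavior reported above.
int sensorPwm(const SensorTables& tables, float temp, int numFanFailed) {
  assert(numFanFailed >= 0);
  const PwmTable& table =
      numFanFailed > 0 ? tables.failUpTable : tables.normalUpTable;
  return lookupPwm(table, temp);
}

// Boost applies once the failed count reaches the configured threshold.
// Treating a threshold of 0 as "disabled" is an assumption of this sketch.
bool fanFailBoost(int numFanFailed, int pwmBoostOnNumDeadFan) {
  return pwmBoostOnNumDeadFan > 0 && numFanFailed >= pwmBoostOnNumDeadFan;
}
```

With these illustrative tables, calling `countFailedFans` first and passing its result to `sensorPwm` selects the fail table for a single failed rotor, while `fanFailBoost(1, 2)` stays false and `fanFailBoost(2, 2)` becomes true.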

Test log

1. Original issue log:

I0925 17:10:38.158401 41360 ControlLogic.cpp:571] Processing Sensors ...
E0925 17:10:38.158408 41360 ControlLogic.cpp:238] SMB_U77_INLET_LEFT_BOT_LM75_TEMP: Sensor read value (after scaling) is 29.875
I0925 17:10:38.158416 41360 ControlLogic.cpp:220] SMB_U77_INLET_LEFT_BOT_LM75_TEMP: Calculated PWM is 25
E0925 17:10:38.158422 41360 ControlLogic.cpp:238] CPU_UNCORE_TEMP: Sensor read value (after scaling) is 46
V0925 17:10:38.158434 41360 PidLogic.cpp:40] Measurement: 46, Error: -46, Last PWM: 50, New PWM: 0
I0925 17:10:38.158438 41360 ControlLogic.cpp:220] CPU_UNCORE_TEMP: Calculated PWM is 0
I0925 17:10:38.158443 41360 ControlLogic.cpp:575] Processing Optics ...
I0925 17:10:38.158447 41360 ControlLogic.cpp:579] Processing Fans ...
I0925 17:10:38.158475 41360 ControlLogic.cpp:368] FANTRAY1_FAN1: is absent in the host (through sysfs)
E0925 17:10:38.158482 41360 ControlLogic.cpp:600] fan FANTRAY1_FAN1 : rpm 0 is below the minimum value 1500
I0925 17:10:38.158494 41360 ControlLogic.cpp:365] FANTRAY1_FAN2: is present in the host (through sysfs)
I0925 17:10:38.171002 41360 ControlLogic.cpp:168] FANTRAY1_FAN2: RPM read is 9637
I0925 17:10:38.171015 41360 ControlLogic.cpp:365] FANTRAY1_FAN3: is present in the host (through sysfs)
I0925 17:10:38.171033 41360 ControlLogic.cpp:168] FANTRAY1_FAN3: RPM read is 7992
I0925 17:10:38.171043 41360 ControlLogic.cpp:365] FANTRAY1_FAN4: is present in the host (through sysfs)
I0925 17:10:38.171059 41360 ControlLogic.cpp:168] FANTRAY1_FAN4: RPM read is 9830
I0925 17:10:38.171070 41360 ControlLogic.cpp:365] FANTRAY1_FAN5: is present in the host (through sysfs)
I0925 17:10:38.171086 41360 ControlLogic.cpp:168] FANTRAY1_FAN5: RPM read is 7992
I0925 17:10:38.171096 41360 ControlLogic.cpp:365] FANTRAY1_FAN6: is present in the host (through sysfs)
I0925 17:10:38.171113 41360 ControlLogic.cpp:168] FANTRAY1_FAN6: RPM read is 9637
I0925 17:10:38.171123 41360 ControlLogic.cpp:365] FANTRAY1_FAN7: is present in the host (through sysfs)
I0925 17:10:38.171139 41360 ControlLogic.cpp:168] FANTRAY1_FAN7: RPM read is 7992
I0925 17:10:38.171149 41360 ControlLogic.cpp:365] FANTRAY1_FAN8: is present in the host (through sysfs)
I0925 17:10:38.171165 41360 ControlLogic.cpp:168] FANTRAY1_FAN8: RPM read is 9637
I0925 17:10:38.171176 41360 ControlLogic.cpp:365] FANTRAY2_FAN1: is present in the host (through sysfs)
I0925 17:10:38.183668 41360 ControlLogic.cpp:168] FANTRAY2_FAN1: RPM read is 7992
I0925 17:10:38.183678 41360 ControlLogic.cpp:365] FANTRAY2_FAN2: is present in the host (through sysfs)
I0925 17:10:38.183692 41360 ControlLogic.cpp:168] FANTRAY2_FAN2: RPM read is 9637
I0925 17:10:38.183701 41360 ControlLogic.cpp:365] FANTRAY2_FAN3: is present in the host (through sysfs)
I0925 17:10:38.183716 41360 ControlLogic.cpp:168] FANTRAY2_FAN3: RPM read is 7992
I0925 17:10:38.183724 41360 ControlLogic.cpp:365] FANTRAY2_FAN4: is present in the host (through sysfs)
I0925 17:10:38.183738 41360 ControlLogic.cpp:168] FANTRAY2_FAN4: RPM read is 9830
I0925 17:10:38.183748 41360 ControlLogic.cpp:365] FANTRAY2_FAN5: is present in the host (through sysfs)
I0925 17:10:38.183763 41360 ControlLogic.cpp:168] FANTRAY2_FAN5: RPM read is 7992
I0925 17:10:38.183771 41360 ControlLogic.cpp:365] FANTRAY2_FAN6: is present in the host (through sysfs)
I0925 17:10:38.183786 41360 ControlLogic.cpp:168] FANTRAY2_FAN6: RPM read is 9637
I0925 17:10:38.183795 41360 ControlLogic.cpp:365] FANTRAY2_FAN7: is present in the host (through sysfs)
I0925 17:10:38.183810 41360 ControlLogic.cpp:168] FANTRAY2_FAN7: RPM read is 7992
I0925 17:10:38.183819 41360 ControlLogic.cpp:365] FANTRAY2_FAN8: is present in the host (through sysfs)
I0925 17:10:38.183834 41360 ControlLogic.cpp:168] FANTRAY2_FAN8: RPM read is 9637
I0925 17:10:38.183839 41360 ControlLogic.cpp:631] Boost mode enabled for optics update missing for 1727255438s
I0925 17:10:38.183845 41360 ControlLogic.cpp:500] zone1: Components: SMB_U77_INLET_LEFT_BOT_LM75_TEMP,CPU_UNCORE_TEMP,qsfp_group_1. Aggregation Type: ZONE_TYPE_MAX. Aggregate PWM is 75.

2. Log with one fan rotor failed, after the fix:

I0925 17:12:09.247700 41417 ControlLogic.cpp:571] Processing Fans ...
I0925 17:12:09.247730 41417 ControlLogic.cpp:368] FANTRAY1_FAN1: is absent in the host (through sysfs)
E0925 17:12:09.247741 41417 ControlLogic.cpp:592] fan FANTRAY1_FAN1 : rpm 0 is below the minimum value 1500
I0925 17:12:09.247755 41417 ControlLogic.cpp:365] FANTRAY1_FAN2: is present in the host (through sysfs)
I0925 17:12:09.260165 41417 ControlLogic.cpp:168] FANTRAY1_FAN2: RPM read is 9830
I0925 17:12:09.260181 41417 ControlLogic.cpp:365] FANTRAY1_FAN3: is present in the host (through sysfs)
I0925 17:12:09.260202 41417 ControlLogic.cpp:168] FANTRAY1_FAN3: RPM read is 7992
I0925 17:12:09.260215 41417 ControlLogic.cpp:365] FANTRAY1_FAN4: is present in the host (through sysfs)
I0925 17:12:09.260234 41417 ControlLogic.cpp:168] FANTRAY1_FAN4: RPM read is 9830
I0925 17:12:09.260246 41417 ControlLogic.cpp:365] FANTRAY1_FAN5: is present in the host (through sysfs)
I0925 17:12:09.260265 41417 ControlLogic.cpp:168] FANTRAY1_FAN5: RPM read is 7992
I0925 17:12:09.260278 41417 ControlLogic.cpp:365] FANTRAY1_FAN6: is present in the host (through sysfs)
I0925 17:12:09.260297 41417 ControlLogic.cpp:168] FANTRAY1_FAN6: RPM read is 9830
I0925 17:12:09.260313 41417 ControlLogic.cpp:365] FANTRAY1_FAN7: is present in the host (through sysfs)
I0925 17:12:09.260334 41417 ControlLogic.cpp:168] FANTRAY1_FAN7: RPM read is 7992
I0925 17:12:09.260346 41417 ControlLogic.cpp:365] FANTRAY1_FAN8: is present in the host (through sysfs)
I0925 17:12:09.260365 41417 ControlLogic.cpp:168] FANTRAY1_FAN8: RPM read is 9637
I0925 17:12:09.260377 41417 ControlLogic.cpp:365] FANTRAY2_FAN1: is present in the host (through sysfs)
I0925 17:12:09.272874 41417 ControlLogic.cpp:168] FANTRAY2_FAN1: RPM read is 7992
I0925 17:12:09.272888 41417 ControlLogic.cpp:365] FANTRAY2_FAN2: is present in the host (through sysfs)
I0925 17:12:09.272907 41417 ControlLogic.cpp:168] FANTRAY2_FAN2: RPM read is 9830
I0925 17:12:09.272921 41417 ControlLogic.cpp:365] FANTRAY2_FAN3: is present in the host (through sysfs)
I0925 17:12:09.272939 41417 ControlLogic.cpp:168] FANTRAY2_FAN3: RPM read is 7992
I0925 17:12:09.272952 41417 ControlLogic.cpp:365] FANTRAY2_FAN4: is present in the host (through sysfs)
I0925 17:12:09.272970 41417 ControlLogic.cpp:168] FANTRAY2_FAN4: RPM read is 9830
I0925 17:12:09.272982 41417 ControlLogic.cpp:365] FANTRAY2_FAN5: is present in the host (through sysfs)
I0925 17:12:09.272999 41417 ControlLogic.cpp:168] FANTRAY2_FAN5: RPM read is 7992
I0925 17:12:09.273012 41417 ControlLogic.cpp:365] FANTRAY2_FAN6: is present in the host (through sysfs)
I0925 17:12:09.273029 41417 ControlLogic.cpp:168] FANTRAY2_FAN6: RPM read is 9830
I0925 17:12:09.273041 41417 ControlLogic.cpp:365] FANTRAY2_FAN7: is present in the host (through sysfs)
I0925 17:12:09.273058 41417 ControlLogic.cpp:168] FANTRAY2_FAN7: RPM read is 7992
I0925 17:12:09.273069 41417 ControlLogic.cpp:365] FANTRAY2_FAN8: is present in the host (through sysfs)
I0925 17:12:09.273086 41417 ControlLogic.cpp:168] FANTRAY2_FAN8: RPM read is 9637
I0925 17:12:09.273091 41417 ControlLogic.cpp:617] Processing Sensors ...
E0925 17:12:09.273099 41417 ControlLogic.cpp:238] SMB_U77_INLET_LEFT_BOT_LM75_TEMP: Sensor read value (after scaling) is 29.75
I0925 17:12:09.273108 41417 ControlLogic.cpp:220] SMB_U77_INLET_LEFT_BOT_LM75_TEMP: Calculated PWM is 30
E0925 17:12:09.273114 41417 ControlLogic.cpp:238] CPU_UNCORE_TEMP: Sensor read value (after scaling) is 46
V0925 17:12:09.273128 41417 PidLogic.cpp:40] Measurement: 46, Error: -46, Last PWM: 50, New PWM: 0
I0925 17:12:09.273133 41417 ControlLogic.cpp:220] CPU_UNCORE_TEMP: Calculated PWM is 0
I0925 17:12:09.273139 41417 ControlLogic.cpp:621] Processing Optics ...
I0925 17:12:09.273146 41417 ControlLogic.cpp:631] Boost mode enabled for optics update missing for 1727255529s
I0925 17:12:09.273153 41417 ControlLogic.cpp:500] zone1: Components: SMB_U77_INLET_LEFT_BOT_LM75_TEMP,CPU_UNCORE_TEMP,qsfp_group_1. Aggregation Type: ZONE_TYPE_MAX. Aggregate PWM is 75.

3. Log with two fan rotors failed:

I0925 17:31:00.360647 41492 ControlLogic.cpp:571] Processing Fans ...
I0925 17:31:00.360667 41492 ControlLogic.cpp:368] FANTRAY1_FAN1: is absent in the host (through sysfs)
E0925 17:31:00.360674 41492 ControlLogic.cpp:592] fan FANTRAY1_FAN1 : rpm 0 is below the minimum value 1500
I0925 17:31:00.360683 41492 ControlLogic.cpp:368] FANTRAY1_FAN2: is absent in the host (through sysfs)
E0925 17:31:00.360689 41492 ControlLogic.cpp:592] fan FANTRAY1_FAN2 : rpm 0 is below the minimum value 1500
I0925 17:31:00.360699 41492 ControlLogic.cpp:365] FANTRAY1_FAN3: is present in the host (through sysfs)
I0925 17:31:00.360722 41492 ControlLogic.cpp:168] FANTRAY1_FAN3: RPM read is 7992
I0925 17:31:00.360732 41492 ControlLogic.cpp:365] FANTRAY1_FAN4: is present in the host (through sysfs)
I0925 17:31:00.360747 41492 ControlLogic.cpp:168] FANTRAY1_FAN4: RPM read is 9830
I0925 17:31:00.360757 41492 ControlLogic.cpp:365] FANTRAY1_FAN5: is present in the host (through sysfs)
I0925 17:31:00.360771 41492 ControlLogic.cpp:168] FANTRAY1_FAN5: RPM read is 7801
I0925 17:31:00.360781 41492 ControlLogic.cpp:365] FANTRAY1_FAN6: is present in the host (through sysfs)
I0925 17:31:00.360796 41492 ControlLogic.cpp:168] FANTRAY1_FAN6: RPM read is 9637
I0925 17:31:00.360805 41492 ControlLogic.cpp:365] FANTRAY1_FAN7: is present in the host (through sysfs)
I0925 17:31:00.360819 41492 ControlLogic.cpp:168] FANTRAY1_FAN7: RPM read is 7992
I0925 17:31:00.360828 41492 ControlLogic.cpp:365] FANTRAY1_FAN8: is present in the host (through sysfs)
I0925 17:31:00.360843 41492 ControlLogic.cpp:168] FANTRAY1_FAN8: RPM read is 9830
I0925 17:31:00.360853 41492 ControlLogic.cpp:365] FANTRAY2_FAN1: is present in the host (through sysfs)
I0925 17:31:00.360869 41492 ControlLogic.cpp:168] FANTRAY2_FAN1: RPM read is 7992
I0925 17:31:00.360878 41492 ControlLogic.cpp:365] FANTRAY2_FAN2: is present in the host (through sysfs)
I0925 17:31:00.360893 41492 ControlLogic.cpp:168] FANTRAY2_FAN2: RPM read is 9830
I0925 17:31:00.360902 41492 ControlLogic.cpp:365] FANTRAY2_FAN3: is present in the host (through sysfs)
I0925 17:31:00.360917 41492 ControlLogic.cpp:168] FANTRAY2_FAN3: RPM read is 7992
I0925 17:31:00.360926 41492 ControlLogic.cpp:365] FANTRAY2_FAN4: is present in the host (through sysfs)
I0925 17:31:00.360941 41492 ControlLogic.cpp:168] FANTRAY2_FAN4: RPM read is 9830
I0925 17:31:00.360950 41492 ControlLogic.cpp:365] FANTRAY2_FAN5: is present in the host (through sysfs)
I0925 17:31:00.360964 41492 ControlLogic.cpp:168] FANTRAY2_FAN5: RPM read is 7801
I0925 17:31:00.360974 41492 ControlLogic.cpp:365] FANTRAY2_FAN6: is present in the host (through sysfs)
I0925 17:31:00.360988 41492 ControlLogic.cpp:168] FANTRAY2_FAN6: RPM read is 9637
I0925 17:31:00.360998 41492 ControlLogic.cpp:365] FANTRAY2_FAN7: is present in the host (through sysfs)
I0925 17:31:00.361012 41492 ControlLogic.cpp:168] FANTRAY2_FAN7: RPM read is 7992
I0925 17:31:00.361021 41492 ControlLogic.cpp:365] FANTRAY2_FAN8: is present in the host (through sysfs)
I0925 17:31:00.361036 41492 ControlLogic.cpp:168] FANTRAY2_FAN8: RPM read is 9830
I0925 17:31:00.361041 41492 ControlLogic.cpp:617] Processing Sensors ...
E0925 17:31:00.361046 41492 ControlLogic.cpp:238] SMB_U77_INLET_LEFT_BOT_LM75_TEMP: Sensor read value (after scaling) is 30
I0925 17:31:00.361053 41492 ControlLogic.cpp:220] SMB_U77_INLET_LEFT_BOT_LM75_TEMP: Calculated PWM is 30
E0925 17:31:00.361058 41492 ControlLogic.cpp:238] CPU_UNCORE_TEMP: Sensor read value (after scaling) is 47
V0925 17:31:00.361064 41492 PidLogic.cpp:40] Measurement: 47, Error: -47, Last PWM: 75, New PWM: 0
I0925 17:31:00.361068 41492 ControlLogic.cpp:220] CPU_UNCORE_TEMP: Calculated PWM is 0
I0925 17:31:00.361073 41492 ControlLogic.cpp:621] Processing Optics ...
I0925 17:31:00.361077 41492 ControlLogic.cpp:631] Boost mode enabled for optics update missing for 1727256660s
I0925 17:31:00.361082 41492 ControlLogic.cpp:638] Boost mode enabled for 2 fan failures
I0925 17:31:00.361088 41492 ControlLogic.cpp:500] zone1: Components: SMB_U77_INLET_LEFT_BOT_LM75_TEMP,CPU_UNCORE_TEMP,qsfp_group_1. Aggregation Type: ZONE_TYPE_MAX. Aggregate PWM is 75.

facebook-github-bot commented 1 day ago

@mikechoifb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.