aristanetworks / sonic

Open source drivers and initialization library for Arista platforms running SONiC
GNU General Public License v2.0

[Chassis] thermalctld warning continuously seen in syslog #111

Open arlakshm opened 1 week ago

arlakshm commented 1 week ago

The following warnings are logged continuously in syslog:

2024 Nov 16 05:07:30.655636 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 NIF changed too fast, from 57.945 to 70.473, please check your hardware
2024 Nov 16 05:08:31.042006 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 C changed too fast, from 57.013 to 71.27, please check your hardware
2024 Nov 16 05:08:31.043082 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 AVS changed too fast, from 55.068 to 68.338, please check your hardware
2024 Nov 16 05:08:31.044132 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 FAB changed too fast, from 59.473 to 72.81, please check your hardware
2024 Nov 16 05:08:31.045208 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 NIF changed too fast, from 59.0 to 72.27, please check your hardware
2024 Nov 16 05:09:30.658807 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 Back changed too fast, from 31.473 to 41.81, please check your hardware
2024 Nov 16 05:09:30.711657 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 Front changed too fast, from 28.135 to 38.608, please check your hardware
2024 Nov 16 05:09:30.712558 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 C changed too fast, from 57.743 to 72.473, please check your hardware
2024 Nov 16 05:09:30.713262 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 AVS changed too fast, from 55.743 to 69.203, please check your hardware
2024 Nov 16 05:09:30.714076 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 FAB changed too fast, from 60.27 to 73.81, please check your hardware
2024 Nov 16 05:09:30.714773 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 NIF changed too fast, from 59.945 to 73.203, please check your hardware
2024 Nov 16 05:10:30.999608 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 Back changed too fast, from 32.013 to 42.743, please check your hardware
2024 Nov 16 05:10:31.055381 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 Front changed too fast, from 28.675 to 39.27, please check your hardware
2024 Nov 16 05:10:31.055381 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 C changed too fast, from 58.27 to 72.743, please check your hardware
2024 Nov 16 05:10:31.055381 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 AVS changed too fast, from 55.81 to 69.945, please check your hardware
2024 Nov 16 05:10:31.055381 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 FAB changed too fast, from 60.945 to 74.203, please check your hardware
2024 Nov 16 05:10:31.055381 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 NIF changed too fast, from 60.405 to 73.608, please check your hardware
2024 Nov 16 05:10:31.226829 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 AVS changed too fast, from 69.945 to 59.878, please check your hardware
2024 Nov 16 05:11:30.613155 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 Back changed too fast, from 32.608 to 43.203, please check your hardware
2024 Nov 16 05:11:30.674600 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 Front changed too fast, from 29.068 to 39.338, please check your hardware
2024 Nov 16 05:11:30.674600 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 C changed too fast, from 58.878 to 72.203, please check your hardware
2024 Nov 16 05:11:30.685035 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 AVS changed too fast, from 56.675 to 68.675, please check your hardware
2024 Nov 16 05:11:30.692616 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 FAB changed too fast, from 61.608 to 73.608, please check your hardware
2024 Nov 16 05:11:30.693303 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 NIF changed too fast, from 61.068 to 73.0, please check your hardware
2024 Nov 16 05:12:31.007507 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 Back changed too fast, from 33.0 to 43.473, please check your hardware
2024 Nov 16 05:12:31.066017 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 C changed too fast, from 59.203 to 71.675, please check your hardware
2024 Nov 16 05:12:31.067258 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 AVS changed too fast, from 57.0 to 68.405, please check your hardware
2024 Nov 16 05:12:31.068417 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 FAB changed too fast, from 61.878 to 73.068, please check your hardware
2024 Nov 16 05:12:31.069523 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 NIF changed too fast, from 61.54 to 72.54, please check your hardware
2024 Nov 16 05:13:30.668450 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 C changed too fast, from 59.945 to 71.203, please check your hardware
2024 Nov 16 05:13:30.669185 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 AVS changed too fast, from 57.338 to 68.405, please check your hardware
2024 Nov 16 05:13:30.669999 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 FAB changed too fast, from 62.675 to 72.81, please check your hardware
2024 Nov 16 05:13:30.670809 str3-7808-sup-1 WARNING pmon#thermalctld: Temperature of Fap0 NIF changed too fast, from 62.203 to 72.27, please check your hardware
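
To make the pattern in the excerpt above easier to see, here is a minimal Python sketch that counts the warnings per sensor and records the largest jump. The regular expression is derived only from the message format shown in these log lines; the function name `summarize_jumps` is hypothetical and is not part of thermalctld or the SONiC platform API.

```python
import re
from collections import defaultdict

# Matches the warning text as it appears in the excerpt above.
# This pattern is an assumption based on those lines only.
WARNING_RE = re.compile(
    r"Temperature of (?P<sensor>.+?) changed too fast, "
    r"from (?P<old>-?\d+(?:\.\d+)?) to (?P<new>-?\d+(?:\.\d+)?),"
)


def summarize_jumps(lines):
    """Return {sensor: (warning_count, largest_absolute_jump)}."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue  # not a "changed too fast" warning
        jump = abs(float(match["new"]) - float(match["old"]))
        entry = stats[match["sensor"]]
        entry[0] += 1
        entry[1] = max(entry[1], jump)
    return {sensor: (count, round(jump, 3)) for sensor, (count, jump) in stats.items()}
```

Passing the lines of a syslog file to `summarize_jumps` gives one entry per sensor name (for example `Fap0 NIF`). In the excerpt above the warnings arrive roughly once a minute and each reported jump is between about 10 and 15 degrees. The "to" value of one warning is often not the "from" value of the next warning for the same sensor, which may help narrow down where the readings come from.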

Chassis details

admin@str3-7808-sup-1:~$ show chassis module status
        Name        Description    Physical-Slot    Oper-Status    Admin-Status       Serial
------------  -----------------  ---------------  -------------  --------------  -----------
FABRIC-CARD0         7808R3-FM2               51         Online              up  FGN222105ZR
FABRIC-CARD1         7808R3-FM2               52         Online              up  FGN22210603
FABRIC-CARD2         7808R3-FM2               53         Online              up  FGN221703CR
FABRIC-CARD3         7808R3-FM2               54         Online              up  FGN222105YF
FABRIC-CARD4         7808R3-FM2               55         Online              up  FGN221901VG
FABRIC-CARD5         7808R3-FM2               56         Online              up  SGD21471178
  LINE-CARD0   7800R3A-36DM2-LC                3         Online              up  SGD22203294
  LINE-CARD1  7800R3AK-36DM2-LC                4         Online              up  SGD232207JY
  LINE-CARD2  7800R3AK-36DM2-LC                5         Online              up  SGD232204JZ
  LINE-CARD3            Unknown                6          Empty              up          N/A
  LINE-CARD4            Unknown                7          Empty              up          N/A
  LINE-CARD5            Unknown                8          Empty              up          N/A
  LINE-CARD6            Unknown                9          Empty              up          N/A
  LINE-CARD7            Unknown               10          Empty              up          N/A
 SUPERVISOR0     DCS-7800-SUP1A                1         Online              up  SGD22140009
admin@str3-7808-sup-1:~$

patrickmacarthur commented 5 days ago

Do you see any log messages that look like this?

Failed to update thermal status for Front - TypeError("float() argument must be a string or a real number, not 'NoneType'")
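
One way to check for that message is to scan the syslog for it. The sketch below is a rough Python helper, assuming the message always has the shape shown above (`Failed to update thermal status for <sensor> - <error>`); the function name `find_failed_updates` is hypothetical.

```python
import re

# Pattern assumed from the single example message above.
FAILED_UPDATE_RE = re.compile(
    r"Failed to update thermal status for (?P<sensor>.+?) - (?P<error>.+)$"
)


def find_failed_updates(lines):
    """Return a list of (sensor, error_text) for each matching log line."""
    hits = []
    for line in lines:
        match = FAILED_UPDATE_RE.search(line)
        if match is not None:
            hits.append((match["sensor"], match["error"]))
    return hits
```

Running it over the lines of a syslog file returns an empty list when no such failures were logged, and otherwise shows which sensors failed to update and with what error.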

arlakshm commented 4 days ago

I've attached the techsupport dump. It contains all syslog messages since bootup.

sonic_dump_str3-7808-sup-1_20241123_041750.tar.gz