Open nguyenduchoa37 opened 2 months ago
Hello,
Could you please share the Junos version and the card model?
David
Hi.
I tested on an MX960 running Junos 20.4R3-S8.1 with an MPC10E. If this error does appear in Grafana, what kind of log entry will it be? And how long will that entry remain after the card goes down on the box?
I believe manually shutting down an MPC is not considered an error. If you want, I could provide you a command to simulate an HW error in your lab.
Yes, please share that command with me. Also, is unplugging the FPC an acceptable way to test?
FOR LAB ONLY
1/ start shell pfe network fpcX.0 <<< X = slot number
2/ show cmerror module <<<< Identify the module ID for “Storage device” - in my case this is 5
3/ show cmerror module 5
Error-id  PFE  Level  Threshold  Count  Occured  Cleared  Last-occurred(ms ago)  Name
0x2c0002  0    Major  1          0      0        0        0                      CPU_CMERROR_STORAGE_MSATA_DISABLED
0x2c0001  0    Minor  1          0      0        0        0                      CPU_CMERROR_STORAGE_SMARTD_ERROR
0x2c0003  0    Minor  1          0      0        0        0                      CPU_CMERROR_STORAGE_ACCESS_ERROR
Pick the hexadecimal Error-id of a Major error, along with its name, and simulate the error:
4/ test cmerror trigger-error 0x2c0002 0 CPU_CMERROR_STORAGE_MSATA_DISABLED 5
5/ exit
Now you should see a MAJOR ALARM
6/ regress@rtme-mx-25> show chassis alarms
3 alarms currently active
Alarm time               Class  Description
2024-06-28 06:30:54 PDT  Major  FPC 2 Major Errors
On openJTS you should see:
To clear the alarm you need to reboot
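For anyone repeating steps 3 and 4 on several cards, here is a minimal Python sketch that picks the Major rows out of pasted `show cmerror module` output and builds the matching trigger command. It is not part of openJTS. The helper names `major_errors` and `trigger_command` are hypothetical, and it assumes that the second argument of `test cmerror trigger-error` is the PFE column and the last argument is the module ID, which is how the example in step 4 appears to line up.

```python
# Sample text copied from the `show cmerror module 5` output in this thread.
SAMPLE = """\
Error-id  PFE  Level  Threshold  Count  Occured  Cleared  Last-occurred(ms ago)  Name
0x2c0002  0    Major  1          0      0        0        0                      CPU_CMERROR_STORAGE_MSATA_DISABLED
0x2c0001  0    Minor  1          0      0        0        0                      CPU_CMERROR_STORAGE_SMARTD_ERROR
0x2c0003  0    Minor  1          0      0        0        0                      CPU_CMERROR_STORAGE_ACCESS_ERROR
"""


def major_errors(text):
    """Return (error_id, pfe, name) for every row whose Level is Major."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has 9 whitespace-separated fields and starts with a hex ID.
        if len(fields) == 9 and fields[0].startswith("0x") and fields[2] == "Major":
            found.append((fields[0], fields[1], fields[8]))
    return found


def trigger_command(error_id, pfe, name, module_id):
    """Build the lab-only trigger command (argument order is an assumption)."""
    return f"test cmerror trigger-error {error_id} {pfe} {name} {module_id}"


if __name__ == "__main__":
    for error_id, pfe, name in major_errors(SAMPLE):
        # 5 is the "Storage device" module ID from step 2 in this thread.
        print(trigger_command(error_id, pfe, name, 5))
```

The script only builds the command string. It still has to be typed in the FPC shell by hand, and only in a lab.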
Sorry for the late reply; I missed your email.
I will test and update you soon.
Any updates?
The result is as expected.
@.**_AGG02_TEST_SRT_ZTE> show chassis alarms
3 alarms currently active
Alarm time               Class  Description
2024-08-08 09:39:27 +07  Major  FPC 2 Major Errors
2024-08-07 13:57:16 +07  Minor  CB 0 Removed
2024-08-05 14:19:29 +07  Minor  Backup RE Active
But may I know which mechanism Grafana uses to show this error? Is it still streamed via gRPC? I ask because the notification does not appear immediately; it still needs some time to refresh.
Thanks.
Regards,
Nguyen Duc Hoa (Mr)
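To compare what the router itself reports with what shows up in Grafana, one option is to parse pasted `show chassis alarms` output and keep only the Major entries. The sketch below is just an offline check of CLI text. It says nothing about how openJTS collects or refreshes its data, and the helper name `parse_alarms` is hypothetical.

```python
import re

# Sample text copied from the `show chassis alarms` output in this thread.
SAMPLE = """\
3 alarms currently active
Alarm time               Class  Description
2024-08-08 09:39:27 +07  Major  FPC 2 Major Errors
2024-08-07 13:57:16 +07  Minor  CB 0 Removed
2024-08-05 14:19:29 +07  Minor  Backup RE Active
"""

# Timestamp, a timezone token (e.g. "+07" or "PDT"), the class, then the description.
ALARM = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)\s+(Major|Minor)\s+(.+)$"
)


def parse_alarms(text):
    """Return one dict per alarm line; header lines are skipped."""
    alarms = []
    for line in text.splitlines():
        match = ALARM.match(line.strip())
        if match:
            alarms.append(
                {
                    "time": match.group(1),
                    "class": match.group(2),
                    "description": match.group(3),
                }
            )
    return alarms


if __name__ == "__main__":
    for alarm in parse_alarms(SAMPLE):
        if alarm["class"] == "Major":
            print(alarm["time"], alarm["description"])
```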
Hi.
I installed OpenJTS to monitor an MX960 using the Health Monitoring Profile. As a test, I took an FPC offline on the MX960 (using the command request fpc slot offline), but no alarm or warning appeared in the Grafana web GUI, even after 4-5 minutes. Is there a way to detect hardware errors quickly with OpenJTS?