arlakshm closed this issue 1 year ago.
@arlakshm As discussed offline, mounting /tmp and /var as tmpfs will reduce writes to flash and thus help minimize potential flash corruption caused by sudden power loss and similar unclean shutdowns.
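A quick way to confirm that /tmp and /var actually ended up on tmpfs is to inspect the kernel's mount table. The sketch below is a minimal illustration, assuming the standard /proc/mounts format (device, mountpoint, fstype, options, ...). The function names `parse_mounts` and `non_tmpfs` are hypothetical helpers, not part of SONiC.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into {mountpoint: (fstype, [options])}.

    If a mountpoint appears more than once (stacked mounts), the last entry
    wins, because that is the mount currently visible at that path.
    """
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        mounts[mountpoint] = (fstype, options.split(","))
    return mounts


def non_tmpfs(mountpoints, mounts_text):
    """Return the requested mountpoints that are NOT currently backed by tmpfs.

    A mountpoint that is absent from the table is reported as well, since it
    is then just a directory on the underlying (flash) filesystem.
    """
    mounts = parse_mounts(mounts_text)
    return [mp for mp in mountpoints if mounts.get(mp, (None,))[0] != "tmpfs"]
```

On a device this would be called as `non_tmpfs(["/tmp", "/var"], open("/proc/mounts").read())`; an empty list means both paths are on tmpfs.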
In terms of logs/info to collect:
dmesg: logs are stored in a ring buffer and thus should be queried ASAP after a failure is seen
smartctl -a /dev/sda1
/var/log/syslog
journalctl
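The collection list can be scripted so that nothing is forgotten after a failure. This is a minimal sketch, assuming the commands are on PATH and the script runs with enough privileges to read the disk and logs; `collect` and the output file names are hypothetical. dmesg is run first because its ring buffer can be overwritten.

```python
import shutil
import subprocess
from pathlib import Path

# Commands from the list above. dmesg comes first because its ring buffer
# can wrap, so it should be captured as soon as possible after a failure.
COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "smartctl.txt": ["smartctl", "-a", "/dev/sda1"],
    "journalctl.txt": ["journalctl", "--no-pager"],
}
FILES = ["/var/log/syslog"]


def collect(outdir, commands=COMMANDS, files=FILES):
    """Save each command's output and copy each file into outdir.

    Returns a list of problems (missing tools, nonzero exits, unreadable
    files) instead of aborting, so one failure does not lose the rest.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    problems = []
    for name, argv in commands.items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=120)
        except (OSError, subprocess.TimeoutExpired) as exc:
            problems.append(f"{name}: {exc}")
            continue
        # Keep stderr too; it often explains why a command produced no data.
        (out / name).write_text(result.stdout + result.stderr)
        if result.returncode != 0:
            # Note: smartctl uses a bitmask exit status, so nonzero is not
            # necessarily fatal; it is recorded here for manual review.
            problems.append(f"{name}: exit {result.returncode}")
    for path in files:
        try:
            shutil.copy2(path, out)
        except OSError as exc:
            problems.append(f"{path}: {exc}")
    return problems
```

Because the output directory is itself written to disk, it is safer to point `outdir` at a tmpfs path or copy the results off the linecard once the filesystem may already be read-only.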
@kenneth-arista is root cause confirmed/known?
The trigger is not specific to CL2. Instead, it is known EXT4 behavior: the filesystem goes read-only when it detects corruption caused by an unclean unmount (e.g. sudden power loss).
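How ext4 reacts to detected corruption is controlled by its `errors=` mount option; with `errors=remount-ro` the kernel remounts the filesystem read-only, which matches the symptom in this issue. Whether the CL2 image uses that setting is an assumption here. A minimal sketch for spotting the condition from /proc/mounts text follows; `readonly_ext4` is a hypothetical helper.

```python
def readonly_ext4(mounts_text):
    """Return mountpoints of ext4 filesystems that are currently mounted read-only.

    Expects /proc/mounts-style lines: device, mountpoint, fstype, options, ...
    The check looks for the exact "ro" option, so values such as
    "errors=remount-ro" do not count as read-only by themselves.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype == "ext4" and "ro" in options.split(","):
            result.append(mountpoint)
    return result
```

A nightly test harness could run this against `/proc/mounts` after each test and, if the list is non-empty, trigger log collection right away while the dmesg ring buffer still holds the EXT4 errors.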
Looks like other platforms are moving /var/log to tmpfs to minimize writes to flash. See https://github.com/sonic-net/sonic-buildimage/pull/15077
The problem is understood and thus closing this issue. We'll be pushing some changes in the platform code that should help mitigate occurrences in sonic-mgmt testing.
The Clearwater 2 linecards are going into read-only mode during the sonic-mgmt nightly test.
Message from dmesg: