Closed M-Welsch closed 8 months ago
See this logfile for more info: logfile.log
Exit the program when this error occurs, so that the system stays in the state in which the error occurred.
Then log in via ssh and see ...
mount
doesn't show the backuphdd
ls /dev
doesn't show BACKUPHDD
-> devicenode not there
cmd dock
docks flawlessly
I guess we need either
Option 1 seems easier for now. This verification isn't bad anyway.
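The manual checks above (mount and ls /dev) could be automated roughly as follows. This is only a sketch: the helper names device_node_present, is_mounted and read_mounts are made up for illustration and are not part of the backup server code, and reading /proc/mounts assumes a Linux system.

```python
from pathlib import Path
from typing import Iterable, List


def device_node_present(name: str, dev_dir: Path = Path("/dev")) -> bool:
    """Return True if an entry called `name` exists in dev_dir (like `ls /dev`)."""
    return (dev_dir / name).exists()


def is_mounted(device: str, mounts_lines: Iterable[str]) -> bool:
    """Return True if `device` is the source of any /proc/mounts-style line."""
    for line in mounts_lines:
        fields = line.split()
        # The first field of each line is the mounted device.
        if fields and fields[0] == device:
            return True
    return False


def read_mounts(path: Path = Path("/proc/mounts")) -> List[str]:
    """Read the current mount table (assumption: Linux, /proc is available)."""
    return path.read_text().splitlines()
```

With these, a check before mounting might look like `device_node_present("BACKUPHDD")` followed by `is_mounted("/dev/BACKUPHDD", read_mounts())`. If the first call returns False, the device node is missing and docking should be retried.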
verification
async def engage() -> None:
    LOG.debug("Docking...")
    docking_trials = 0
    while not await pcu.cmd.dock():
        LOG.warning("couldn't dock, try another time.")
        docking_trials += 1
        if docking_trials == 2:
            raise RuntimeError("couldn't dock with two trials")
    ...
The subsequent 9 runs show 5 of these warnings but otherwise run through; see issue19_logfile_for_verification.log
Longer test that updates the test files on the NAS on startup of the backup server:
[Unit]
Description=Generate fresh testfiles on NAS for Backup Server
[Service]
User=base
Group=base
Type=simple
ExecStart=/usr/bin/bash /home/base/backup-server/software/utils/generate_testdata_on_nas.sh
[Install]
WantedBy=multi-user.target
Should run overnight.
Suggestion: implement a handshake when PCU communication is established. This way we can make sure that this "ominous first command fail" happens in a controlled way during the init phase and not at some random point in time during operation.
edit: created #20
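A rough sketch of what such a handshake could look like. Everything here is an assumption for illustration: the function name handshake, the probe callable (standing in for some harmless PCU command), the number of attempts and the delay are not taken from #20.

```python
import asyncio
from typing import Awaitable, Callable


async def handshake(
    probe: Callable[[], Awaitable[bool]],
    attempts: int = 3,
    delay_s: float = 0.1,
) -> int:
    """Send a harmless probe command until the PCU answers.

    Returns the number of attempts that were needed. A failing first
    command is absorbed here, during init, instead of during operation.
    """
    for attempt in range(1, attempts + 1):
        if await probe():
            return attempt
        # Give the serial link a moment before trying again.
        await asyncio.sleep(delay_s)
    raise ConnectionError(f"PCU did not answer after {attempts} handshake attempts")
```

The idea is to call this once right after the connection is opened, so that later commands such as docking start from a link that has already answered at least once.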
29 runs: the warning "couldn't dock, try another time" appeared 22 times, then the program stopped because it needed 2 trials, which raises an exception. Increased the maximum to 4.
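A minimal sketch of the retry loop with the maximum made configurable instead of hard-coded. The name dock_with_retries and the max_trials parameter are hypothetical, and the dock callable is injected here in place of pcu.cmd.dock so the snippet runs on its own.

```python
import asyncio
import logging
from typing import Awaitable, Callable

LOG = logging.getLogger(__name__)


async def dock_with_retries(
    dock: Callable[[], Awaitable[bool]],
    max_trials: int = 4,
) -> int:
    """Call dock() until it returns True.

    Returns the number of failed attempts before success and raises
    RuntimeError once max_trials attempts have failed.
    """
    failed = 0
    while not await dock():
        failed += 1
        LOG.warning("couldn't dock, try another time.")
        if failed >= max_trials:
            raise RuntimeError(f"couldn't dock with {max_trials} trials")
    return failed
```

Returning the number of failed attempts makes it easy to count the warnings per run in a long test without grepping the log.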
55 runs: the warning "couldn't dock, try another time" appeared 22 times. Interrupted intentionally; the system would have continued.
implemented pcu handshake: #20
6 runs: the warning "couldn't dock, try another time" didn't appear anymore, but the system crashed due to #21.
implemented fix for #21 (change in the c-code)
658 runs: the warning appeared around 40 times. The occurrences clustered at the beginning and the end. The system would have kept running, but I interrupted it to check the logs.
Description
During #15 we saw issues that probably relate to the mounting process.
The steps the program performs usually work. When they don't, they still work when entered manually in the terminal. Maybe a timing problem.
What happens if we don't do it (aka Why is it important)?
A backup will sometimes fail, about 1/4 of the time.
Definition of Ready
Key Tasks
Acceptance Criteria