M-Welsch / backup-server

Backup Server (BaSe)
Apache License 2.0

get rid of rsync errors `/media/Backup... no such file or directory` #19

Closed M-Welsch closed 8 months ago

M-Welsch commented 8 months ago

Description

During #15 we saw issues that probably relate to the mounting process

Jan 07 21:10:04 raspberrypi python3[707]: mount: /media/BackupHDD: must be superuser to use mount.
...
Jan 07 21:10:04 raspberrypi python3[318]: 2024-01-07 21:10:04,460 INFO: __main__: Fehlerausgabe: rsync: [Receiver] mkdir "/media/BackupHDD/backups/current" failed: No such file or directory (2)
Jan 07 21:10:04 raspberrypi python3[318]: rsync error: error in file IO (code 11) at main.c(787) [Receiver=3.2.3]

The steps the program performs usually work. When they fail, the same commands succeed when entered manually in the terminal, so this may be a timing problem.

What happens if we don't do it (aka Why is it important)?

A backup will sometimes fail, about 1 in 4 times.

Definition of Ready

Key Tasks

Acceptance Criteria

M-Welsch commented 8 months ago

See this logfile for more info: logfile.log

Debugging

Exit the program when this error occurs, so that the system stays in the state in which the error occurred.

Then log in via SSH and see ...

I guess we need either

  1. a verification that docking has happened
  2. or a more robust docking command

Option 1 seems easier for now, and the verification is worth having anyway.
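
A related check, sketched here and not taken from the project's code: the rsync error in the log above is consistent with /media/BackupHDD not being mounted when rsync starts. A small helper could poll `os.path.ismount` before the backup begins. The name `wait_for_mount` and the timeout values are made up for illustration.

```python
import os
import time


def wait_for_mount(mount_point: str, timeout_s: float = 10.0, poll_s: float = 0.5) -> bool:
    """Return True once mount_point is an active mount point, False on timeout.

    Hypothetical helper: name and default timings are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # os.path.ismount is False for a plain (unmounted) directory
        # and for a path that does not exist.
        if os.path.ismount(mount_point):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Path taken from the log lines in the issue description.
    if not wait_for_mount("/media/BackupHDD"):
        raise SystemExit("backup HDD is not mounted, skipping rsync")
```

If the helper returns False, the backup can be aborted with a clear message instead of letting rsync fail on the missing directory.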

M-Welsch commented 8 months ago

verification

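async def engage() -> None:
    LOG.debug("Docking...")
    docking_trials = 0  # counts failed docking attempts
    # retry for as long as pcu.cmd.dock() reports failure
    while not await pcu.cmd.dock():
        LOG.warning("couldn't dock, try another time.")
        docking_trials += 1
        if docking_trials == 2:
            # give up after the second failed attempt
            raise RuntimeError("couldn't dock with two trials")
    ...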

The subsequent 9 runs show 5 of these warnings but otherwise run through; see issue19_logfile_for_verification.log
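
For experimenting with the retry limit without hardware, the same loop can be sketched as a self-contained function. This is only an illustration: `dock_with_retries` and `FlakyDock` are hypothetical names, and `FlakyDock` merely stands in for `pcu.cmd.dock()`, whose real behaviour is not reproduced here.

```python
import asyncio
import logging
from typing import Awaitable, Callable

LOG = logging.getLogger(__name__)


async def dock_with_retries(dock: Callable[[], Awaitable[bool]], max_trials: int = 2) -> int:
    """Await dock() until it returns True; return the number of failed trials.

    Raises RuntimeError once max_trials attempts have failed.
    """
    failed = 0
    while not await dock():
        LOG.warning("couldn't dock, try another time.")
        failed += 1
        if failed >= max_trials:
            raise RuntimeError(f"couldn't dock with {max_trials} trials")
    return failed


class FlakyDock:
    """Hypothetical stand-in for pcu.cmd.dock(): fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures

    async def __call__(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


if __name__ == "__main__":
    # One failed attempt followed by a success stays below the limit of 2.
    print(asyncio.run(dock_with_retries(FlakyDock(failures=1), max_trials=2)))
```

Making the limit a parameter means a later change to the maximum needs no change to the loop itself.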

M-Welsch commented 8 months ago

Longer test: a service that regenerates the test files on the NAS when the backup server starts up.

[Unit]
Description=Generate fresh testfiles on NAS for Backup Server

[Service]
User=base
Group=base
Type=simple
ExecStart=/usr/bin/bash /home/base/backup-server/software/utils/generate_testdata_on_nas.sh

[Install]
WantedBy=multi-user.target

This should run overnight.

M-Welsch commented 8 months ago

Suggestion: implement a handshake when PCU communication is established. This way we can make sure that this "ominous first command fail" happens in a controlled way during the init phase and not at some random point in time during operation.

edit: created #20
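
A minimal sketch of what such a handshake could look like, under loud assumptions: the actual design lives in #20 and is not shown in this thread, so `handshake`, `ping`, and `FirstCommandFails` are all hypothetical. The idea is to send a harmless command during init and retry until the PCU answers, so a failing first command is absorbed there.

```python
import asyncio
from typing import Awaitable, Callable


async def handshake(ping: Callable[[], Awaitable[bool]], max_attempts: int = 3, delay_s: float = 0.1) -> int:
    """Send a harmless command until it is answered; return the attempts used.

    Hypothetical helper: `ping` is any coroutine function that returns True on a valid reply.
    """
    for attempt in range(1, max_attempts + 1):
        if await ping():
            return attempt
        await asyncio.sleep(delay_s)
    raise ConnectionError(f"PCU did not answer after {max_attempts} attempts")


class FirstCommandFails:
    """Hypothetical PCU stand-in whose first command fails and later ones succeed."""

    def __init__(self) -> None:
        self.calls = 0

    async def __call__(self) -> bool:
        self.calls += 1
        return self.calls > 1


if __name__ == "__main__":
    attempts = asyncio.run(handshake(FirstCommandFails()))
    print(f"handshake succeeded after {attempts} attempts")
```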

M-Welsch commented 8 months ago

29 runs: the warning "couldn't dock, try another time." appeared 22 times, then the program stopped because a run needed 2 trials, which raises an exception. Increased the maximum to 4.

55 runs: the warning "couldn't dock, try another time." appeared 22 times. Interrupted intentionally; the system would have continued.

Implemented PCU handshake: #20

6 runs: the warning "couldn't dock, try another time." no longer appeared, but the system crashed due to #21.

Implemented fix for #21 (change in the C code).

658 runs: the warning appeared around 40 times, clustered at the beginning and the end. The system would have kept running, but I interrupted it to check the logs.
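
To compare the four test series, the counts reported above can be turned into warnings per run. This is a quick calculation using only the numbers from this comment. The labels are my shorthand, 40 is the approximate count given for the last series, and since one run can log more than one warning, the ratio is not the fraction of affected runs.

```python
# (label, runs, warnings) as reported in this comment; labels are shorthand.
SERIES = [
    ("retry limit 2", 29, 22),
    ("retry limit 4", 55, 22),
    ("with PCU handshake (#20)", 6, 0),
    ("with fix for #21", 658, 40),  # "around 40" in the comment
]


def warnings_per_run(runs: int, warnings: int) -> float:
    """Average number of docking warnings logged per run."""
    return warnings / runs


for label, runs, warnings in SERIES:
    print(f"{label}: {warnings_per_run(runs, warnings):.3f} warnings per run")
```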