Open drsound opened 5 years ago
Changing the exit code at this time is not a good idea. It would break some existing users' scripts.
I think you can just grep the output to make sure that no files are skipped. These specific log messages are unlikely to be changed in the future.
What about adding an argument that lets us change the exit code behavior? If the argument isn't specified, the behavior stays the same, but if we specify an argument like -detailed-exit-code
we could have a more detailed exit code mapping that can tell us if something failed to upload or similar? Grepping the output is really a bad solution here.
@gilbertchen is afraid of breaking existing backup scripts. I try to analyze the possible cases to show there would be no breaking.
Every kind of backup script I can think of, can do 4 things with the exit code:
If we add an additional exit code for "not included files" the script behaviour in the previous 4 cases would change as follows:
In case there aren't "not included files" (most common situation):
In case there are "not included files":
Now, in my opinion the higher priority should be not to give false sense of security (case # 2).
Anyway, if @gilbertchen does not agree with directly adding a new exit state, I think @Thalagyrt solution is the second best we could have.
Good analysis. I'm definitely in agreement with @drsound here. I don't see any way in which adding a status code for incomplete/failed backups would break existing scripts in a bad way.
If someone's ignoring the exit code, there really is no change. If someone's using set -e
or otherwise checking exit codes, then they're going to get an immediate improvement in that their script will know about the failure from the exit code immediately.
Further, if someone's checking exit codes, they pretty much are guaranteed to be knowledgeable enough to have the script send an alert on non-zero exit codes or otherwise just abort on failure and call into something like dead man's snitch on success, and the script aborting on the non-zero would cause DMS or similar to alert since the final check-in would short circuit out.
Agreed, a return code of zero with a less than successful backup is not good. In my opinion, a non-zero exit code should be used if a file included (explicitly or implicitly) in the backup is not successfully backed up
So I'm running running duplicacy web and it shows all my backups as succeeding (they were green). But when I clicked on the backup status I saw a bunch of permission error warnings, so half my data wasn't actually backed up. This is a very scary situation to be in. The exit code needs to indicate these failures.
pylint uses a bit mask as the exit code. That might be useful so the log files don't need to be parsed to determine what type of errors happened.
Hey @gilbertchen, I'd just like to draw attention to this issue again - it's creating needless complexity and resource waste in scripts due to having to grep for error messages instead of read a simple exit code, and using exit codes would fall in line with how every other POSIX based tool behaves. Always exiting 0 even in failure is a pretty clear and cut violation of the principle of least astonishment and most people who are experienced with any POSIX compliant system are going to assume 0 means all's good, leading to the various situations mentioned above where backups are silently failing and people have no idea. I really see no way in which this would break people's scripts for the worse. I'd love to discuss a bit more in depth if you're up for it at some point.
If the concern is that changing error codes is going to break people's scripts, I like @Thalagyrt's idea of adding something like --posix-exit-codes
to trigger the behavior of exiting nonzero when it runs into issues during the backup.
It's scary to think that backups could be failing for a long time with no easy way to detect the failure in my backup wrapper scripts.
@gilbertchen I would also like to draw attention to the issue of returning exit 0 on unsuccessful backups. I am using the web version and if it states, that the backup was successful, it should be successful. A backup solution is only good when one can trust it. So even if you stand by your point in not wanting to possibly break user scripts, the UI's "completed successfully" is wrong
Hi @gilbertchen - Is this still accurate in 2021? If that's the case, I'd also like to draw attention to this...
Any news on this issue?
Changing the exit code at this time is not a good idea. It would break some existing users' scripts.
I find it hard to follow this justification. If scripts are written properly they already handle errors on a non-zero exit status, including an unknown error. Having the error properly reported will follow a proper error-handling path in the user's scripts.
If the scripts are already written to grep through logs and disregard the return code, then their behavior is not expected to change.
The only scenario when a user script's behavior will be materially different is if, after detecting a no-zero RC, it is no longer grepping and so will report a generic error and not a specific one from grep - an inconvenience to a user, but not a data loss.
Still, changing the return code meaning is a breaking change, even if to address this design bug. I think this is a big enough bug to warrant a major release version bump with proper announcements of the backward incompatibility.
I ended up switching to restic because of this. I can't trust the backups and am uninterested in parsing logs to find errors - I'll never be confident that I've addressed every possible error case. It's a shame, I really like duplicacy other than this one very serious issue.
Add a --posix-exit-codes
option to trigger exiting non-zero for editing if you're worried about breaking existing wrapper scripts so that people don't have to write a bunch of brittle log parsing code.
FWIW this is the post-backup
script I wrote :/
#!/bin/bash -ex
# exec 1>/tmp/dup-backup.log
# exec 2>&1
cd "$HOME/.duplicacy-web/logs"
BACKUP_FILE="$(find -iname 'backup-*' -ctime -1 | sort | tail -n 1)"
if [[ ! -e "$BACKUP_FILE" ]]; then
curl --retry 3 -X POST -d 'No backup file found' \
https://hc-ping.com/XXXXXXX/fail
exit 1
fi
if grep -q ' WARN ' "${BACKUP_FILE}"; then
curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
exit 1
elif grep -q ' BACKUP_SKIPPED ' "${BACKUP_FILE}"; then
curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
exit 1
elif grep -q ' BACKUP_END ' "${BACKUP_FILE}"; then
curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX
else
curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
exit 1
fi
This issue has been mentioned on Duplicacy Forum. There might be relevant details there:
https://forum.duplicacy.com/t/warnings-during-backup-not-surfaced-to-web-ui/8785/1
As it is implemented right now, the 0 exit code is a little misleading: as it happens for every other program I know, a 0 exit code should be returned if absolutely no problem occurred, not even a warning, so that, for example, you can use it in a script to send an email saying "backup successful".
I just executed a backup, a file was not backed up, but I still got a 0 exit code!
Some questions I asked myself:
I think exit codes were invented to avoid this kind of problems.
My proposal: why not to add another exit code in case of any warning or problem during the backup?
Here is my backup output that gave me 0 exit code, look at the last line: