The 0 exit code is a little misleading

drsound commented 5 years ago

As it is implemented right now, the 0 exit code is a little misleading: as it happens for every other program I know, a 0 exit code should be returned if absolutely no problem occurred, not even a warning, so that, for example, you can use it in a script to send an email saying "backup successful".

I just executed a backup, a file was not backed up, but I still got a 0 exit code!

Some questions I asked myself:

What if it was an important file?
What if 99% of the files for some reasons were not backed up? Would I still get a 0 exit code?
Should I always "grep" the backup output looking for signs of trouble?
What if I grep for some specific text but in some future release the warning message changes? I will think my backups are fine but I could miss some files!

I think exit codes were invented to avoid this kind of problems.

My proposal: why not to add another exit code in case of any warning or problem during the backup?

Here is my backup output that gave me 0 exit code, look at the last line:

Backup for / at revision 3 completed
Files: 1278499 total, 573,580M bytes; 183 new, 70,170K bytes
File chunks: 116957 total, 573,807M bytes; 11 new, 62,841K bytes, 19,645K bytes uploaded
Metadata chunks: 92 total, 482,955K bytes; 16 new, 99,754K bytes, 28,975K bytes uploaded
All chunks: 117049 total, 574,278M bytes; 27 new, 162,596K bytes, 48,620K bytes uploaded
Total running time: 00:02:30
1 file was not included due to access errors

gilbertchen commented 5 years ago

Changing the exit code at this time is not a good idea. It would break some existing users' scripts.

I think you can just grep the output to make sure that no files are skipped. These specific log messages are unlikely to be changed in the future.

Thalagyrt commented 5 years ago

What about adding an argument that lets us change the exit code behavior? If the argument isn't specified, the behavior stays the same, but if we specify an argument like -detailed-exit-code we could have a more detailed exit code mapping that can tell us if something failed to upload or similar? Grepping the output is really a bad solution here.

drsound commented 5 years ago

@gilbertchen is afraid of breaking existing backup scripts. I try to analyze the possible cases to show there would be no breaking.

Every kind of backup script I can think of, can do 4 things with the exit code:

Ignore it (Very bad thing!)
Tell the user "everything is fine" if it's 0 (Bad thing! What if you have a lot of missing files?)
Tell the user "everything is fine" if it's 0 AND grepping the output you haven't any "not included files" (Ok, but it leads to ugly and inefficient scripts. Moreover, where in the duplicacy documentation is written you should do this grepping to be sure you have a correct backup? I would bet many people don't do it!)
Tell the user "something went wrong" if it's not 0, with optional variants depending on the exact exit code value

If we add an additional exit code for "not included files" the script behaviour in the previous 4 cases would change as follows:

In case there aren't "not included files" (most common situation):

No change
No change
No change
No change

In case there are "not included files":

No change
Script behaviour would be better, because it will not give the user a false sense of security, a wrong "everything is fine" message
The exit code will be 0 no more, so the grepping will not be done, but every script checking for a 0 exit code, for sure also checks for a non-zero one, so the error will be shown by next case # 4
No change in case the script doen't check the exact non-zero exit code. In case the script checks the exact exit code to show different error messages, the new exit code will not be recognized, so a generic error will be shown.

Now, in my opinion the higher priority should be not to give false sense of security (case # 2).

Anyway, if @gilbertchen does not agree with directly adding a new exit state, I think @Thalagyrt solution is the second best we could have.

Thalagyrt commented 5 years ago

Good analysis. I'm definitely in agreement with @drsound here. I don't see any way in which adding a status code for incomplete/failed backups would break existing scripts in a bad way.

If someone's ignoring the exit code, there really is no change. If someone's using set -e or otherwise checking exit codes, then they're going to get an immediate improvement in that their script will know about the failure from the exit code immediately.

Further, if someone's checking exit codes, they pretty much are guaranteed to be knowledgeable enough to have the script send an alert on non-zero exit codes or otherwise just abort on failure and call into something like dead man's snitch on success, and the script aborting on the non-zero would cause DMS or similar to alert since the final check-in would short circuit out.

leftytennis commented 5 years ago

Agreed, a return code of zero with a less than successful backup is not good. In my opinion, a non-zero exit code should be used if a file included (explicitly or implicitly) in the backup is not successfully backed up

ismell commented 5 years ago

So I'm running running duplicacy web and it shows all my backups as succeeding (they were green). But when I clicked on the backup status I saw a bunch of permission error warnings, so half my data wasn't actually backed up. This is a very scary situation to be in. The exit code needs to indicate these failures.

pylint uses a bit mask as the exit code. That might be useful so the log files don't need to be parsed to determine what type of errors happened.

Thalagyrt commented 4 years ago

Hey @gilbertchen, I'd just like to draw attention to this issue again - it's creating needless complexity and resource waste in scripts due to having to grep for error messages instead of read a simple exit code, and using exit codes would fall in line with how every other POSIX based tool behaves. Always exiting 0 even in failure is a pretty clear and cut violation of the principle of least astonishment and most people who are experienced with any POSIX compliant system are going to assume 0 means all's good, leading to the various situations mentioned above where backups are silently failing and people have no idea. I really see no way in which this would break people's scripts for the worse. I'd love to discuss a bit more in depth if you're up for it at some point.

unixorn commented 4 years ago

If the concern is that changing error codes is going to break people's scripts, I like @Thalagyrt's idea of adding something like --posix-exit-codes to trigger the behavior of exiting nonzero when it runs into issues during the backup.

It's scary to think that backups could be failing for a long time with no easy way to detect the failure in my backup wrapper scripts.

ysGrob commented 3 years ago

@gilbertchen I would also like to draw attention to the issue of returning exit 0 on unsuccessful backups. I am using the web version and if it states, that the backup was successful, it should be successful. A backup solution is only good when one can trust it. So even if you stand by your point in not wanting to possibly break user scripts, the UI's "completed successfully" is wrong

xxxliqu1dxxx commented 3 years ago

Hi @gilbertchen - Is this still accurate in 2021? If that's the case, I'd also like to draw attention to this...

Dex321x commented 2 years ago

Any news on this issue?

didenko commented 1 year ago

Changing the exit code at this time is not a good idea. It would break some existing users' scripts.

I find it hard to follow this justification. If scripts are written properly they already handle errors on a non-zero exit status, including an unknown error. Having the error properly reported will follow a proper error-handling path in the user's scripts.

If the scripts are already written to grep through logs and disregard the return code, then their behavior is not expected to change.

The only scenario when a user script's behavior will be materially different is if, after detecting a no-zero RC, it is no longer grepping and so will report a generic error and not a specific one from grep - an inconvenience to a user, but not a data loss.

Still, changing the return code meaning is a breaking change, even if to address this design bug. I think this is a big enough bug to warrant a major release version bump with proper announcements of the backward incompatibility.

unixorn commented 1 year ago

I ended up switching to restic because of this. I can't trust the backups and am uninterested in parsing logs to find errors - I'll never be confident that I've addressed every possible error case. It's a shame, I really like duplicacy other than this one very serious issue.

Add a --posix-exit-codes option to trigger exiting non-zero for editing if you're worried about breaking existing wrapper scripts so that people don't have to write a bunch of brittle log parsing code.

ismell commented 1 year ago

FWIW this is the post-backup script I wrote :/

#!/bin/bash -ex

# exec 1>/tmp/dup-backup.log
# exec 2>&1

cd "$HOME/.duplicacy-web/logs"
BACKUP_FILE="$(find -iname 'backup-*' -ctime -1 | sort | tail -n 1)"

if [[ ! -e "$BACKUP_FILE" ]]; then
        curl --retry 3 -X POST -d 'No backup file found' \
                 https://hc-ping.com/XXXXXXX/fail
        exit 1
fi

if grep -q ' WARN ' "${BACKUP_FILE}"; then
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
        exit 1
elif grep -q ' BACKUP_SKIPPED ' "${BACKUP_FILE}"; then
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
        exit 1
elif grep -q ' BACKUP_END ' "${BACKUP_FILE}"; then
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX
else
        curl --retry 3 -X POST --data-binary "@${BACKUP_FILE}" https://hc-ping.com/XXXXXXX/fail
        exit 1
fi

gilbertchen commented 4 months ago

This issue has been mentioned on Duplicacy Forum. There might be relevant details there:

https://forum.duplicacy.com/t/warnings-during-backup-not-surfaced-to-web-ui/8785/1

gilbertchen / duplicacy

The 0 exit code is a little misleading #540