borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/
Other
11.22k stars 743 forks source link

more specific return code / rc / error codes of the borg process #6756

Closed ThomasWaldmann closed 8 months ago

ThomasWaldmann commented 2 years ago

Currently, borg has only a few, rather generic return codes: 0 ok, 1 warning, 2 error, >=128 for termination due to signals.

This could be improved (preferably with a breaking release) by moving some stuff to more specific codes. Everything that has no specific code assigned stays at the current code.

First task is to go through the issue tracker and link to all issues talking about (more specific) error codes from here.

Then it needs some planning: what makes sense, what does not, ...

When planning is finished and there is some consensus, implementation can follow.

ThomasWaldmann commented 2 years ago

I'll update this list here with whatever we come up with:

# generic return codes used by borg
0 OK
1 WARNING
2 ERROR
# 3 .. 63 reserved for more specific errors
3 repository is locked
4 connection error  # e.g. ssh returned 255 (other return codes == ?)
5 quota limit reached / no space left in file system
...
# 64 .. 127 reserved for more specific warnings
...
128+N killed by signal N (N == 0..64 for posix signals / realtime signals?)
# 200 .. 255 reserved for informative RCs (like "0 OK", but more specifically telling something)

If a wrapper is unaware of a specific return code, it must treat it like 2 ERROR and notify the user.

An updated wrapper at least needs to implement a range check for the new error / warning / info ranges.

ThomasWaldmann commented 2 years ago

I'll update this list with other related issues we find:

fantasya-pbem commented 2 years ago

I'd like to suggest an enhancement of error codes.

We should at least reserve or define a broader range of codes that could be implemented in future versions, for example:

1-63: warning (1 meaning "general warning" as it is now) 64-127: error (64 meaning "general error")

(128+N stays unchanged)

If we don't want to implement more detailed return values yet, at least we would change rc 2 to 64. Users would have to change their wrappers from rc 2 to rc >= 64 to recognize errors and could keep the logic for warning.

If we want, we could already define some sub-ranges, e.g. "network errors", "disk related", "quota related" and the like.

ThomasWaldmann commented 2 years ago

No, rc 0 1 2 need to stay as is. There are enough free codes without changing these.

@fantasya-pbem I've updated https://github.com/borgbackup/borg/issues/6756#issuecomment-1147245734 .

ThomasWaldmann commented 2 years ago

hmm, if we only reserve space for warnings/errors, we only have 0 for success / information. not sure if that is a problem / limitation in practice.

fantasya-pbem commented 2 years ago

We could reserve the range from 128+64 onwards, maybe 200+ for possible success codes, if we ever have the need for this. Posix signals / real-time signals have max code 32 / 64.

fantasya-pbem commented 2 years ago

This proposal collects possible error/warning cases that can occur. Some may not be detectable or make no sense with the current implementation.

Specific errors

3 unknown cli option 4 pattern syntax error 5 repository not found 6 archive not found 7 locked repo (taken from #5158) 10 borg init unsupported encryption mode or key algorithm 11+ more init or repo related 15 borg check issues could not be fixed 16+ more check related 20 borg create failed, checkpoint archive left 21+ more create related 30 repository quota reached 31 backup disk full 32 decryption failure 33 decompression failure 34 checksum failure (if applicable) 35 missing chunk 36 segment not found 37 hash not found in index 38 unsupported key 39+ more disk or data structure related 50 remote error due to network error like "broken pipe" (taken from #1424) 51+ more specific SSH errors, needs investigation (taken from #3939)

Specific warnings

64 cannot combine cli options 65 lock removed 66+ more lock related 70 borg check --max-duration stopped 71 borg check --repair fixed issues (taken from #3446) 72 repository check found issues (only useful if repo check issue prevents archive check) 73 archive check found issues (only useful if archive check issue prevents verify) 74 verify check found issues 75+ more check related 80 cache rebuilt occurred / cache out of sync 81+ more cache related 85 borg create could not find root path (inspired by #6708) 86 borg create could not read files 87 file changed during borg create (inspired by #6657) 88+ more create related

Comments welcome!

ThomasWaldmann commented 2 years ago

For errors (fatal stuff that leads to immediate termination) we can deal rather easily with them: raise an Exception, let the exit handler deal with the exception.

For warnings (stuff borg notices while processing, but does not immediately terminate), we could encounter multiple different warnings. How to deal with that? Collect them in a list? If len(warnings) > 1, just give rc 1, otherwise the more specific code? Same for informations.

m3nu commented 2 years ago

No, rc 0 1 2 need to stay as is.

Thanks for the heads-up! We mostly use 0, 1 and 2 at the moment, but the new ones are for sure interesting to provide better feedback in the UI. Tracking adjustments here: https://github.com/borgbase/vorta/issues/1349

sophie-h commented 2 years ago

At first glance, this seems redundant to message ids. Pika is already making use of them. Therefore I don't see the use case yet.

(I would like to add msg ids to a bunch of existing warnings though :laughing:)

fantasya-pbem commented 2 years ago

Use case is for the shell people that don't want to parse JSON. Thank you for the message ids link, I wasn't aware of that yet. We could associate some of those IDs to return codes.

fantasya-pbem commented 2 years ago

New errors and warnings tables, return codes and message IDs associated (best guess).

Code Message ID Error
3+ CLI/patterns
Unknown CLI option
Pattern syntax error
8+ Repository
Repository.AlreadyExists Repository {} already exists.
Repository.CheckNeeded Inconsistency detected. Please run “borg check {}”.
Repository.DoesNotExist Repository {} does not exist.
Repository.InsufficientFreeSpaceError Insufficient free space to complete transaction (required: {}, available: {}).
Repository.InvalidRepository {} is not a valid repository. Check repo config.
Repository.AtticRepository Attic repository detected. Please run “borg upgrade {}”.
Repository.ObjectNotFound Object with key {} not found in repository {}.
Repository quota reached
Backup disk full
Missing chunk
Segment not found
Hash not found in index
20+ Archive
Archive.AlreadyExists Archive {} already exists
Archive.DoesNotExist Archive {} does not exist
Archive.IncompatibleFilesystemEncodingError Failed to encode filename “{}” into file system encoding “{}”. Consider configuring the LANG environment variable.
borg create failed, checkpoint archive left
25+ Key/Security
KeyfileInvalidError Invalid key file for repository {} found in {}.
KeyfileMismatchError Mismatch between repository {} and key file {}.
KeyfileNotFoundError No key file for repository {} found in {}.
PassphraseWrong passphrase supplied in BORG_PASSPHRASE is incorrect
PasswordRetriesExceeded exceeded the maximum password retries
RepoKeyNotFoundError No key entry found in the config of repository {}.
UnsupportedManifestError Unsupported manifest envelope. A newer version is required to access this repository.
UnsupportedPayloadError Unsupported payload type {}. A newer version is required to access this repository.
NotABorgKeyFile This file is not a borg key backup, aborting.
RepoIdMismatch This key backup seems to be for a different backup repository, aborting.
UnencryptedRepo Keymanagement not available for unencrypted repositories.
UnknownKeyType Keytype {0} is unknown.
40+ Locking
LockError Failed to acquire the lock {}. (solves #5158)
LockErrorT Failed to acquire the lock {}. (solves #5158)
45+ Caching
Cache.CacheInitAbortedError Cache initialization aborted
Cache.EncryptionMethodMismatch Repository encryption method changed since last access, refusing to continue
Cache.RepositoryAccessAborted Repository access aborted
Cache.RepositoryIDNotUnique Cache is newer than repository - do you have multiple, independently updated repos with same ID?
Cache.RepositoryReplay Cache is newer than repository - this is either an attack or unsafe (multiple repos with same ID)
50+ Check
borg check issue could not be fixed
55+ Misc
Buffer.MemoryLimitExceeded Requested buffer size {} is above the limit of {}.
ExtensionModuleError The Borg binary extension modules do not seem to be properly installed
IntegrityError Data integrity error: {}
NoManifestError Repository has no manifest.
PlaceholderError Formatting Error: “{}”.format({}): {}({})
Decryption failure
Decompression failure
64+ Network/Remote
ConnectionClosed Connection closed by remote host (solves #1424)
InvalidRPCMethod RPC method {} is not valid
PathNotAllowed Repository path not allowed
RemoteRepository.RPCServerOutdated Borg server is too old for {}. Required version {}
UnexpectedRPCDataFormatFromClient Borg {}: Got unexpected RPC data format from client.
UnexpectedRPCDataFormatFromServer Got unexpected RPC data format from server: {}
80+ More specific SSH errors, needs investigation (taken from #3939)
Code Message ID Warning
100+ CLI/patterns
Cannot combine cli options
103+ Repository
Repository.ObjectNotFound Object with key {} not found in repository {}.
106+ Locking
Lock removed
110+ Caching
Cache rebuilt occurred / cache out of sync
115+ Check
borg check --max-duration stopped
borg check --repair fixed issues (taken from #3446)
Repository check found issues (only useful if repo check issue prevents archive check)
Archive check found issues (only useful if archive check issue prevents verify)
Verify check found issues
borg create could not find root path (inspired by #6708)
borg create could not read files
File changed during borg create (inspired by #6657)
ThomasWaldmann commented 2 years ago

Is there some systematic in the Code (guess it means the return code, rc) column (except that >= 64 is a warning)?

Where do the numbers seen there come from?

fantasya-pbem commented 2 years ago

Not yet. Now, after having integrated the message IDs, the codes have to be distributed more evenly. But first I want to check these IDs - are they all really errors that abort Borg or are some of them progress messages? Second, after we have found that the suggested groups make sense and we have defined all possible (and recognizable) error / warning conditions, we can distribute the return codes appropriately.

fantasya-pbem commented 2 years ago

I found that most message IDs are just raised errors, but at least Repository.ObjectNotFound is used in a warning context, too (like in archive.delete() where this error is catched and just ignored). After reordering the groups there seem to exist much more errors than warnings, so I suggest to have the warnings starting with return code 100 (or at least 80) instead of 64.

sophie-h commented 2 years ago

It's not unusual to have different warnings like "File changed during borg create" or "File not found" during one create. I'm not sure how return codes would handle that.

Overall, I'm a bit scared of the shell script that will handle all of these errors. Since the number of exit codes is quite limited I don't really see that this concept scales. I would suggest writing down the use case more specifically to see how this could actually work.

ThomasWaldmann commented 2 years ago

@fantasya-pbem are you still working on this?

fantasya-pbem commented 2 years ago

No, and nobody besides me seems to have interest, too. My "work" has been collecting possible error codes,the real work had to be done by developers of Borg. Feel free to close this issue.

ThomasWaldmann commented 2 years ago

@fantasya-pbem Well, my impression was that there definitely is some interest (see also misc. other related tickets) and I was primarily waiting for you to finish your work, see https://github.com/borgbackup/borg/issues/6756#issuecomment-1159068059 .

I guess I or someone else could then implement this, but having a good plan first is essential. In case we get some funds on the bountysource org account, there could be even one or two bounties.

Personally I will be rather busy with #6987 for a while, so help here is appreciated!

fantasya-pbem commented 2 years ago

See https://github.com/borgbackup/borg/issues/6756#issuecomment-1159689734. Also: It seems that warning return codes are less useful as there is likely more than one condition to be represented. Error codes make more sense here I guess, maybe we could concentrate on errors first. Then, the next task would be to check the code where these errors can occur: Is there an exception thrown / is Borg aborted? If so, the respective error code could be returned directly.

ThomasWaldmann commented 2 years ago

Yes, we can just leave warnings away for now to simplify the task.

So, only error return codes for conditions that lead to a rather immediate termination. Usually this means either some exception that gets handled by the toplevel handler in borg.archiver or a command handler returning with some error code.

ThomasWaldmann commented 1 year ago

Just wanted to ask if somebody wants to either work on this or put a bounty on it?

Due to compatibility reasons I guess the best point to introduce such a change would be with borg 2.0 (which is a breaking change anyway).

But for that to happen, work on this would have to be done rather soon or it won't be in 2.0.

ThomasWaldmann commented 1 year ago

I'ld recommend doing this in this order:

ThomasWaldmann commented 1 year ago

ERRORs are rather simple, WARNINGs more difficult / unclear yet how to best deal with.

real-yfprojects commented 1 year ago

ERRORs are rather simple

I think you have to walk through the relevant parts of the code base and specify the return code for a the different problems resulting in an error. Or is a their a way to easily map existing exceptions and message ids to return codes in one place?

ThomasWaldmann commented 1 year ago

@real-yfprojects yes. it's not necessarily an exception, can be also just an if+ return.

mai-schlau commented 1 year ago

Are there plans to have a own RC for the borg create message "file changed while we backed it up", which comes now up with RC1?

ThomasWaldmann commented 1 year ago

@mai-schlau that is a warning (not: error) message and considering there might be multiple and different warnings (plus potentially even some real error), this is a more difficult case than the "real error" return codes and IIRC we did not discuss yet how to handle that.

mai-schlau commented 1 year ago

As we have multiple systems, which produce this "file changed while we backed it up" and which still have to be backuped it would be very helpful if these messages would produce a different RC than 1. So it would be easier and more handy to distinguish this occurrence from others like 'no such path' which in our environment requests action.

But maybe, and this is only a simple suggestion, something like RC 1-1 for the "changed while" and a simple 1 for the rest would be an approach. Further RC1s like 'no such path' could also come up with RC 1-2 and maybe the list could be filled till 1-9.

Especially the RC1-1 for the "file changed while we backed it up", would make the messages and our monitoring a lot clearer ...

ThomasWaldmann commented 1 year ago

idea about how to deal with warnings

As there can be multiple warnings in a borg run, it is difficult to map all this information to a few bits in a 8bit return code.

But what we could do is:

We could think about whether it makes sense to have the same rc for different kinds of similar warnings, so that the specific warning rc case in that computation is used more frequently than the fallback to rc 1 ("find it out yourself!")

sophie-h commented 1 year ago

Couldn't see this being mentioned before: If any exit codes are added, I would strongly advice to not use exit codes that might appear independent of borg. Especially 128+n for signals like SEGFAULT.

ThomasWaldmann commented 1 year ago

@sophie-h sure, only 3..127 will be used (and 0, 1, 2 when nothing more specific is known).

ThomasWaldmann commented 1 year ago

PR #7928 is not quite finished, but guess some early review / feedback would be helpful!

I did a PR against 1.2-maint, but guess that needs some good review and testing before we can decide whether it can go into 1.2.x releases - the change got a bit bigger than I thought...

Update:

I suspect that this has a slight potential to either break existing scripts or have some new bugs in the way errors and warnings are dealt with internally in borg - so maybe this is not suitable for a 1.2.x patch release which is expected to be rather stable and boring.

1.3.0* has some tags in the git repo, but that was only an early attempt of what later became borg 2.0.0*, so guess we better ignore/skip that version number to not confuse users or developers.

Thus, borg 1.4.0 could be the next near-future and rather stable release (borg2 will still take quite a while) and we could have some alpha / beta / rc phases first before releasing that. The increment in the minor number might also motivate more people to look into the changelog and maybe adapt their scripts for the new return codes. We could even use the new return codes by default, not sure.

What do you think about that?

ThomasWaldmann commented 12 months ago

Related: #7080 (msgid for file changed warning)

ThomasWaldmann commented 10 months ago

Guess everybody was too busy to comment, so I just went ahead and put this into first borg 1.4.0 milestone.

ThomasWaldmann commented 10 months ago

borg 1.4 fixed by PR #7976 - this definitely will need practical testing!

ThomasWaldmann commented 8 months ago

borg2 / master branch now got all the changes forward-ported, too.