Closed ThomasWaldmann closed 8 months ago
I'll update this list here with whatever we come up with:
# generic return codes used by borg
0 OK
1 WARNING
2 ERROR
# 3 .. 63 reserved for more specific errors
3 repository is locked
4 connection error # e.g. ssh returned 255 (other return codes == ?)
5 quota limit reached / no space left in file system
...
# 64 .. 127 reserved for more specific warnings
...
128+N killed by signal N (N == 0..64 for posix signals / realtime signals?)
# 200 .. 255 reserved for informative RCs (like "0 OK", but more specifically telling something)
If a wrapper is unaware of a specific return code, it must treat it like 2 ERROR and notify the user.
An updated wrapper at least needs to implement a range check for the new error / warning / info ranges.
I'll update this list with other related issues we find:
I'd like to suggest an enhancement of error codes.
We should at least reserve or define a broader range of codes that could be implemented in future versions, for example:
1-63: warning (1 meaning "general warning" as it is now) 64-127: error (64 meaning "general error")
(128+N stays unchanged)
If we don't want to implement more detailed return values yet, at least we would change rc 2 to 64. Users would have to change their wrappers from rc 2 to rc >= 64 to recognize errors and could keep the logic for warning.
If we want, we could already define some sub-ranges, e.g. "network errors", "disk related", "quota related" and the like.
No, rc 0 1 2 need to stay as is. There are enough free codes without changing these.
@fantasya-pbem I've updated https://github.com/borgbackup/borg/issues/6756#issuecomment-1147245734 .
hmm, if we only reserve space for warnings/errors, we only have 0 for success / information. not sure if that is a problem / limitation in practice.
We could reserve the range from 128+64 onwards, maybe 200+ for possible success codes, if we ever have the need for this. Posix signals / real-time signals have max code 32 / 64.
This proposal collects possible error/warning cases that can occur. Some may not be detectable or make no sense with the current implementation.
Specific errors
3 unknown cli option
4 pattern syntax error
5 repository not found
6 archive not found
7 locked repo (taken from #5158)
10 borg init
unsupported encryption mode or key algorithm
11+ more init or repo related
15 borg check
issues could not be fixed
16+ more check related
20 borg create
failed, checkpoint archive left
21+ more create related
30 repository quota reached
31 backup disk full
32 decryption failure
33 decompression failure
34 checksum failure (if applicable)
35 missing chunk
36 segment not found
37 hash not found in index
38 unsupported key
39+ more disk or data structure related
50 remote error due to network error like "broken pipe" (taken from #1424)
51+ more specific SSH errors, needs investigation (taken from #3939)
Specific warnings
64 cannot combine cli options
65 lock removed
66+ more lock related
70 borg check --max-duration
stopped
71 borg check --repair
fixed issues (taken from #3446)
72 repository check found issues (only useful if repo check issue prevents archive check)
73 archive check found issues (only useful if archive check issue prevents verify)
74 verify check found issues
75+ more check related
80 cache rebuilt occurred / cache out of sync
81+ more cache related
85 borg create
could not find root path (inspired by #6708)
86 borg create
could not read files
87 file changed during borg create
(inspired by #6657)
88+ more create related
Comments welcome!
For errors (fatal stuff that leads to immediate termination) we can deal rather easily with them: raise an Exception, let the exit handler deal with the exception.
For warnings (stuff borg notices while processing, but does not immediately terminate), we could encounter multiple different warnings. How to deal with that? Collect them in a list? If len(warnings) > 1
, just give rc 1, otherwise the more specific code? Same for informations.
No, rc 0 1 2 need to stay as is.
Thanks for the heads-up! We mostly use 0, 1 and 2 at the moment, but the new ones are for sure interesting to provide better feedback in the UI. Tracking adjustments here: https://github.com/borgbase/vorta/issues/1349
At first glance, this seems redundant to message ids. Pika is already making use of them. Therefore I don't see the use case yet.
(I would like to add msg ids to a bunch of existing warnings though :laughing:)
Use case is for the shell people that don't want to parse JSON. Thank you for the message ids link, I wasn't aware of that yet. We could associate some of those IDs to return codes.
New errors and warnings tables, return codes and message IDs associated (best guess).
Code | Message ID | Error |
---|---|---|
3+ | CLI/patterns | |
Unknown CLI option | ||
Pattern syntax error | ||
8+ | Repository | |
Repository.AlreadyExists | Repository {} already exists. | |
Repository.CheckNeeded | Inconsistency detected. Please run “borg check {}”. | |
Repository.DoesNotExist | Repository {} does not exist. | |
Repository.InsufficientFreeSpaceError | Insufficient free space to complete transaction (required: {}, available: {}). | |
Repository.InvalidRepository | {} is not a valid repository. Check repo config. | |
Repository.AtticRepository | Attic repository detected. Please run “borg upgrade {}”. | |
Repository.ObjectNotFound | Object with key {} not found in repository {}. | |
Repository quota reached | ||
Backup disk full | ||
Missing chunk | ||
Segment not found | ||
Hash not found in index | ||
20+ | Archive | |
Archive.AlreadyExists | Archive {} already exists | |
Archive.DoesNotExist | Archive {} does not exist | |
Archive.IncompatibleFilesystemEncodingError | Failed to encode filename “{}” into file system encoding “{}”. Consider configuring the LANG environment variable. | |
borg create failed, checkpoint archive left |
||
25+ | Key/Security | |
KeyfileInvalidError | Invalid key file for repository {} found in {}. | |
KeyfileMismatchError | Mismatch between repository {} and key file {}. | |
KeyfileNotFoundError | No key file for repository {} found in {}. | |
PassphraseWrong | passphrase supplied in BORG_PASSPHRASE is incorrect | |
PasswordRetriesExceeded | exceeded the maximum password retries | |
RepoKeyNotFoundError | No key entry found in the config of repository {}. | |
UnsupportedManifestError | Unsupported manifest envelope. A newer version is required to access this repository. | |
UnsupportedPayloadError | Unsupported payload type {}. A newer version is required to access this repository. | |
NotABorgKeyFile | This file is not a borg key backup, aborting. | |
RepoIdMismatch | This key backup seems to be for a different backup repository, aborting. | |
UnencryptedRepo | Keymanagement not available for unencrypted repositories. | |
UnknownKeyType | Keytype {0} is unknown. | |
40+ | Locking | |
LockError | Failed to acquire the lock {}. (solves #5158) | |
LockErrorT | Failed to acquire the lock {}. (solves #5158) | |
45+ | Caching | |
Cache.CacheInitAbortedError | Cache initialization aborted | |
Cache.EncryptionMethodMismatch | Repository encryption method changed since last access, refusing to continue | |
Cache.RepositoryAccessAborted | Repository access aborted | |
Cache.RepositoryIDNotUnique | Cache is newer than repository - do you have multiple, independently updated repos with same ID? | |
Cache.RepositoryReplay | Cache is newer than repository - this is either an attack or unsafe (multiple repos with same ID) | |
50+ | Check | |
borg check issue could not be fixed |
||
55+ | Misc | |
Buffer.MemoryLimitExceeded | Requested buffer size {} is above the limit of {}. | |
ExtensionModuleError | The Borg binary extension modules do not seem to be properly installed | |
IntegrityError | Data integrity error: {} | |
NoManifestError | Repository has no manifest. | |
PlaceholderError | Formatting Error: “{}”.format({}): {}({}) | |
Decryption failure | ||
Decompression failure | ||
64+ | Network/Remote | |
ConnectionClosed | Connection closed by remote host (solves #1424) | |
InvalidRPCMethod | RPC method {} is not valid | |
PathNotAllowed | Repository path not allowed | |
RemoteRepository.RPCServerOutdated | Borg server is too old for {}. Required version {} | |
UnexpectedRPCDataFormatFromClient | Borg {}: Got unexpected RPC data format from client. | |
UnexpectedRPCDataFormatFromServer | Got unexpected RPC data format from server: {} | |
80+ | More specific SSH errors, needs investigation (taken from #3939) |
Code | Message ID | Warning |
---|---|---|
100+ | CLI/patterns | |
Cannot combine cli options | ||
103+ | Repository | |
Repository.ObjectNotFound | Object with key {} not found in repository {}. | |
106+ | Locking | |
Lock removed | ||
110+ | Caching | |
Cache rebuilt occurred / cache out of sync | ||
115+ | Check | |
borg check --max-duration stopped |
||
borg check --repair fixed issues (taken from #3446) |
||
Repository check found issues (only useful if repo check issue prevents archive check) | ||
Archive check found issues (only useful if archive check issue prevents verify) | ||
Verify check found issues | ||
borg create could not find root path (inspired by #6708) |
||
borg create could not read files |
||
File changed during borg create (inspired by #6657) |
Is there some systematic in the Code (guess it means the return code, rc) column (except that >= 64 is a warning)?
Where do the numbers seen there come from?
Not yet. Now, after having integrated the message IDs, the codes have to be distributed more evenly. But first I want to check these IDs - are they all really errors that abort Borg or are some of them progress messages? Second, after we have found that the suggested groups make sense and we have defined all possible (and recognizable) error / warning conditions, we can distribute the return codes appropriately.
I found that most message IDs are just raised errors, but at least Repository.ObjectNotFound
is used in a warning context, too (like in archive.delete()
where this error is catched and just ignored). After reordering the groups there seem to exist much more errors than warnings, so I suggest to have the warnings starting with return code 100 (or at least 80) instead of 64.
It's not unusual to have different warnings like "File changed during borg create" or "File not found" during one create
. I'm not sure how return codes would handle that.
Overall, I'm a bit scared of the shell script that will handle all of these errors. Since the number of exit codes is quite limited I don't really see that this concept scales. I would suggest writing down the use case more specifically to see how this could actually work.
@fantasya-pbem are you still working on this?
No, and nobody besides me seems to have interest, too. My "work" has been collecting possible error codes,the real work had to be done by developers of Borg. Feel free to close this issue.
@fantasya-pbem Well, my impression was that there definitely is some interest (see also misc. other related tickets) and I was primarily waiting for you to finish your work, see https://github.com/borgbackup/borg/issues/6756#issuecomment-1159068059 .
I guess I or someone else could then implement this, but having a good plan first is essential. In case we get some funds on the bountysource org account, there could be even one or two bounties.
Personally I will be rather busy with #6987 for a while, so help here is appreciated!
See https://github.com/borgbackup/borg/issues/6756#issuecomment-1159689734. Also: It seems that warning return codes are less useful as there is likely more than one condition to be represented. Error codes make more sense here I guess, maybe we could concentrate on errors first. Then, the next task would be to check the code where these errors can occur: Is there an exception thrown / is Borg aborted? If so, the respective error code could be returned directly.
Yes, we can just leave warnings away for now to simplify the task.
So, only error return codes for conditions that lead to a rather immediate termination.
Usually this means either some exception that gets handled by the toplevel handler in borg.archiver
or a command handler returning with some error code.
Just wanted to ask if somebody wants to either work on this or put a bounty on it?
Due to compatibility reasons I guess the best point to introduce such a change would be with borg 2.0 (which is a breaking change anyway).
But for that to happen, work on this would have to be done rather soon or it won't be in 2.0.
I'ld recommend doing this in this order:
ERRORs are rather simple, WARNINGs more difficult / unclear yet how to best deal with.
ERRORs are rather simple
I think you have to walk through the relevant parts of the code base and specify the return code for a the different problems resulting in an error. Or is a their a way to easily map existing exceptions and message ids to return codes in one place?
@real-yfprojects yes. it's not necessarily an exception, can be also just an if
+ return
.
Are there plans to have a own RC for the borg create message "file changed while we backed it up", which comes now up with RC1?
@mai-schlau that is a warning (not: error) message and considering there might be multiple and different warnings (plus potentially even some real error), this is a more difficult case than the "real error" return codes and IIRC we did not discuss yet how to handle that.
As we have multiple systems, which produce this "file changed while we backed it up" and which still have to be backuped it would be very helpful if these messages would produce a different RC than 1. So it would be easier and more handy to distinguish this occurrence from others like 'no such path' which in our environment requests action.
But maybe, and this is only a simple suggestion, something like RC 1-1 for the "changed while" and a simple 1 for the rest would be an approach. Further RC1s like 'no such path' could also come up with RC 1-2 and maybe the list could be filled till 1-9.
Especially the RC1-1 for the "file changed while we backed it up", would make the messages and our monitoring a lot clearer ...
As there can be multiple warnings in a borg run, it is difficult to map all this information to a few bits in a 8bit return code.
But what we could do is:
error
to its return code (> 0).def compute_rc(error, warnings):
if error > 0:
# there was a fatal error, return its error code
return error
if not warnings:
# great: no error and also no warnings
return 0
rcs = set(warnings)
if len(rcs) == 1:
# easy: there was only one kind of warning, so we can be specific
return rcs[0]
# there were different kinds of warnings
return 1 # generic warning rc, user has to look into the logs
We could think about whether it makes sense to have the same rc for different kinds of similar warnings, so that the specific warning rc case in that computation is used more frequently than the fallback to rc 1 ("find it out yourself!")
Couldn't see this being mentioned before: If any exit codes are added, I would strongly advice to not use exit codes that might appear independent of borg. Especially 128+n for signals like SEGFAULT.
@sophie-h sure, only 3..127 will be used (and 0, 1, 2 when nothing more specific is known).
PR #7928 is not quite finished, but guess some early review / feedback would be helpful!
I did a PR against 1.2-maint, but guess that needs some good review and testing before we can decide whether it can go into 1.2.x releases - the change got a bit bigger than I thought...
Update:
I suspect that this has a slight potential to either break existing scripts or have some new bugs in the way errors and warnings are dealt with internally in borg - so maybe this is not suitable for a 1.2.x patch release which is expected to be rather stable and boring.
1.3.0*
has some tags in the git repo, but that was only an early attempt of what later became borg 2.0.0*
, so guess we better ignore/skip that version number to not confuse users or developers.
Thus, borg 1.4.0
could be the next near-future and rather stable release (borg2 will still take quite a while) and we could have some alpha / beta / rc phases first before releasing that. The increment in the minor number might also motivate more people to look into the changelog and maybe adapt their scripts for the new return codes. We could even use the new return codes by default, not sure.
What do you think about that?
Related: #7080 (msgid for file changed warning)
Guess everybody was too busy to comment, so I just went ahead and put this into first borg 1.4.0 milestone.
borg 1.4 fixed by PR #7976 - this definitely will need practical testing!
borg2 / master branch now got all the changes forward-ported, too.
Currently, borg has only a few, rather generic return codes: 0 ok, 1 warning, 2 error, >=128 for termination due to signals.
This could be improved (preferably with a breaking release) by moving some stuff to more specific codes. Everything that has no specific code assigned stays at the current code.
First task is to go through the issue tracker and link to all issues talking about (more specific) error codes from here.
Then it needs some planning: what makes sense, what does not, ...
When planning is finished and there is some consensus, implementation can follow.