Open jhjacobs81 opened 1 year ago
Could you please assign this issue to me?
@shalearkane you have it! thanks for helping!
I am facing a few problems while trying to find out a solution:
To reproduce the exact error, I ran setup.py
on 1.2-maint
branch and then borg create
however even though setup.py ran successfully, borg is showing the following errors :
Traceback (most recent call last):
File ".../borg/.venv/lib/python3.10/site-packages/borgbackup-1.2.4.dev21+g5ccfaa7a-py3.10-linux-x86_64.egg/borg/archiver/__init__.py", line 29, in <module>
from ..helpers import remove_surrogates, text_to_json
ImportError: cannot import name 'text_to_json' from 'borg.helpers' (.../borg/.venv/lib/python3.10/site-packages/borgbackup-1.2.4.dev21+g5ccfaa7a-py3.10-linux-x86_64.egg/borg/helpers/__init__.py)
I read the documention from here that all json logs go to stderr which I confirmed by going through the code in the master branch :
print(json.dumps(kwargs), file=sys.stderr, flush=True)
wherever output_json
is set. Since every json output goes to stderr, I did not understand the error being reported. Also that the json output is supposed to redirected to an output as mentioned here. I ran borg create archive_name /archive --log-json --show-rc 2>>borglog.json
and the rc output is successfully being redirected to borglog.json
. I think I am missing some argument which outputs the logs to a file.
I found the part of the code responsible for printing out the rc in src/borg/archiver/__init__.py
lines 638-651 :
if args.show_rc:
rc_logger = logging.getLogger("borg.output.show-rc")
exit_msg = "terminating with %s status, rc %d"
if exit_code == EXIT_SUCCESS:
rc_logger.info(exit_msg % ("success", exit_code))
elif exit_code == EXIT_WARNING:
rc_logger.warning(exit_msg % ("warning", exit_code))
elif exit_code == EXIT_ERROR:
rc_logger.error(exit_msg % ("error", exit_code))
elif exit_code >= EXIT_SIGNAL_BASE:
rc_logger.error(exit_msg % ("signal", exit_code))
else:
rc_logger.error(exit_msg % ("abnormal", exit_code or 666))
sys.exit(exit_code)
However after going through the code, I am not finding any difference with the rest of the loggers in other parts of the codebase except that this does not handle the case if args.log_json
is set or not.
You should develop new features on master branch usually. If you switch branches, always do:
python setup.py clean clean2 ; pip install -e .
@jhjacobs81 please give an example borg command and an example output that clarifies what you mean and what's missing.
ola :)
the command is like so:
borg create --show-rc --json --verbose --filter AME --list --stats --compression lz4 --exclude-caches ::'{hostname}-{now}' /var/www >> /var/log/borg/`date "+%Y%m%d"`-backup.log
the /var/log/borg/backup-20230209.log contains this:
{
"archive": {
"command_line": [
"/usr/bin/borg",
"create",
"--show-rc",
"--json",
"--verbose",
"--filter",
"AME",
"--list",
"--stats",
"--compression",
"lz4",
"--exclude-caches",
"::{hostname}-{now}",
"/var/www"
],
"duration": 904.844292,
"end": "2023-02-09T04:15:11.000000",
"id": "b7c8b02f38799e593446cd23b202a2cd2804ec4cbfedde92e6212277b1a3abcd",
"limits": {
"max_archive_size": 0.0090983091845835
},
"name": "server-2023-02-09T04:00:02",
"start": "2023-02-09T04:00:06.000000",
"stats": {
"compressed_size": 6755887201823,
"deduplicated_size": 7896542647,
"nfiles": 2506135,
"original_size": 6915780620800
}
},
"cache": {
"path": "/home/borg/.cache/borg/8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
"stats": {
"total_chunks": 1330448735,
"total_csize": 1879973766849410,
"total_size": 1925944005517958,
"total_unique_chunks": 5187095,
"unique_csize": 6647323823318,
"unique_size": 6868921146610
}
},
"encryption": {
"mode": "repokey"
},
"repository": {
"id": "8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
"last_modified": "2023-02-09T04:15:11.000000",
"location": "ssh://<REPOSITORY>>"
}
}
i would expect: the archive part of this json to include the status of the backup. like a field that says
{
"archive": {
"command_line": [
"/usr/bin/borg",
"create",
"--show-rc",
"--json",
"--verbose",
"--filter",
"AME",
"--list",
"--stats",
"--compression",
"lz4",
"--exclude-caches",
"::{hostname}-{now}",
"/var/www"
],
"duration": 904.844292,
"end": "2023-02-09T04:15:11.000000",
"id": "b7c8b02f38799e593446cd23b202a2cd2804ec4cbfedde92e6212277b1a3abcd",
"limits": {
"max_archive_size": 0.0090983091845835
},
"name": "server-2023-02-09T04:00:02",
"start": "2023-02-09T04:00:06.000000",
"stats": {
"compressed_size": 6755887201823,
"deduplicated_size": 7896542647,
"nfiles": 2506135,
"original_size": 6915780620800,
"status": 0
}
},
"cache": {
"path": "/home/borg/.cache/borg/8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
"stats": {
"total_chunks": 1330448735,
"total_csize": 1879973766849410,
"total_size": 1925944005517958,
"total_unique_chunks": 5187095,
"unique_csize": 6647323823318,
"unique_size": 6868921146610
}
},
"encryption": {
"mode": "repokey"
},
"repository": {
"id": "8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
"last_modified": "2023-02-09T04:15:11.000000",
"location": "ssh://<REPOSITORY>>"
}
}
or something. (below stats i added an extra json field)
is it clear what i mean? :)
Yes, I am able to reproduce the issue, thanks :smile:
@ThomasWaldmann thanks for the heads-up about the python setup.py clean clean2 ; pip install -e
. Now everything is working perfectly.
However, I have some queries about the feature request.
The json_output is first printed in https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archiver/create_cmd.py#L205-L206
and then the rc code (exit_code
) is calculated in the lines https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archiver/__init__.py#L596-L633
So, if I have to add the status_code in archive.stats
, I will have to add a class property status_code
in https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archive.py#L56-L66
And set
archive.stats.status_code = self.exit_code
But then the status_code set will not be the same as the output of --show-rc
flag because --show-rc
flag also takes into account other exceptions like KeyboardInterrupt, Sigterm, etc. So the other (correct) solution to this is to add
if args.json:
json_print({"status_code" : exit_code})
But the downside to this is that the json output produced will be incorrect since we will be appending two json (first one from the create_cmd
) in one file. So parsing will be difficult.
Guess we have to differentiate:
Archiver.exit_code = 1
(WARNING) somewhere and maybe skips a problematic file, but otherwise continues normally, normal execution flow. When we emit some json, we can also know about that "1" at that point.Alright, I have two approaches to this
exit_code
to the Statistics Class
and add exit_code
to the as_dict
functionjson_print
, I set the exit_code
before printing :
exit_code
in rinfo_cmd
since there is no Archive
object
https://github.com/borgbackup/borg/blob/ee7fee75fc13d19688a2d5101318402db91876e7/src/borg/archiver/rinfo_cmd.py#L19-L22exit_code
in rlist_cmd
since there is no Archive
object. The manifest object is used to print the data.
https://github.com/borgbackup/borg/blob/ee7fee75fc13d19688a2d5101318402db91876e7/src/borg/archiver/rlist_cmd.py#L29-L38EXIT_ERROR
and EXIT_SIGNAL_BASE
, I output a different json.
info_cmd
output will have exit_code
since by default it gets the calculated statistics from https://github.com/borgbackup/borg/blob/ee7fee75fc13d19688a2d5101318402db91876e7/src/borg/archive.py#L592 , while I feel will be an useless output
exit_code
to the extra
argumentin https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archiver/create_cmd.py#L205-L206 ... and everywhere else as necessary
EXIT_ERROR
and EXIT_SIGNAL_BASE
, I output a different json.
We won't have the exit_code
in the extra.archive.stats
part of the json, rather in the top-levelextra
part.
Guess I would not really like to have that in the statistics data (as it isn't statistics).
Maybe you could explore something else:
The --show-rc
usually gets the return code logged (in the last moment before exiting).
How does it behave when --log-json
is given also?
My apologies, ive been fairly busy the past days.
—log-json is different then the —json we already include? I will have to try this.
While its not a statistic perse, it can be used as such. I could imagine that the amount of failed or warnings can be a statistic in itself. And could be used as a metric to consider changing the location or whatever is the problem.
Truth be told, i thought it was an easy change to make.
There is already some stats tracking the per file status (like "A"dded, "M"odified, "E"rror, "U"nchanged).
Yes, --log-json is different from --json, just try it.
Yes, --log-json
is different from --json
.
I ran
borg -r ~/Public/borg/backup create --show-rc --json --log-json --verbose --list --stats ar11 requirements.d/ >> borglog.json
--json
outputs to borglog.json
{
"archive": {
"command_line": ".../borg/.venv/bin/borg -r /.../borg/backup create --show-rc --json --log-json --verbose --list --stats ar11 requirements.d/",
"duration": 0.004463,
"end": "2023-02-15T03:20:58.400617+05:30",
"id": "4e8f36a208f5ae5c6959c53a2b46869ca5f6edfd95d034ba2f2dbcfb24e33a46",
"name": "ar11",
"start": "2023-02-15T03:20:58.396154+05:30",
"stats": {
"chunking_time": 0.0003651460283435881,
"deduplicated_size": 566,
"files_stats": {
"A": 3,
"U": 2,
"d": 1
},
"hashing_time": 4.223800351610407e-05,
"nfiles": 5,
"original_size": 1112,
"status_code": 0
}
},
"cache": {
"path": ".../.cache/borg/baa2362e1f95478d0d1fb11c254e51793a8710c23212904f584798ff614c1f0c",
"stats": {
"total_chunks": 152,
"total_size": 10374,
"total_unique_chunks": 26,
"unique_size": 12201
}
},
"encryption": {
"mode": "none"
},
"repository": {
"id": "baa2362e1f95478d0d1fb11c254e51793a8710c23212904f584798ff614c1f0c",
"last_modified": "2023-02-15T03:20:58.401461+05:30",
"location": ".../borg/backup"
}
}
while,
--log-json
outputs to stderr{"type": "log_message", "time": 1676411458.3961985, "message": "Creating archive at \"/home/.../borg/backup\"", "levelname": "INFO", "name": "borg.archiver.create_cmd"}
{"type": "file_status", "status": "A", "path": "requirements.d/development.lock.txt"}
{"type": "file_status", "status": "A", "path": "requirements.d/development.txt"}
{"type": "file_status", "status": "U", "path": "requirements.d/coala.txt"}
{"type": "file_status", "status": "U", "path": "requirements.d/docs.txt"}
{"type": "file_status", "status": "A", "path": "requirements.d/codestyle.txt"}
{"type": "file_status", "status": "d", "path": "requirements.d"}
{"type": "log_message", "time": 1676411458.4419532, "message": "terminating with success status, rc 0", "levelname": "INFO", "name": "borg.output.show-rc"}
and this does not get redirected to borglog.json
{"type": "log_message", "time": 1676411458.4419532, "message": "terminating with success status, rc 0", "levelname": "INFO", "name": "borg.output.show-rc"}
OK, so we actually do already out the json log line corresponding to the non-json log output triggered by --show-rc.
So, any tool preferring json over standard log output could use --log-json and get the rc (and also other log output) from there.
It's not super convenient from a json perspective as parsing the rc code from that line needs a bit more work than just having a direct json value, but as the --log-json is generic code, this is somehow expected.
That output is on stderr, so one needs 2> json.log
to redirect it.
@jhjacobs81 looks like that solves the problem. ^
Due to the execution flow, especially in case of exceptions and rc 2, it is difficult to have the rc code in --json
, but it would be in --log-json
.
whoops, i will have to rewrite all the code, but alas! thats the way it is :)
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
QUESTION
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.2.0
Describe the problem you're observing.
The return code (using --show-rc on the command line) goes to standard error, instead of the json output of the rest of the logs. This makes parsing the logs needlessly difficult.
Idealy, the rc output should be in the same json output (which we log to borglog.json) so we can use it for further processing without having to create work arounds.