borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/
Other
11.18k stars 742 forks source link

add borg returncode to the json output? #7318

Open jhjacobs81 opened 1 year ago

jhjacobs81 commented 1 year ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

QUESTION

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.2.0

Describe the problem you're observing.

The return code (using --show-rc on the command line) goes to standard error, instead of the json output of the rest of the logs. This makes parsing the logs needlessly difficult.

Idealy, the rc output should be in the same json output (which we log to borglog.json) so we can use it for further processing without having to create work arounds.

shalearkane commented 1 year ago

Could you please assign this issue to me?

ThomasWaldmann commented 1 year ago

@shalearkane you have it! thanks for helping!

shalearkane commented 1 year ago

I am facing a few problems while trying to find out a solution:

  1. To reproduce the exact error, I ran setup.py on 1.2-maint branch and then borg create however even though setup.py ran successfully, borg is showing the following errors :

    Traceback (most recent call last):
    File ".../borg/.venv/lib/python3.10/site-packages/borgbackup-1.2.4.dev21+g5ccfaa7a-py3.10-linux-x86_64.egg/borg/archiver/__init__.py", line 29, in <module>
    from ..helpers import remove_surrogates, text_to_json
    ImportError: cannot import name 'text_to_json' from 'borg.helpers' (.../borg/.venv/lib/python3.10/site-packages/borgbackup-1.2.4.dev21+g5ccfaa7a-py3.10-linux-x86_64.egg/borg/helpers/__init__.py)
  2. I read the documention from here that all json logs go to stderr which I confirmed by going through the code in the master branch :

    print(json.dumps(kwargs), file=sys.stderr, flush=True)

    wherever output_json is set. Since every json output goes to stderr, I did not understand the error being reported. Also that the json output is supposed to redirected to an output as mentioned here. I ran borg create archive_name /archive --log-json --show-rc 2>>borglog.json and the rc output is successfully being redirected to borglog.json. I think I am missing some argument which outputs the logs to a file.

  3. I found the part of the code responsible for printing out the rc in src/borg/archiver/__init__.py lines 638-651 :

    if args.show_rc:
      rc_logger = logging.getLogger("borg.output.show-rc")
      exit_msg = "terminating with %s status, rc %d"
      if exit_code == EXIT_SUCCESS:
          rc_logger.info(exit_msg % ("success", exit_code))
      elif exit_code == EXIT_WARNING:
          rc_logger.warning(exit_msg % ("warning", exit_code))
      elif exit_code == EXIT_ERROR:
          rc_logger.error(exit_msg % ("error", exit_code))
      elif exit_code >= EXIT_SIGNAL_BASE:
          rc_logger.error(exit_msg % ("signal", exit_code))
      else:
          rc_logger.error(exit_msg % ("abnormal", exit_code or 666))
    sys.exit(exit_code)

    However after going through the code, I am not finding any difference with the rest of the loggers in other parts of the codebase except that this does not handle the case if args.log_json is set or not.

ThomasWaldmann commented 1 year ago

You should develop new features on master branch usually. If you switch branches, always do:

python setup.py clean clean2 ; pip install -e .
ThomasWaldmann commented 1 year ago

@jhjacobs81 please give an example borg command and an example output that clarifies what you mean and what's missing.

jhjacobs81 commented 1 year ago

ola :)

the command is like so:

borg create --show-rc --json --verbose --filter AME --list --stats --compression lz4 --exclude-caches ::'{hostname}-{now}' /var/www >> /var/log/borg/`date "+%Y%m%d"`-backup.log

the /var/log/borg/backup-20230209.log contains this:

{
    "archive": {
        "command_line": [
            "/usr/bin/borg",
            "create",
            "--show-rc",
            "--json",
            "--verbose",
            "--filter",
            "AME",
            "--list",
            "--stats",
            "--compression",
            "lz4",
            "--exclude-caches",
            "::{hostname}-{now}",
            "/var/www"
        ],
        "duration": 904.844292,
        "end": "2023-02-09T04:15:11.000000",
        "id": "b7c8b02f38799e593446cd23b202a2cd2804ec4cbfedde92e6212277b1a3abcd",
        "limits": {
            "max_archive_size": 0.0090983091845835
        },
        "name": "server-2023-02-09T04:00:02",
        "start": "2023-02-09T04:00:06.000000",
        "stats": {
            "compressed_size": 6755887201823,
            "deduplicated_size": 7896542647,
            "nfiles": 2506135,
            "original_size": 6915780620800
        }
    },
    "cache": {
        "path": "/home/borg/.cache/borg/8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
        "stats": {
            "total_chunks": 1330448735,
            "total_csize": 1879973766849410,
            "total_size": 1925944005517958,
            "total_unique_chunks": 5187095,
            "unique_csize": 6647323823318,
            "unique_size": 6868921146610
        }
    },
    "encryption": {
        "mode": "repokey"
    },
    "repository": {
        "id": "8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
        "last_modified": "2023-02-09T04:15:11.000000",
        "location": "ssh://<REPOSITORY>>"
    }
}

i would expect: the archive part of this json to include the status of the backup. like a field that says

{
    "archive": {
        "command_line": [
            "/usr/bin/borg",
            "create",
            "--show-rc",
            "--json",
            "--verbose",
            "--filter",
            "AME",
            "--list",
            "--stats",
            "--compression",
            "lz4",
            "--exclude-caches",
            "::{hostname}-{now}",
            "/var/www"
        ],
        "duration": 904.844292,
        "end": "2023-02-09T04:15:11.000000",
        "id": "b7c8b02f38799e593446cd23b202a2cd2804ec4cbfedde92e6212277b1a3abcd",
        "limits": {
            "max_archive_size": 0.0090983091845835
        },
        "name": "server-2023-02-09T04:00:02",
        "start": "2023-02-09T04:00:06.000000",
        "stats": {
            "compressed_size": 6755887201823,
            "deduplicated_size": 7896542647,
            "nfiles": 2506135,
            "original_size": 6915780620800,
            "status": 0
        }
    },
    "cache": {
        "path": "/home/borg/.cache/borg/8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
        "stats": {
            "total_chunks": 1330448735,
            "total_csize": 1879973766849410,
            "total_size": 1925944005517958,
            "total_unique_chunks": 5187095,
            "unique_csize": 6647323823318,
            "unique_size": 6868921146610
        }
    },
    "encryption": {
        "mode": "repokey"
    },
    "repository": {
        "id": "8b7c8f3f50a639069ec9d19f3a675382fceac7132fec70be5b28fecdd746359c",
        "last_modified": "2023-02-09T04:15:11.000000",
        "location": "ssh://<REPOSITORY>>"
    }
}

or something. (below stats i added an extra json field)

is it clear what i mean? :)

shalearkane commented 1 year ago

Yes, I am able to reproduce the issue, thanks :smile:

shalearkane commented 1 year ago

@ThomasWaldmann thanks for the heads-up about the python setup.py clean clean2 ; pip install -e. Now everything is working perfectly.

However, I have some queries about the feature request.

The json_output is first printed in https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archiver/create_cmd.py#L205-L206 and then the rc code (exit_code) is calculated in the lines https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archiver/__init__.py#L596-L633

So, if I have to add the status_code in archive.stats, I will have to add a class property status_code in https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archive.py#L56-L66 And set

archive.stats.status_code = self.exit_code

in https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archiver/create_cmd.py#L202-L208

But then the status_code set will not be the same as the output of --show-rc flag because --show-rc flag also takes into account other exceptions like KeyboardInterrupt, Sigterm, etc. So the other (correct) solution to this is to add

if args.json:
    json_print({"status_code" : exit_code})

in https://github.com/borgbackup/borg/blob/366731ba004e69e6c479a6ce2b5ab1fdbbed3e4f/src/borg/archiver/__init__.py#L638-L650

But the downside to this is that the json output produced will be incorrect since we will be appending two json (first one from the create_cmd) in one file. So parsing will be difficult.

ThomasWaldmann commented 1 year ago

Guess we have to differentiate:

shalearkane commented 1 year ago

Alright, I have two approaches to this

1. Add exit_code to the Statistics Class and add exit_code to the as_dict function

in https://github.com/borgbackup/borg/blob/ee7fee75fc13d19688a2d5101318402db91876e7/src/borg/archive.py#L117-L125

ThomasWaldmann commented 1 year ago

Guess I would not really like to have that in the statistics data (as it isn't statistics).

Maybe you could explore something else:

The --show-rc usually gets the return code logged (in the last moment before exiting).

How does it behave when --log-json is given also?

jhjacobs81 commented 1 year ago

My apologies, ive been fairly busy the past days.

—log-json is different then the —json we already include? I will have to try this.

While its not a statistic perse, it can be used as such. I could imagine that the amount of failed or warnings can be a statistic in itself. And could be used as a metric to consider changing the location or whatever is the problem.

Truth be told, i thought it was an easy change to make.

ThomasWaldmann commented 1 year ago

There is already some stats tracking the per file status (like "A"dded, "M"odified, "E"rror, "U"nchanged).

Yes, --log-json is different from --json, just try it.

shalearkane commented 1 year ago

Yes, --log-json is different from --json. I ran

borg -r ~/Public/borg/backup create --show-rc --json --log-json --verbose --list --stats ar11 requirements.d/  >> borglog.json

--json outputs to borglog.json

{
    "archive": {
        "command_line": ".../borg/.venv/bin/borg -r /.../borg/backup create --show-rc --json --log-json --verbose --list --stats ar11 requirements.d/",
        "duration": 0.004463,
        "end": "2023-02-15T03:20:58.400617+05:30",
        "id": "4e8f36a208f5ae5c6959c53a2b46869ca5f6edfd95d034ba2f2dbcfb24e33a46",
        "name": "ar11",
        "start": "2023-02-15T03:20:58.396154+05:30",
        "stats": {
            "chunking_time": 0.0003651460283435881,
            "deduplicated_size": 566,
            "files_stats": {
                "A": 3,
                "U": 2,
                "d": 1
            },
            "hashing_time": 4.223800351610407e-05,
            "nfiles": 5,
            "original_size": 1112,
            "status_code": 0
        }
    },
    "cache": {
        "path": ".../.cache/borg/baa2362e1f95478d0d1fb11c254e51793a8710c23212904f584798ff614c1f0c",
        "stats": {
            "total_chunks": 152,
            "total_size": 10374,
            "total_unique_chunks": 26,
            "unique_size": 12201
        }
    },
    "encryption": {
        "mode": "none"
    },
    "repository": {
        "id": "baa2362e1f95478d0d1fb11c254e51793a8710c23212904f584798ff614c1f0c",
        "last_modified": "2023-02-15T03:20:58.401461+05:30",
        "location": ".../borg/backup"
    }
}

while,

--log-json outputs to stderr

{"type": "log_message", "time": 1676411458.3961985, "message": "Creating archive at \"/home/.../borg/backup\"", "levelname": "INFO", "name": "borg.archiver.create_cmd"}
{"type": "file_status", "status": "A", "path": "requirements.d/development.lock.txt"}
{"type": "file_status", "status": "A", "path": "requirements.d/development.txt"}
{"type": "file_status", "status": "U", "path": "requirements.d/coala.txt"}
{"type": "file_status", "status": "U", "path": "requirements.d/docs.txt"}
{"type": "file_status", "status": "A", "path": "requirements.d/codestyle.txt"}
{"type": "file_status", "status": "d", "path": "requirements.d"}
{"type": "log_message", "time": 1676411458.4419532, "message": "terminating with success status, rc 0", "levelname": "INFO", "name": "borg.output.show-rc"}

and this does not get redirected to borglog.json

ThomasWaldmann commented 1 year ago
{"type": "log_message", "time": 1676411458.4419532, "message": "terminating with success status, rc 0", "levelname": "INFO", "name": "borg.output.show-rc"}

OK, so we actually do already out the json log line corresponding to the non-json log output triggered by --show-rc.

So, any tool preferring json over standard log output could use --log-json and get the rc (and also other log output) from there.

It's not super convenient from a json perspective as parsing the rc code from that line needs a bit more work than just having a direct json value, but as the --log-json is generic code, this is somehow expected.

That output is on stderr, so one needs 2> json.log to redirect it.

@jhjacobs81 looks like that solves the problem. ^

Due to the execution flow, especially in case of exceptions and rc 2, it is difficult to have the rc code in --json, but it would be in --log-json.

jhjacobs81 commented 1 year ago

whoops, i will have to rewrite all the code, but alas! thats the way it is :)