Docs suggestion: pydantic models for borg's JSON output

a-gn commented 4 weeks ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

It's an ISSUE (suggestion for docs improvement actually). I don't think system info is needed.

Your borg version (borg -V).

1.4.0.

Describe the problem you're observing.

I wrote a small borg automation project for myself. Parsing borg's JSON output with pydantic was a bit of a pain because I had to write models for this by hand.

I suggest adding these Pydantic v2 models to the same docs, to make it easier to write frontends:

import json
import logging
import typing
from datetime import datetime
from pathlib import Path

import pydantic

_log = logging.getLogger(__name__)

class BaseBorgLogLine(pydantic.BaseModel):
    def get_level(self) -> int:
        """Get the log level for this line as a `logging` level value.

        Log messages that carry a levelname override this method and use it.
        Every other line type (progress, file status) gets `DEBUG`.
        """
        return logging.DEBUG

class ArchiveProgressLogLine(BaseBorgLogLine):
    original_size: int
    compressed_size: int
    deduplicated_size: int
    nfiles: int
    path: Path
    time: float

class FinishedArchiveProgress(BaseBorgLogLine):
    """JSON object printed on stdout when an archive is finished."""

    time: float
    type: typing.Literal["archive_progress"]
    finished: bool

class ProgressMessage(BaseBorgLogLine):
    operation: int
    msgid: typing.Optional[str]
    finished: bool
    message: typing.Optional[str]
    time: float

class ProgressPercent(BaseBorgLogLine):
    operation: int
    msgid: str | None = pydantic.Field(None)
    finished: bool
    message: str | None = pydantic.Field(None)
    current: float | None = pydantic.Field(None)
    info: list[str] | None = pydantic.Field(None)
    total: float | None = pydantic.Field(None)
    time: float

    @pydantic.model_validator(mode="after")
    def fields_depending_on_finished(self) -> typing.Self:
        if self.finished:
            if self.message is not None:
                raise ValueError("message must be None if finished is True")
            if self.current is not None:
                raise ValueError("current must be None if finished is True")
            if self.info is not None:
                raise ValueError("info must be None if finished is True")
            if self.total is not None:
                raise ValueError("total must be None if finished is True")
        else:
            if self.message is None:
                raise ValueError("message must not be None if finished is False")
            if self.current is None:
                raise ValueError("current must not be None if finished is False")
            if self.info is None:
                raise ValueError("info must not be None if finished is False")
            if self.total is None:
                raise ValueError("total must not be None if finished is False")
        return self

class FileStatus(BaseBorgLogLine):
    status: str
    path: Path

class LogMessage(BaseBorgLogLine):
    time: float
    levelname: typing.Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
    name: str
    message: str
    msgid: typing.Optional[str]

    def get_level(self) -> int:
        try:
            return getattr(logging, self.levelname)
        except AttributeError:
            _log.warning(
                "could not find log level %s, giving the following message WARNING level: %s",
                self.levelname,
                self.model_dump_json(),
            )
            return logging.WARNING

_BorgLogLinePossibleTypes = (
    ArchiveProgressLogLine
    | FinishedArchiveProgress
    | ProgressMessage
    | ProgressPercent
    | FileStatus
    | LogMessage
)

class BorgLogLine(pydantic.RootModel[_BorgLogLinePossibleTypes]):
    """A log line from Borg with the `--log-json` argument."""

    def get_level(self) -> int:
        return self.root.get_level()

class _BorgArchive(pydantic.BaseModel):
    """Basic archive attributes."""

    name: str
    id: str
    start: datetime

class _BorgArchiveStatistics(pydantic.BaseModel):
    """Statistics of an archive."""

    original_size: int
    compressed_size: int
    deduplicated_size: int
    nfiles: int

class _BorgLimitUsage(pydantic.BaseModel):
    """Usage of borg limits by an archive."""

    max_archive_size: float

class _BorgDetailedArchive(_BorgArchive):
    """Archive attributes, as printed by `borg info --json` or `borg create --json`."""

    end: datetime
    duration: float
    stats: _BorgArchiveStatistics
    limits: _BorgLimitUsage
    command_line: typing.List[str]
    chunker_params: typing.Any | None = None

class BorgCreateResult(pydantic.BaseModel):
    """JSON object printed at the end of `borg create`."""

    archive: _BorgDetailedArchive

class BorgListResult(pydantic.BaseModel):
    """JSON object printed at the end of `borg list`."""

    archives: typing.List[_BorgArchive]

I think they are correct, I can parse all of borg's outputs in my runs.
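For readers who want to try this, here is a minimal sketch of how a frontend might feed `--log-json` lines through such a union. It assumes trimmed-down copies of two of the models above, and the sample lines are hand-written for illustration, not captured from a real borg run.

```python
import typing
from pathlib import Path

import pydantic


class FileStatus(pydantic.BaseModel):
    status: str
    path: Path


class LogMessage(pydantic.BaseModel):
    time: float
    levelname: typing.Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
    name: str
    message: str
    msgid: typing.Optional[str] = None


class BorgLogLine(pydantic.RootModel[typing.Union[FileStatus, LogMessage]]):
    """Trimmed-down stand-in for the full union above."""


# Hand-written sample lines shaped like `--log-json` output (assumed, not real output).
SAMPLE_LINES = [
    '{"type": "file_status", "status": "A", "path": "/home/user/notes.txt"}',
    '{"type": "log_message", "time": 1700000000.0, "levelname": "WARNING", '
    '"name": "borg.archiver", "message": "example warning", "msgid": null}',
]

# Each line is validated against every member of the union; `.root` holds the match.
parsed = [BorgLogLine.model_validate_json(line).root for line in SAMPLE_LINES]
for entry in parsed:
    print(type(entry).__name__, entry)
```

Unknown keys such as `type` are ignored by default, so the union is resolved purely by which model's fields fit the line.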

Let me know if this is out of scope here and I should suggest it somewhere else :)

ThomasWaldmann commented 4 weeks ago

Interesting idea, but I'd rather have them in the code, along with unit tests showing that they actually work, so we'll know when something breaks.

I don't use pydantic myself, but I'd review a PR for such an addition.

a-gn commented 3 weeks ago

Are you saying you'd prefer them to be importable from borg's code? I'm asking because the docs say that the internals aren't stable and that users should use the CLI. Or do you just mean that they should be in the code for tests, and automatically copied in the docs too?

ThomasWaldmann commented 3 weeks ago

Yeah, internal APIs are not stable. I guess not even the JSON is fully stable; there might be quite a few changes coming in borg2...

But we could have the models in the code and tests that they actually work for the current version.
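A minimal sketch of what such a test might look like, assuming copies of the `_BorgArchive` and `BorgListResult` models from the first comment and a hand-written sample document (the values are made up, and real `borg list --json` output carries more keys). The functions use plain `assert`, so a runner such as pytest could collect them.

```python
from datetime import datetime

import pydantic


class _BorgArchive(pydantic.BaseModel):
    name: str
    id: str
    start: datetime


class BorgListResult(pydantic.BaseModel):
    archives: list[_BorgArchive]


# Hand-written sample shaped like `borg list --json` output; not captured from borg.
SAMPLE_LIST_OUTPUT = """
{"archives": [{"name": "docs-2024-01-01",
               "id": "0123456789abcdef",
               "start": "2024-01-01T12:00:00.000000"}]}
"""


def test_list_result_parses() -> None:
    # A well-formed document should validate and expose typed fields.
    result = BorgListResult.model_validate_json(SAMPLE_LIST_OUTPUT)
    assert [archive.name for archive in result.archives] == ["docs-2024-01-01"]
    assert result.archives[0].start == datetime(2024, 1, 1, 12, 0, 0)


def test_missing_field_is_rejected() -> None:
    # An archive entry without `id` and `start` should fail validation.
    try:
        BorgListResult.model_validate_json('{"archives": [{"name": "x"}]}')
    except pydantic.ValidationError:
        return
    raise AssertionError("expected a ValidationError for the missing fields")
```

A real test would presumably validate output produced by the borg version under test rather than a fixed string, so that a change in the JSON shape makes the test fail.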

RonnyPfannschmidt commented 3 weeks ago

It might be a good reason to introduce private naming and public naming

a-gn commented 3 weeks ago

> It might be a good reason to introduce private naming and public naming

I'd suggest having a very small, stable public API, with the models in one module per version, for example from borg.public.json_models.v1 import BorgLogLine and from borg.public.json_models.v2 import BorgLogLine. Later, small utility types could expose the information in Python through a common API.
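As a rough sketch of how a frontend could pick between such per-version modules: the module paths below are the hypothetical ones from this comment and do not exist in borg, and the helper names are made up for illustration. The only assumption about borg itself is that `borg -V` prints text like `borg 1.4.0`.

```python
import re

# Hypothetical module paths from the suggestion above; they do not exist in borg.
MODEL_MODULES = {
    1: "borg.public.json_models.v1",
    2: "borg.public.json_models.v2",
}


def major_version(version_output: str) -> int:
    """Extract the major version from text such as 'borg 1.4.0'."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def models_module_for(version_output: str) -> str:
    """Return the name of the models module matching the given borg version."""
    major = major_version(version_output)
    try:
        return MODEL_MODULES[major]
    except KeyError:
        raise ValueError(f"no JSON models known for borg {major}.x") from None


print(models_module_for("borg 1.4.0"))  # borg.public.json_models.v1
```

A frontend could then import the returned module by name, so the rest of its code stays the same across borg major versions.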

a-gn commented 2 weeks ago

I opened a pull request there; I can add more tests if you validate those.

I'm on macOS and don't use homebrew so installing borg here is kind of a mess (missing pkg-config). I will try later.
