aiidateam / disk-objectstore

An implementation of an efficient "object store" (actually, a key-value store) writing files on disk and not requiring a running server
https://disk-objectstore.readthedocs.io
MIT License
16 stars 8 forks source link

Backup tools: rsync progress; logging change #166

Closed eimrek closed 7 months ago

eimrek commented 7 months ago

Hi,

this PR contains two changes:

1) remove the logger argument from BackupManager, and instead only use the logger object of this library.

This change is motivated by the fact that in aiida-core, the default logging level is REPORT (23) and the logger.info messages of this library are not outputted, which give information about the progress and should probably be shown by default. After these changes, the logger instead should be controlled in the aiida-core via something like

    import logging
    from aiida.common.log import LOG_LEVEL_REPORT
    DOS_BACKUP_LOGGER = logging.getLogger("disk_objectstore.backup_utils")
    # As the default logging level in AiiDA is a custom level REPORT (23), adapt
    # the disk-objectstore logger to output the INFO (20) level by default as well
    storage_logger_level = STORAGE_LOGGER.getEffectiveLevel()
    if storage_logger_level  >= logging.INFO and storage_logger_level <= LOG_LEVEL_REPORT:
        DOS_BACKUP_LOGGER.setLevel(logging.INFO)
    else:
        DOS_BACKUP_LOGGER.setLevel(storage_logger_level)

    handler = logging.StreamHandler()
    formatter = logging.Formatter('%(name)s: [%(levelname)s] %(message)s')
    handler.setFormatter(formatter)

    DOS_BACKUP_LOGGER.addHandler(handler)

Although, perhaps there is a better way to do it, similarly to the other loggers (as is done in aiida.common.log.py).

In principle, I can also revert to using the logger argument, but then i probably should also create a custom log level called REPORT in this library, if i want to log consistently with aiida-core.

2) Enable rsync progress indication (with --info=progress2) and output the stdout directly to the user, without capturing it

If i call subprocess.run(rsync_args, capture_output=False), the stdout of rsync is passed directly from the subprocess to the user, bypassing any logging system. Although this has drawbacks (e.g. it won't show up in logs, e.g. when redirecting to a file), it allows to show the updating progress indicator to the user running the CLI sees. The output is captured and not shown, however, if the logging level is above INFO. The motivation behind why I just bypass capturing the output is that if the stdout is captured, it's not possible (i think; or at least not without writing a lot of code) to output a in-place updating progress indicator via the logging system.

As this has the drawback, I think it would be good to discuss and get opinions (@giovannipizzi @sphuber).

Two possible alternatives would be

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (23c784a) 99.90% compared to head (34bb0de) 99.90%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #166 +/- ## ======================================= Coverage 99.90% 99.90% ======================================= Files 10 10 Lines 2066 2085 +19 ======================================= + Hits 2064 2083 +19 Misses 2 2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

giovannipizzi commented 7 months ago

I think that not capturing the output of rsync is OK as the best solution among those you describe, but happy to hear if @sphuber also agrees or not

eimrek commented 7 months ago

@sphuber This should be good now, give it a quick check if you have time.

sphuber commented 7 months ago

To summarize the discussion @eimrek and myself had: ideally the progress of rsync also goes through the logging system as then it can be fully controlled by the caller, e.g. aiida-core. However, the current progress reporting mechanism in aiida-core doesn't even go through the logging, so it doens't make sense to invest the time now to do it here. So for now, we will go simply with letting rsync's progress go uncaptured straight to stdout