Scheduling for secondary profile is ignored: set to run every 4 days; runs every 15 minutes.

danielaixer commented 1 year ago

Ubuntu MATE 22.04 BackInTime 1.3.3-3 (same issue on BIT 1.2.1-3)

I've created a second profile that appears as "profile2.name=Docs" in the config file.

These are the secondary profile's scheduling settings: [image]

These are the schedule related entries that are generated on "~/.config/backintime/config":

profile2.schedule.custom_time=8,12,18,23
profile2.schedule.day=1
profile2.schedule.mode=25
profile2.schedule.repeatedly.period=4
profile2.schedule.repeatedly.unit=20
profile2.schedule.time=0
profile2.schedule.weekday=7

These are the relevant lines in the crontab file when running "crontab -e":

#Back In Time system entry, this will be edited by the gui:
*/15 * * * * /usr/bin/nice -n19 /usr/bin/ionice -c2 -n7 /usr/bin/backintime --profile-id 2 backup-job >/dev/null

I've even deleted the config file, created it from scratch and the issue persists.

To help us diagnose the problem quickly, please provide the output of the console command backintime --diagnostics.

backintime --diagnostics
Traceback (most recent call last):
  File "/usr/share/backintime/common/backintime.py", line 1190, in <module>
    startApp()
  File "/usr/share/backintime/common/backintime.py", line 507, in startApp
    args = argParse(None)
  File "/usr/share/backintime/common/backintime.py", line 568, in argParse
    args, unknownArgs = mainParser.parse_known_args(args)
  File "/usr/lib/python3.10/argparse.py", line 1871, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/usr/lib/python3.10/argparse.py", line 2080, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/usr/lib/python3.10/argparse.py", line 2020, in consume_optional
    take_action(action, args, option_string)
  File "/usr/lib/python3.10/argparse.py", line 1948, in take_action
    action(self, namespace, argument_values, option_string)
  File "/usr/share/backintime/common/backintime.py", line 742, in __call__
    diagnostics = collect_diagnostics()
  File "/usr/share/backintime/common/diagnostics.py", line 177, in collect_diagnostics
    = _get_extern_versions(['encfs'], r'Build: encfs [Vv]ersion (.*)\n')
  File "/usr/share/backintime/common/diagnostics.py", line 256, in _get_extern_versions
    result = re.findall(pattern, result)[0]
IndexError: list index out of range

Additionally, please specify as precisely as you can the package or installation source where you got BackInTime: https://launchpad.net/~bit-team/+archive/ubuntu/stable (deb https://ppa.launchpadcontent.net/bit-team/stable/ubuntu jammy main)

buhtz commented 1 year ago

Thanks for reporting this.

I assume there is no problem here. A "repeatedly every 4 days (like anacron)" schedule is not actually implemented via cron or anacron. Schedules like this result in a crontab job that runs every 15 minutes. BIT itself then checks whether the 4 days are up and decides whether the job needs to run.
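To illustrate the idea, here is a minimal sketch of such an "is the period up?" check. It is not BIT's actual code; the function name `is_backup_due` and its parameters are hypothetical and only model the behavior described above, where cron wakes the job every 15 minutes and the job decides for itself whether to do anything.

```python
from datetime import datetime, timedelta


def is_backup_due(last_run, now, period_days):
    """Return True when at least `period_days` have passed since `last_run`.

    Hypothetical helper: it only models the check described in the thread.
    """
    if last_run is None:
        # No previous snapshot is known, so a backup is due.
        return True
    return now - last_run >= timedelta(days=period_days)


last = datetime(2023, 5, 1, 12, 0)
# Cron wakes the job 15 minutes after the last snapshot: nothing to do.
print(is_backup_due(last, datetime(2023, 5, 1, 12, 15), 4))  # False
# Exactly 4 days later the job is due.
print(is_backup_due(last, datetime(2023, 5, 5, 12, 0), 4))   # True
```

With a check like this, a `*/15` crontab entry on its own does not mean a snapshot is taken every 15 minutes.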

Besides that crontab line, do you have any other indication that your job is not run every 4 days?

About `--diagnostics`

The failing output of --diagnostics is a separate issue. Please run encfs in your shell and show us the output.

danielaixer commented 1 year ago

Thanks for your reply.

Well... these are yesterday's backups: [image]

I was getting a frequent notification error from BIT, because some Thunderbird files are in use. That's when I opened up BIT and saw that snapshots were being taken every 15 minutes, and it kept happening while I was checking and rechecking the settings. I only have one other profile (Main profile) that is set to run every 2 days, and that one seems OK.

Now I'm wondering if this issue could be related to #988

buhtz commented 1 year ago

Can you show me the output of encfs on your shell please?

The problem might be related to #988. This is a good guess.

Please have a look into the logfiles of one of those snapshots. Look for errors and warnings please and post them here. This info is important for us to reproduce the problem.

Can you please also try to start the backup job manually? Run the GUI in debug mode (backintime-qt --debug) in a terminal. Start a job, with and/or without Thunderbird, and post the terminal output here. If it is too private (because it is a mailbox), you can extract the lines with errors and warnings or send us the terminal log via email. Either way, it would help with diagnosis.

I'm not sure, but my hypothesis is this: the scheduling works fine and as expected. If a backup job is not successful, BIT will try again as soon as possible. In your case that is every 15 minutes. Even though you have snapshots, they are not free of errors. That might be the reason: BIT retries creating snapshots until there are no more errors. Unless someone else on the team has an idea, I assume there is no quick solution for your problem.

If it is about files in use by Thunderbird, can you wait 15 minutes without running Thunderbird and see if the next snapshot is successful?

danielaixer commented 1 year ago

encfs output:

Versión: encfs versión 1.9.5

Forma de uso: encfs [opciones] rootDir puntodemontaje [-- [FUSE Opciones de Montaje]]

Opciones comunes:
  -H            muestra opciones de montaje FUSE opcionales
  -s            deshabilita las operaciones multithread
  -f            ejecuta en el frente (no crea al daemon).
Los mensajes de error serán enviados a stderr
en vez de a syslog.
  -v, --verbose     Detallado(verbose): Muestra los mensajes de depuración(debug) de encfs
  -i, --idle=MINUTOS    Desmonta automáticamente después de un periodo de inactividad
  --anykey      No verifica que la llave correcta está siendo usada
  --forcedecode Desencripta(decode) los datos incluso si se detectan errores
            (para sistemas de ficheros que usan cabeceras de bloque MAC)
  --public      actuar como un sistema de archivos multiusuario común
            (encfs debe ejecutarse como superusuario)
  --reverse          Cifrado invertida
  --reversewrite        reverse encryption with writes enabled
  -c, --config=path     specifies config file (overrides ENV variable)
  -u, --unmount     unmounts specified mountPoint
  --extpass=program Usar un programa externo para introducir la contraseña

Ejemplo, para montar en ~/crypt con almacenamiento 'sin formato'(raw format) en ~/.crypt :

    encfs ~/.crypt ~/crypt

Para mas información, mire la página man encfs(1)

These are the errors from one of the many backups yesterday: [image]

And this is an example from about a month ago, from Ubuntu 14.04 and BIT 1.0.34. Backups were also set to run every 4 days, and even with errors, snapshots were being taken at the desired frequency: [image]

In both cases profiles are set to ignore errors.

I'm doing now the other tests and trying to generate a snapshot without errors, will update ASAP.

Thanks for your assistance.

aryoda commented 1 year ago

@danielaixer Please try the command backintime check-config in the console to check the configuration and re-install the crontab entries

aryoda commented 1 year ago

@buhtz

Regarding the diagnostics stacktrace: This code...

  File "/usr/share/backintime/common/diagnostics.py", line 177, in collect_diagnostics
    = _get_extern_versions(['encfs'], r'Build: encfs [Vv]ersion (.*)\n')
  File "/usr/share/backintime/common/diagnostics.py", line 256, in _get_extern_versions
    result = re.findall(pattern, result)[0]

cannot match the localized output of encfs --version:

Versión: encfs versión 1.9.5

Perhaps env LANG=C encfs --version (or similar) does work...
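The mismatch can be reproduced without encfs installed. The sketch below applies the pattern from the traceback to the Spanish line quoted above and to an English line. The English string is an assumption based on what the pattern expects ("Build: encfs version ..."), and `extract_version` is a hypothetical helper, not the function in diagnostics.py.

```python
import re

# Pattern copied from the traceback in this issue.
PATTERN = r'Build: encfs [Vv]ersion (.*)\n'

# Localized (Spanish) line as posted in this thread.
SPANISH = 'Versión: encfs versión 1.9.5\n'
# Assumed shape of the same line in the C/English locale.
ENGLISH = 'Build: encfs version 1.9.5\n'


def extract_version(output, pattern=PATTERN):
    """Return the captured version string, or None if nothing matches.

    Hypothetical helper: returning None avoids the IndexError that
    re.findall(...)[0] raises on an empty result list.
    """
    match = re.search(pattern, output)
    return match.group(1) if match else None


print(re.findall(PATTERN, SPANISH))  # [] -> indexing [0] raises IndexError
print(extract_version(SPANISH))      # None
print(extract_version(ENGLISH))      # 1.9.5
```

Forcing a fixed locale for the external call and tolerating a missing match are independent fixes; either one alone would avoid the crash shown in the traceback.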

danielaixer commented 1 year ago

Sooo... it's fixed (kind of).

I got rid of the files/folders that were appearing in the snapshot errors and now BIT says there's no need for a snapshot (as in "we're already up to date"): [image]

Also, sorry for the confusion. Thunderbird wasn't actively using the path that appeared on the error log; I was using it on Ubuntu 14.04 but not anymore on this new machine with Ubuntu 22.04. Even if there was a file from Thunderbird causing an error, that file wasn't in use.

However, I'm wondering if the issue will reappear as soon as any file generates an error again on the rsync process run by BIT.

buhtz commented 1 year ago

cannot match the localized output of encfs --version:

Versión: encfs versión 1.9.5

Perhaps env LANG=C encfs --version (or similar) does work...

Yes, I'll take that.

Also, sorry for the confusion.

No problem. Thanks for reporting. From my understanding, the behavior you described is by design: it is the usual and expected behavior, even though it is not ideal from a user's perspective. So there is no bug in scheduling.

From your point of view as a user: you set it to run every 4 days, and the job fails. What would you expect (or need) from backup software in that case? Should it automatically retry? When should it retry? Or do you want to make this behavior configurable, e.g. "Retry every 3 hours" or something like this?

Note to me: I wonder how crontab-like jobs (e.g. "each 6 hours" -> 0 */6 * * *) behave when an error occurs. I assume the retry happens 6 hours later and not earlier. Another point on the todo list: when and how to retry failed snapshot jobs.

aryoda commented 1 year ago

@danielaixer THX for reporting this issue and the solution!

OK, to summarize the reason for this unexpected behavior:

If taking a snapshot produces errors (not all files backed-up) the backup restarts every 15 minutes creating a new snapshot...

Is this correct, a new snapshot is created every 15 minutes?

I think that is not what a user wants (perhaps a few retries, but not endless retries).

@buhtz Should we make a feature request for adding a maximum number of retries?

buhtz commented 1 year ago

@buhtz Should we make a feature request for adding a maximum number of retries?

Yes and no. 😄 I don't see the solution as that easy. The situation is much more complex and is related to the way BIT implements the scheduling. Some schedules are handled via crontab in a crontab way. The others, the "anacron-like" ones, are not handled by anacron but simulated by BIT itself (via those 15-minute crontab jobs). A meta issue is OK. It is a question of design. We need to decide which scheduling options/behaviors we offer. I have some thoughts about it but not an opinion about a user-friendly solution. It is related to #1449, where I wrote down some of my thoughts and discoveries.

As a first shot at preventing issues like this one, we could try to use the same behavior as crontab backup jobs do: a retry would be done 4 days later. Not ideal, but easy to fix as a first step. I assume BIT ignores the schedule timespan when the latest job had errors.
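A rough sketch of the difference between the two policies, under the assumption stated above that a failed last job currently bypasses the timespan check. The function `should_run` and its `retry_on_error` flag are hypothetical and not taken from BIT's code.

```python
from datetime import datetime, timedelta


def should_run(last_run, last_had_errors, now, period, retry_on_error):
    """Decide whether the 15-minute cron wake-up should start a snapshot.

    Hypothetical model of the two policies discussed in this thread.
    """
    if last_run is None:
        return True
    if retry_on_error and last_had_errors:
        # Assumed current behavior: errors bypass the schedule timespan.
        return True
    # Proposed first step: always wait for the full period, like crontab jobs.
    return now - last_run >= period


last = datetime(2023, 5, 1, 12, 0)
now = datetime(2023, 5, 1, 12, 15)   # next cron wake-up, 15 minutes later
period = timedelta(days=4)

print(should_run(last, True, now, period, retry_on_error=True))   # True
print(should_run(last, True, now, period, retry_on_error=False))  # False
```

With `retry_on_error=True`, a snapshot that always produces errors is retried on every wake-up, which matches the 15-minute snapshots reported here.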

danielaixer commented 1 year ago

BTW, thanks everyone for the help and the quick replies!

I guess BIT's retry "policy" in case of error has changed since v1.0.34. Back then there wasn't a retry policy, or I was lucky and it wasn't working for me.

Aside from @buhtz's proposal, here's my crappy two cents: we now have a checkbox to continue in case of error. What about another checkbox to "consider snapshots with errors as valid/successful"?

aryoda commented 1 year ago

we now have a checkbox to continue in case of error. What about another checkbox to "consider snapshots with errors as valid/successful"?

Do you refer to this setting ("continue on error")?

buhtz commented 1 year ago

🤣 🤣 🤣 Never realized that this checkbox exists. Thanks!

aryoda commented 1 year ago

@buhtz Unfortunately we have an end user documentation gap here for all these options:

https://backintime.readthedocs.io/en/latest/settings.html#options

I think we should keep this issue open to:

improve the user documentation (the above option is a misnomer IMHO: it does not continue but simply keeps the current snapshot, instead of deleting it, in case of an error). "Continue on errors" is used only here, in the "take snapshot" function: https://github.com/bit-team/backintime/blob/6ee6c84658f9754b8e0ddb52fa7b32e5695c03f9/common/snapshots.py#L1197-L1204
then decide how to improve this (possibly via a new issue as a feature request)

danielaixer commented 1 year ago

Do you refer to this setting ("continue on error")?

@aryoda Yes, that one! 🙂

aryoda commented 1 year ago

@danielaixer Could you please post the output of env LC_ALL=C encfs here (the first few lines are enough) to help us test a fix for the diagnostics bug in case of a non-English locale? THX a lot :-)

danielaixer commented 1 year ago

Is this OK?

env LC_ALL=C encfs

Build: encfs version 1.9.5

Usage: encfs [options] rootDir mountPoint [-- [FUSE Mount Options]]

Common Options:
  -H            show optional FUSE Mount Options
  -s            disable multithreaded operation
  -f            run in foreground (don't spawn daemon).
            Error messages will be sent to stderr
            instead of syslog.

aryoda commented 1 year ago

Is this OK?

Yes, that is perfect, thanks for testing!

Germar commented 1 year ago

This is the line which is responsible for the repeating snapshots:

https://github.com/bit-team/backintime/blob/6ee6c84658f9754b8e0ddb52fa7b32e5695c03f9/common/snapshots.py#L1248-L1250

If you don't want to retry after an error, you can remove the if ... clause completely.
