Closed janhoelscher closed 1 year ago
Hello @janhoelscher,
Can you try to run it again in debug mode: drlm -vD runbackup
... and send the log?
Something is wrong: it seems it is not getting/using the proper kernel and initrd files.
Please also try this client config:
DRLM_BKP_TYPE=PXE
DRLM_BKP_PROT=NETFS
DRLM_BKP_PROG=TAR
The tests with ppc64le were done with these settings, so if this works, comparing it with the logs from your current config may show what is not working properly with your configuration.
Thanks, Didac
Hey @didacog,
I have created two logs: issue_test.log and client_PXE.log.
client_PXE: with the sample config; this stops at the exact same stage as before.
issue_test: with the config provided by you.
Listbackup looks like:
Backup Id Client Name Backup Date Status Duration Size PXE Configuration Type
100.20220316114003 <censored> 2022-03-16 11:40 disabled 0h.0m.58s 190M client_PXE PXE-RSYNC
100.20220317075819 <censored> 2022-03-17 07:58 enabled 0h.1m.13s 262M * issue_test PXE-NETFS
With issue_test I am able to load the rear recovery environment via grub successfully.
But when I try to run rear recover, the following happens:
RESCUE <censored>:~ # rear recover
Relax-and-Recover 2.6 / 2020-06-17
Running rear recover (PID 1389)
Using log file: /var/log/rear/rear-<censored>.log
DRLM_MANAGED: Loading configuration from DRLM ...
DRLM_MANAGED: Sending Logfile: '/var/log/rear/rear-<censored>.log' to DRLM in real time ...
Running workflow recover within the ReaR rescue/recovery system
ERROR: Archive not found on [<censored>@<censored-ip>.123:<censored>_default/]
Some latest log messages since the last called script 550_check_remote_backup_archive.sh:
2022-03-17 08:17:22.370549298 Including verify/RSYNC/default/550_check_remote_backup_archive.sh
Aborting due to an error, check /var/log/rear/rear-<censored>.log for details
Exiting rear recover (PID 1389) and its descendant processes ...
Terminating descendant process 1548 tail -f --lines=5000 --pid=1389 /var/log
Running exit tasks
Terminated
Hi @janhoelscher
You should specify the config to the rear command.
Try this:
rear -v recover -C issue_test
Regards, Didac
It is suspicious that the backup size is so small.
Are you using SAN disks on the LPAR?
In that case you must also uncomment the following in the config files, as specified in the default config file:
# ================================
# ======== Boot Over SAN =========
# ================================
# Use this setup if your client boot disks are not internal but in a SAN/Disk Cabinet.
AUTOEXCLUDE_MULTIPATH=n
BOOT_OVER_SAN=y
MODULES=( ${MODULES[@]} dm-multipath )
MODULES_LOAD=( ${MODULES_LOAD[@]} dm-multipath )
This avoids skipping all filesystems on all the SAN disks.
regards, Didac
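A quick way to see whether a client config already has the Boot Over SAN options active (present and not commented out) is to grep for the assignments. This is only a minimal sketch: the function name `san_options_enabled` and the example path are assumptions for illustration and are not part of DRLM or ReaR.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRLM): succeed only if both SAN-related
# options from this thread are present and NOT commented out in a config file.
san_options_enabled() {
    grep -Eq '^[[:space:]]*BOOT_OVER_SAN=y' "$1" &&
    grep -Eq '^[[:space:]]*AUTOEXCLUDE_MULTIPATH=n' "$1"
}

# Example usage; the path is a placeholder, adjust it to your client config:
# san_options_enabled /path/to/client.cfg \
#     && echo "SAN options enabled" \
#     || echo "SAN options missing or commented out"
```

The check only looks at the two plain assignments; the MODULES and MODULES_LOAD array lines would still need to be reviewed by hand.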
Hey @didacog,
I can confirm that I have successfully backed up and restored a machine. The config provided by you works.
Thanks for your help!
Reg. Jan
Hey @janhoelscher,
Perfect!
It seems that the problem was the SAN/MPIO options, wasn't it?
Can you try the RSYNC backup with the SAN/MPIO options? This will help us confirm that rsync is also working properly and will give you the option of incremental backups, which means faster backup times, point-in-time recovery, ...
Just comment out DRLM_BKP_PROT/DRLM_BKP_PROG like this:
DRLM_BKP_TYPE=PXE
#DRLM_BKP_PROT=NETFS
#DRLM_BKP_PROG=TAR
AUTOEXCLUDE_MULTIPATH=n
BOOT_OVER_SAN=y
MODULES=( ${MODULES[@]} dm-multipath )
MODULES_LOAD=( ${MODULES_LOAD[@]} dm-multipath )
Thanks in advance, Didac
Hey @didacog,
yes, I think the last problem was because of the SAN disks.
I have tried your config and it hangs on "Enabling PXE Boot". Log files are uploaded: debug.log
I think the error from the beginning is still there.
Reg. Jan
Little side info: if I run the backup as a sched task, it finishes without error. But booting is not possible, due to wrong files in grub.cfg. See above.
Backup size seems good at 2.9 GB, so this problem is about the PXE boot cfg generation.
Hello @janhoelscher,
Looking at the logs, something very strange happens when it defines CLI_KERNEL_FILE and CLI_INITRD_FILE, which makes no sense. Can you try the following?
drlm bkpmgr -d -I <backup_id>
drlm -vD runbackup -c <client_name> -C <config_name>
If it hangs on "Enabling PXE Boot", open a new ssh session and list the contents of /var/lib/drlm/store/
Hopefully this helps us understand why it is doing this strange thing with this config. We tested the config ourselves and it worked, though not on a ppc64le system, but there is nothing from the arch perspective that should change this particular post-backup phase...
Thanks in advance, Didac
Hey @didacog,
[root@<censored-drlmhost> default]# ls -slaih
insgesamt 130M
2 4,0K drwxr-x---. 6 root root 4,0K 24. Mär 08:10 .
396 0 drwxr-xr-x. 4 root root 39 22. Mär 16:37 ..
128769 4,0K dr-xr-xr-x. 20 root root 4,0K 24. Mär 08:10 backup
148899 4,0K drwxr-xr-x. 3 root root 4,0K 24. Mär 08:10 backup-20220324.0810.log.gz
11 16K drwx------. 2 root root 16K 24. Mär 08:06 lost+found
14 98M -r--------. 1 root root 98M 24. Mär 08:07 <censored-host>.initrd.cgz
15 28M -r--------. 1 root root 30M 24. Mär 08:07 <censored-host>.kernel
16 4,0K -r--------. 1 root root 273 24. Mär 08:07 <censored-host>.message
12 4,0K -rw-------. 1 root root 516 24. Mär 08:07 README
148904 4,0K drwxr-xr-x. 3 root root 4,0K 24. Mär 08:10 rear-20220324.0810.log
18 3,5M -rw-------. 1 root root 3,5M 24. Mär 08:07 rear.log
17 4,0K -rw-------. 1 root root 538 24. Mär 08:07 rear-<censored-host>
13 4,0K -rw-------. 1 root root 273 24. Mär 08:07 VERSION
[root@<censored-drlmhost> default]# pwd
/var/lib/drlm/store/<censored-host>/default
echo "Loading Linux kernel ..."
linux (tftp)/<censored-host>/default/PXE/anaconda-ks.cfg
original-ks.cfg
script
echo "Loading Linux Initrd image ..."
initrd (tftp)/<censored-host>/default/PXE/anaconda-ks.cfg
original-ks.cfg
script
[drlm-censored_drlmhost-runbackup.20220324.080626.1604.log](https://github.com/brainupdaters/drlm/files/8339314/drlm-censored_drlmhost-runbackup.20220324.080626.1604.log)
[drlm.log](https://github.com/brainupdaters/drlm/files/8339322/drlm.log)
(GitHub doesn't let me upload the rear client log file at the moment :-( , maybe these logs help so far)
Reg.
Jan
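The grub.cfg excerpt above has its linux and initrd lines pointing at anaconda-ks.cfg instead of the kernel and initrd files shown in the store listing. A check along the following lines could flag such entries. It is a sketch under stated assumptions: the function name `check_netboot_cfg` is hypothetical, and the expected suffixes (.kernel, .initrd.cgz) are taken only from the file names seen in this thread.

```shell
#!/bin/sh
# Hypothetical sanity check (not part of DRLM): verify that the "linux" and
# "initrd" lines of a GRUB netboot config point at files named like the ReaR
# kernel and initrd seen in the store listing (*.kernel, *.initrd.cgz).
# Prints one line per suspicious entry and returns non-zero if any is found.
check_netboot_cfg() {
    awk '
        $1 == "linux"  && $2 !~ /\.kernel$/      { print "bad kernel path: " $2; bad = 1 }
        $1 == "initrd" && $2 !~ /\.initrd\.cgz$/ { print "bad initrd path: " $2; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Example usage; the file name is a placeholder for the per-MAC netboot file:
# check_netboot_cfg /var/lib/drlm/store/boot/cfg/<netboot_file>
```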
Well... it seems the files are in the wrong place.
Can you show me the output of rear -vD dump -C <config_name>
from the client, and the logfile? It seems that it is not getting the OUTPUT vars properly from the API.
Thanks in advance! Didac
Output from rear dump:
# System definition:
declare -- ARCH="Linux-ppc64le"
declare -- OS="GNU/Linux"
declare -- OS_MASTER_VENDOR="Fedora"
declare -- OS_MASTER_VERSION="8"
declare -- OS_MASTER_VENDOR_ARCH="Fedora/ppc64le"
declare -- OS_MASTER_VENDOR_VERSION="Fedora/8"
declare -- OS_MASTER_VENDOR_VERSION_ARCH="Fedora/8/ppc64le"
declare -- OS_VENDOR="RedHatEnterpriseServer"
declare -- OS_VERSION="8"
declare -- OS_VENDOR_ARCH="RedHatEnterpriseServer/ppc64le"
declare -- OS_VENDOR_VERSION="RedHatEnterpriseServer/8"
declare -- OS_VENDOR_VERSION_ARCH="RedHatEnterpriseServer/8/ppc64le"
# Backup with RSYNC:
declare -x RSYNC_PASSWORD="oisN6ZbFMyzmNUaXhbluer6oFN3QeR"
declare -- RSYNC_PREFIX=""
declare -- RSYNC_PROTOCOL_VERSION=""
declare -- BACKUP_DUPLICITY_NAME="rear-backup"
declare -- BACKUP_INTEGRITY_CHECK=""
declare -- BACKUP_MOUNTCMD=""
declare -- BACKUP_ONLY_EXCLUDE="no"
declare -- BACKUP_ONLY_INCLUDE="no"
declare -- BACKUP_OPTIONS=""
declare -r BACKUP_RESTORE_MOVE_AWAY_DIRECTORY="/var/lib/rear/moved_away_after_backup_restore/"
declare -a BACKUP_RESTORE_MOVE_AWAY_FILES=([0]="/boot/grub/grubenv" [1]="/boot/grub2/grubenv")
declare -a BACKUP_RSYNC_OPTIONS=([0]="--sparse" [1]="--archive" [2]="--hard-links" [3]="--numeric-ids" [4]="--stats" [5]="--devices" [6]="--acls" [7]="--xattrs")
declare -- BACKUP_SELINUX_DISABLE="1"
declare -- BACKUP_TYPE=""
declare -- BACKUP_UMOUNTCMD=""
declare -- BACKUP_URL="rsync://<censored-client>@<censored-ipsubnet>.123::/<censored-client>_default"
# Output to PXE:
declare -- PXE_CONFIG_GRUB_STYLE=""
declare -- PXE_CONFIG_PATH="/var/lib/rear/output"
declare -- PXE_CONFIG_PREFIX="rear-"
declare -- PXE_CONFIG_URL=""
declare -- PXE_CREATE_LINKS="MAC"
declare -- PXE_RECOVER_MODE=""
declare -- PXE_REMOVE_OLD_LINKS=""
declare -- PXE_TFTP_IP=""
declare -- PXE_TFTP_PATH="/var/lib/rear/output"
declare -- PXE_TFTP_PREFIX="<censored-client>."
declare -- PXE_TFTP_URL=""
declare -- OUTPUT_EFISTUB_SYSTEMD_BOOTLOADER="/usr/lib/systemd/boot/efi/systemd-bootx64.efi"
declare -- OUTPUT_LFTP_OPTIONS=""
declare -- OUTPUT_MOUNTCMD=""
declare -- OUTPUT_OPTIONS=""
declare -- OUTPUT_PREFIX="<censored-client>"
declare -- OUTPUT_PREFIX_PXE="<censored-client>/default/PXE"
declare -- OUTPUT_UMOUNTCMD=""
declare -- OUTPUT_URL="rsync://<censored-client>@<censored-ipsubnet>.123/<censored-client>_default/PXE/"
Which logfile specifically do you want? The one from the last backup?
Reg. Jan
Hello @janhoelscher,
Yes, the log of the last backup from the client side. From the dump output it seems it is getting the config properly, but in the end it is not storing the kernel/initrd files in the right place.
regards, Didac
Hey @didacog,
I have uploaded the file. I had to zip it, otherwise GitHub won't allow me to upload it.
Kind Reg. Jan
Hey @janhoelscher
Can you check whether you have the script /usr/share/rear/output/PXE/default/820_copy_to_net.sh
on the client and show its content? It seems that it is not sourcing this script.
Also do the following:
Create /etc/rear/issue_test.cfg with the following content:
export RSYNC_PASSWORD=***************
OUTPUT=PXE
OUTPUT_PREFIX_PXE=<censored-host>/default/PXE
OUTPUT_URL=rsync://<censored-host>@<censored-subnet>.123/<censored-host>_default/PXE/
BACKUP=RSYNC
RSYNC_PREFIX=
BACKUP_URL=rsync://<censored-host>@<censored-subnet>.123::/<censored-host>_default
BACKUP_RSYNC_OPTIONS+=(--devices --acls --xattrs)
SSH_ROOT_PASSWORD=drlm
DRLM_BKP_TYPE=PXE
AUTOEXCLUDE_MULTIPATH=n
BOOT_OVER_SAN=y
MODULES=(${MODULES[@]} dm-multipath)
MODULES_LOAD=(${MODULES_LOAD[@]} dm-multipath)
Then run:
rear -vs mkbackup -C issue_test
and upload the output.
/etc/rear/issue_test.cfg file
kind regards, Didac
Hey @didacog,
you are right, /usr/share/rear/output/PXE/default/820_copy_to_net.sh is not there.
Output from this directory:
[root@<censored-drlmhost> cfg]# cd /usr/share/rear/output/PXE/default/
[root@<censored-drlmhost> default]# ls
800_copy_to_tftp.sh 810_create_pxelinux_cfg.sh
I don't know what we do now...
Reg. Jan
Hey @janhoelscher
No worries, download the script from here: https://raw.githubusercontent.com/rear/rear/rear-2.6/usr/share/rear/output/PXE/default/820_copy_to_net.sh, put it in place and test the backup again, please.
Please let me know whether it works after this change.
I've opened an issue with rear to see why this file is now missing from their master code and is causing this issue.
regards, Didac
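For anyone checking other clients for the same problem, the presence test can be scripted. This is a sketch only: the helper name `has_copy_to_net` is hypothetical, and the commented curl command simply reuses the URL given in the comment above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ReaR or DRLM): report whether the PXE output
# script discussed in this thread exists under a given ReaR share directory.
# The default directory is the one shown in the thread.
has_copy_to_net() {
    [ -f "${1:-/usr/share/rear}/output/PXE/default/820_copy_to_net.sh" ]
}

if has_copy_to_net; then
    echo "820_copy_to_net.sh is present"
else
    echo "820_copy_to_net.sh is missing"
    # To restore it from the rear-2.6 branch (URL from the thread), run as root:
    # curl -fsSL -o /usr/share/rear/output/PXE/default/820_copy_to_net.sh \
    #   https://raw.githubusercontent.com/rear/rear/rear-2.6/usr/share/rear/output/PXE/default/820_copy_to_net.sh
fi
```

The download is left commented out on purpose, so that running the snippet never modifies the ReaR installation by itself.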
Hi, I checked rear-censoredhost.log.
It has this:
2022-03-28 08:26:39.624312668 Copying files '/var/lib/rear/output/<censored-host>.kernel /var/lib/rear/output/<censored-host>.initrd.cgz /var/lib/rear/output/<censored-host>.message /var/lib/rear/output/rear-<censored-host>' to rsync://<censored-host>@<censored-subnet>.123/<censored-host>_default/PXE/ location
++ cp -v /var/lib/rear/output/<censored-host>.kernel /var/lib/rear/output/<censored-host>.initrd.cgz /var/lib/rear/output/<censored-host>.message /var/lib/rear/output/rear-<censored-host> /tmp/rear.edPhWcoyMoFm3zO/tmp/rsync//
'/var/lib/rear/output/<censored-host>.kernel' -> '/tmp/rear.edPhWcoyMoFm3zO/tmp/rsync/<censored-host>.kernel'
'/var/lib/rear/output/<censored-host>.initrd.cgz' -> '/tmp/rear.edPhWcoyMoFm3zO/tmp/rsync/<censored-host>.initrd.cgz'
'/var/lib/rear/output/<censored-host>.message' -> '/tmp/rear.edPhWcoyMoFm3zO/tmp/rsync/<censored-host>.message'
'/var/lib/rear/output/rear-<censored-host>' -> '/tmp/rear.edPhWcoyMoFm3zO/tmp/rsync/rear-<censored-host>'
++ echo '
Relax-and-Recover 2.6 / 2020-06-17
Relax-and-Recover comes with ABSOLUTELY NO WARRANTY; for details see
the GNU General Public License at: http://www.gnu.org/licenses/gpl.html
Host <censored-host> using Backup RSYNC and Output PXE
Build date: Mon, 28 Mar 2022 08:25:42 +0200
'
+++ get_template RESULT_usage_PXE.txt
+++ [[ -e /etc/rear/templates/RESULT_usage_PXE.txt ]]
+++ echo /usr/share/rear/conf/templates/RESULT_usage_PXE.txt
++ cp -v /usr/share/rear/conf/templates/RESULT_usage_PXE.txt /tmp/rear.edPhWcoyMoFm3zO/tmp/rsync//README
'/usr/share/rear/conf/templates/RESULT_usage_PXE.txt' -> '/tmp/rear.edPhWcoyMoFm3zO/tmp/rsync//README'
++ cat /var/log/rear/rear-<censored-host>.log
++ case $RSYNC_PROTO in
++ Log 'rsync -a /tmp/rear.edPhWcoyMoFm3zO/tmp/rsync// --sparse --archive --hard-links --numeric-ids --stats --devices --acls --xattrs rsync://<censored-host>@<censored-subnet>.123:873/<censored-host>_default//'
++ echo '2022-03-28 08:26:39.669531306 rsync -a /tmp/rear.edPhWcoyMoFm3zO/tmp/rsync// --sparse --archive --hard-links --numeric-ids --stats --devices --acls --xattrs rsync://<censored-host>@<censored-subnet>.123:873/<censored-host>_default//'
2022-03-28 08:26:39.669531306 rsync -a /tmp/rear.edPhWcoyMoFm3zO/tmp/rsync// --sparse --archive --hard-links --numeric-ids --stats --devices --acls --xattrs rsync://<censored-host>@<censored-subnet>.123:873/<censored-host>_default//
++ rsync -a /tmp/rear.edPhWcoyMoFm3zO/tmp/rsync// --sparse --archive --hard-links --numeric-ids --stats --devices --acls --xattrs rsync://<censored-host>@<censored-subnet>.123:873/<censored-host>_default//
DRLM server 2.4.1
Number of files: 9 (reg: 7, dir: 2)
Number of created files: 7 (reg: 7)
Number of deleted files: 0
Number of regular files transferred: 7
Total file size: 137,041,987 bytes
Total transferred file size: 137,041,987 bytes
Literal data: 137,041,987 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 137,076,084
Total bytes received: 165
sent 137,076,084 bytes received 165 bytes 54,830,499.60 bytes/sec
total size is 137,041,987 speedup is 1.00
So the files <censored-host>.initrd.cgz and <censored-host>.kernel were copied to rsync://<censored-host>@<censored-subnet>.123:873/<censored-host>_default// (by output/RSYNC/default/900_copy_result_files.sh). Now I don't understand why they are missing.
It is skipping the OUTPUT_PREFIX location, so it is not placing the files in the proper destination.
Hi @pcahyna, the problem is that 820_copy_to_net.sh and 900_copy_result_files.sh copy the files to different destination URLs:
OUTPUT_URL=rsync://<censored-host>@<censored-subnet>.123/<censored-host>_default/PXE/
RSYNC_PREFIX=
rsync -a $v "$result_file" "$OUTPUT_URL"
$BACKUP_PROG -a "${TMP_DIR}/rsync/${RSYNC_PREFIX}/" ${BACKUP_RSYNC_OPTIONS[@]} "${RSYNC_PROTO}://${RSYNC_USER}@${RSYNC_HOST}:${RSYNC_PORT}/${RSYNC_PATH}/${RSYNC_PREFIX}/" 2>/dev/null
thanks for the explanation. Does it mean that if the file is put back in place, the output gets copied twice, to two different folders at the rsync destination?
It seems that the underlying problem is that https://github.com/rear/rear/blob/14a9a61d62b02a27f55495320f2d0e1090a72c30/usr/share/rear/prep/RSYNC/default/100_check_rsync.sh sets the various RSYNC_ variables according to BACKUP_URL, but the results are used also in the output stage, so if OUTPUT_URL != BACKUP_URL (which is your case), the output files get uploaded to BACKUP_URL instead of OUTPUT_URL.
That's quite a confusion and I think that my removal of that script is not the only place that exposes the problem. See also usr/share/rear/output/RSYNC/default/900_copy_result_files.sh, I think it has been also placing output files to BACKUP_URL.
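To make the mismatch concrete, the two URLs from the rear dump can be reduced to a comparable "host/path" form. The following is an illustrative sketch and not ReaR's own URL parsing: the function name `rsync_dest` is hypothetical, the addresses are placeholders, and ports are not handled.

```shell
#!/bin/sh
# Illustrative only: this is NOT ReaR's URL parser. It reduces an rsync URL to
# "host/path" so that two destinations can be compared. Ports are not handled.
# The body runs in a subshell, so its variables do not leak to the caller.
rsync_dest() (
    rest="${1#rsync://}"        # drop the scheme
    rest="${rest#*@}"           # drop "user@" if present
    host="${rest%%[:/]*}"       # host ends at the first ":" or "/"
    path="${rest#"$host"}"      # whatever follows the host
    # strip leading ":" and "/" characters and any trailing "/"
    path=$(printf '%s' "$path" | sed -e 's#^[:/]*##' -e 's#/*$##')
    printf '%s/%s\n' "$host" "$path"
)

# Placeholder values shaped like the ones in the rear dump in this thread.
BACKUP_URL="rsync://client@192.0.2.123::/client_default"
OUTPUT_URL="rsync://client@192.0.2.123/client_default/PXE/"

if [ "$(rsync_dest "$BACKUP_URL")" != "$(rsync_dest "$OUTPUT_URL")" ]; then
    echo "OUTPUT_URL and BACKUP_URL point to different destinations"
fi
```

With values of this shape the two destinations differ only by the trailing PXE directory, which matches the observation that the kernel and initrd ended up one level above where the netboot config expected them.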
> thanks for the explanation. Does it mean that if the file is put back in place, the output gets copied twice, to two different folders at the rsync destination?
Yes, it is being copied twice.
> That's quite a confusion and I think that my removal of that script is not the only place that exposes the problem. See also usr/share/rear/output/RSYNC/default/900_copy_result_files.sh, I think it has been also placing output files to BACKUP_URL.
Yes, it is quite strange.
Hello @janhoelscher,
Did you test it with the script in place?
Can you show the output of the command dnf info rear from the client?
Sorry for closing the issue, BTW. It was my mistake yesterday, clicking the wrong button.
Thanks! Didac
Hey @didacog,
I can confirm that backup and restore work with this file on the client.
This is the output from dnf:
Name : rear
Version : 2.6
Release : 3.el8
Architecture : ppc64le
Size : 2.5 M
Source : rear-2.6-3.el8.src.rpm
Repository : @System
From repo : rhel-8-for-ppc64le-appstream-rpms
Summary : Relax-and-Recover is a Linux disaster recovery and system migration tool
URL : http://relax-and-recover.org/
License : GPLv3
Reg. Jan
Hello @janhoelscher,
Is this rear RPM from the RHEL repos? The rear 2.6 package should have the missing script; it was a change in the master code that removed the script, but version 2.6 should have had it. So it is a bit strange, and it is the reason we had different results in testing.
We are going to keep the issue open while working on a solution for https://github.com/rear/rear/issues/2781 to have a proper fix.
It will be interesting to identify the source of your installed rear package, to find out why a rear 2.6 release RPM is missing a script that must be there.
Regards, Didac
> but version 2.6 should have had it. So it is a bit strange and the reason we had different results on testing.
> Will be interesting to identify the source of your rear installed package to identify why a rear 2.6 release rpm is missing a script that must be there.
The RHEL 8 version had the script removed recently as a part of a fix for a different problem.
Thanks for the clarification @pcahyna.
So fixing the other problem caused this one. Is only the RHEL package affected, or are the rear packages in OBS for ppc64le affected too? The OBS packages for x86_64 aren't affected: we tested them and they have the script in place.
regards, Didac
RHEL is affected on all supported architectures, not sure about OBS (I have never used it), but I suppose it uses vanilla code from Git, without RHEL patches and backports, so if you refer to 2.6 build and not to any master / development snapshots, it should be OK.
Yes, they are the upstream rear packages generated in the openSUSE Build Service. Thanks @pcahyna for clarifying this.
@janhoelscher can you install your DRLM clients using the supported, DRLM-provided upstream ReaR URLs instead of the RHEL repos? This should solve the issue for now, until it is fixed in the next rear release.
Thanks, Didac
This issue is solved in the upcoming rear 2.7 (see https://github.com/rear/rear/issues/2781 and https://github.com/rear/rear/pull/2831) and will be solved for the RHEL 8 rear 2.6 repo package by restoring /usr/share/rear/output/PXE/default/820_copy_to_net.sh.
Issue details:
DRLM Server information:
DRLM version: Disaster Recovery Linux Manager 2.4.1 / Git
OS version: RHEL 8.5 on ppc64le
DRLM configuration files output:
DRLM_BKP_TYPE=PXE
Output of:
DRLM Client information:
ppc64le
Details:
There are two strange issues.
Backup does not complete in cli mode
When I take a manual backup over the cli via
drlm runbackup -c <censored> -C client_PXE
the command gets stuck. See output below for explanation. The command hangs on the step "Enabling PXE boot".
These are the last lines from drlm.log (I have added set -x to /usr/share/drlm/backup/run/default/129_post_backup_tasks.sh to get more information):
Network boot fails
If I take the backup via sched, the command finishes and the backup is registered in the db. But when I try to netboot, the grub menu shows up but starting rear is not possible. See the error message below. It looks like there is something wrong with the path to the kernel files (original-ks.cfg and script are files under /root/).
Error from starting REAR from grub menu:
Netboot file (aa:bb:cc:dd:ee:ff under /var/lib/drlm/store/boot/cfg):