brainupdaters / drlm

Disaster Recovery Linux Manager
http://drlm.org
GNU General Public License v3.0

Can't boot Client machine with PXE (Hyper-V VM) #44

Closed uxtuahgp closed 7 years ago

uxtuahgp commented 7 years ago

drlm-2.0.0-1git.el7.centos.noarch.rpm
rear-2.00-21.git201701231729.x86_64.rpm

After drlm runbackup I try to boot the client machine, but it hangs on "Loading Linux kernel ...".

In /var/lib/drlm/store/vr2-testnix-02/PXE/rear-vr2-testnix-02.log:

2017-01-26 12:46:40 systemd-udevd will be used - no need for udev rules rewrites
2017-01-26 12:46:40 Including build/SUSE_LINUX/610_link_systemd_lib.sh
ln: failed to create symbolic link '/tmp/rear.XGq9EQT9eMHkhvE/rootfs/lib/systemd/system': No such file or directory
2017-01-26 12:46:40 Including build/GNU/Linux/610_verify_and_adjust_udev_systemd.sh

uxtuahgp commented 7 years ago

logs 20170126.zip

didacog commented 7 years ago

Hi @uxtuahgp,

Can you provide also:

Thanks!

uxtuahgp commented 7 years ago

Hi, Didac!

[root@vr2-testwss-01 PXE]# drlm listclient -A

Id  Name            MacAddres     Ip             Client OS  Network
1   vr2-testnix-02  00155d7c4605  192.168.201.2             DRLM201

[root@vr2-testwss-01 PXE]# more /var/lib/drlm/store/boot/cfg/*

  echo "Loading Linux kernel ..."
  linux (tftp)/vr2-testnix-02/PXE/vr2-testnix-02.kernel rw gfxpayload=vga=normal console=tty0 console=ttyS0,115200n8
  echo "Loading Linux Initrd image ..."
  initrd (tftp)/vr2-testnix-02/PXE/vr2-testnix-02.initrd.cgz

[root@vr2-testwss-01 PXE]# cat /etc/drlm/clients/vr2-testnix-02.cfg

This file has been generated by instclient , it can be modified at your convenience, see http://relax-and-recover.org/ for more information

CLI_NAME=vr2-testnix-02
SRV_NET_IP=192.168.201.1

OUTPUT=PXE
OUTPUT_PREFIX=$OUTPUT
OUTPUT_PREFIX_PXE=vr2-testnix-02/$OUTPUT
OUTPUT_URL=nfs://192.168.201.1/var/lib/drlm/store/vr2-testnix-02
BACKUP=NETFS
NETFS_PREFIX=BKP
BACKUP_URL=nfs://192.168.201.1/var/lib/drlm/store/vr2-testnix-02

SSH_ROOT_PASSWORD=drlm

didacog commented 7 years ago

Hi @uxtuahgp,

I finally had some time to check your logs.

This seems to be a ReaR bug on SLES12, or am I wrong about the version?

/usr/share/rear/build/SUSE_LINUX/610_link_systemd_lib.sh:

# Fedora puts systemd stuff into /usr/lib/systemd and SUSE under /lib/systemd
pushd $ROOTFS_DIR >/dev/null
    if [[ -d usr/lib/systemd/system ]];  then
        if [[ ! -d lib/systemd/system ]]; then
            ln -sf $v ../../usr/lib/systemd/system $ROOTFS_DIR/lib/systemd/system >&2
        fi
    else
        Error "Missing usr/lib/systemd/system - too confused to continue"
    fi
popd >/dev/null

Please try adding to your client's /usr/share/rear/build/SUSE_LINUX/610_link_systemd_lib.sh file:

        if [[ ! -d lib/systemd/system ]]; then
            mkdir -p $v $ROOTFS_DIR/lib/systemd >&2  ## ADD THIS LINE!
            ln -sf $v ../../usr/lib/systemd/system $ROOTFS_DIR/lib/systemd/system >&2
        fi

And run a backup from DRLM again. This may solve your problem.
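
To see why the extra mkdir matters, the failure can be reproduced outside ReaR. This is only a minimal sketch: it uses a throwaway temp directory in place of ReaR's real $ROOTFS_DIR, and the first_try variable is just an illustrative name. It is not the ReaR script itself.

```shell
#!/usr/bin/env bash
# Scratch stand-in for ReaR's rootfs build area (assumption: a temp dir, not the real one).
ROOTFS_DIR=$(mktemp -d)
mkdir -p "$ROOTFS_DIR/usr/lib/systemd/system"

# Without lib/systemd, ln has no parent directory to create the link in.
if ln -sf ../../usr/lib/systemd/system "$ROOTFS_DIR/lib/systemd/system" 2>/dev/null; then
    first_try=ok
else
    first_try=failed
fi
echo "ln before mkdir: $first_try"

# Creating the parent directory first lets the same ln call succeed.
mkdir -p "$ROOTFS_DIR/lib/systemd"
ln -sf ../../usr/lib/systemd/system "$ROOTFS_DIR/lib/systemd/system"
ls -l "$ROOTFS_DIR/lib/systemd/system"
```

The relative target ../../usr/lib/systemd/system is resolved from the directory that holds the link, so it points back into the same scratch rootfs.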

@jsmeix did you find any of these errors in SLES12?

I will open an issue in ReaR once the SLES version is confirmed. I guess the mkdir will solve the problem, and I could send a pull request for this error:

ln: failed to create symbolic link '/tmp/rear.XGq9EQT9eMHkhvE/rootfs/lib/systemd/system': No such file or directory

Regards,

jsmeix commented 7 years ago

@didacog interestingly I have that too on my SLES12 test system but I never noticed issues because of that.

I did "rear -d -D mkrescue" and got in the log (excerpt):

+ source /root/rear/usr/share/rear/build/SUSE_LINUX/610_link_systemd_lib.sh
++ pushd /tmp/rear.UsbkVIYhpMrUQB1/rootfs
++ [[ -d usr/lib/systemd/system ]]
++ [[ ! -d lib/systemd/system ]]
++ ln -sf -v ../../usr/lib/systemd/system /tmp/rear.UsbkVIYhpMrUQB1/rootfs/lib/systemd/system
ln: failed to create symbolic link '/tmp/rear.UsbkVIYhpMrUQB1/rootfs/lib/systemd/system': No such file or directory
++ popd

I don't think the initial issue description in https://github.com/brainupdaters/drlm/issues/44#issue-203335145 "boot client machine but it hangs on Loading Linux kernel ..." is related to that "failed to create symbolic link" error because the "Loading Linux kernel" message comes from the bootloader and afterwards a second bootloader message should come which is about "Loading initrd...".

Accordingly it seems the bootloader itself hangs and not something in the booted system where systemd could be involved.

FYI: I have zero experience with PXE or BOOTP so that I cannot help if the initial issue here is related to that.

Regarding the "failed to create symbolic link" error I submitted this ReaR issue: https://github.com/rear/rear/issues/1185

didacog commented 7 years ago

Thanks @jsmeix!

I hope that @uxtuahgp can test it and update the issue https://github.com/rear/rear/issues/1185 with his feedback.

By the way, @uxtuahgp, how much memory is assigned to your Hyper-V VM? Sometimes, on VMs without enough memory, the initrd image cannot be expanded, so the boot fails.

uxtuahgp commented 7 years ago

Hi! It looks like a Hyper-V issue. Sometimes it hangs on "Welcome to GRUB", sometimes on "Loading Linux kernel ...", and sometimes it reboots or hangs on "Loading Linux Initrd image ...".

I think we have to close this issue and, some day, try to test it without virtual machines...

uxtuahgp commented 7 years ago

It seems PXE boot worked fine on my Hyper-V sandbox when I installed the el6 RPM from the drlm.org site. There were some problems with delbackup and so on, but PXE boot worked fine, as far as I remember.

didacog commented 7 years ago

@uxtuahgp that is strange, because the PXE build did not change in the develop branch. It is the same as in the v2.0 RPM from the drlm.org site.

Maybe you can try increasing the VM memory to 1GB if you have less? I've tested OpenSUSE Leap in a VirtualBox VM, and it hung with 512MB RAM while loading kernel/initrd until I increased the RAM to 1GB; then it booted OK.
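
A quick way to see how much RAM the client VM actually has is to read MemTotal on the running client before taking the backup. This is a rough sketch, not a DRLM or ReaR feature: the 512 MiB warning level comes only from the VirtualBox observation above, and the helper names kib_to_mib and low_memory are made up for illustration.

```shell
#!/usr/bin/env bash
# Warning level taken from the 512MB hang described in this thread (an assumption,
# not a documented DRLM/ReaR requirement).
LOW_KIB=$((512 * 1024))

# Convert KiB to whole MiB.
kib_to_mib() { echo $(( $1 / 1024 )); }

# Succeeds when the given KiB value is at or below the warning level.
low_memory() { [ "$1" -le "$LOW_KIB" ]; }

if [ -r /proc/meminfo ]; then
    total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "MemTotal: $(kib_to_mib "$total_kib") MiB"
    if low_memory "$total_kib"; then
        echo "At or below 512 MiB: consider assigning more RAM before PXE booting the rescue image"
    fi
else
    echo "/proc/meminfo not readable; run this on the Linux client"
fi
```

MemTotal is usually a little lower than the RAM assigned to the VM, because the kernel reserves some memory at boot.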

Maybe some issue with GRUB2 modules that are loaded in /var/lib/drlm/store/boot/grub/grub.cfg?

You can edit this file and adjust it to your environment if needed. It works with all our test scenarios, but it was never tested on Hyper-V.

Also you can try to adjust kernel options at boot time in /var/lib/drlm/store/boot/cfg/XX:XX:XX:XX:XX:XX :

  echo "Loading Linux kernel ..."
  linux (tftp)/rear-debian/PXE/rear-debian.kernel rw gfxpayload=vga=normal console=tty0 console=ttyS0,115200n8
  echo "Loading Linux Initrd image ..."
  initrd (tftp)/rear-debian/PXE/rear-debian.initrd.cgz

Maybe there is something wrong with video modes in Hyper-V. Try removing gfxpayload=vga=normal console=tty0 console=ttyS0,115200n8 and booting again.
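
Removing those options by hand is easy to get wrong, so here is a hedged sketch of doing it with sed. It works on a temporary copy that holds the sample entry quoted above, not on the real file under /var/lib/drlm/store/boot/cfg/; the variable names cfg and stripped are only for illustration.

```shell
#!/usr/bin/env bash
# Sample boot entry copied from this thread into a scratch file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
echo "Loading Linux kernel ..."
linux (tftp)/rear-debian/PXE/rear-debian.kernel rw gfxpayload=vga=normal console=tty0 console=ttyS0,115200n8
echo "Loading Linux Initrd image ..."
initrd (tftp)/rear-debian/PXE/rear-debian.initrd.cgz
EOF

# On the "linux" line only, drop every gfxpayload=... and console=... option.
stripped=$(sed -E '/^[[:space:]]*linux[[:space:]]/ s/[[:space:]]+(gfxpayload|console)=[^[:space:]]+//g' "$cfg")
printf '%s\n' "$stripped"
```

Once the output looks right, the same sed expression could be applied to a backup copy of the client's real cfg file.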

uxtuahgp commented 7 years ago

It doesn't help. BTW, after a boot attempt I can't perform runbackup. It writes:

ERROR: drlm:runbackup:genimage:MAKE(raw):DR:vr2-testnix-02.1.20170202094730.dr: Problem creating DR image file (raw)! aborting ...

because it can't unmount the kernel image. A reboot solves this problem.

uxtuahgp commented 7 years ago

We use a Hyper-V Gen 1 VM. Maybe it's a UEFI/EFI issue? What if we can't use UEFI? How could we influence the ISO image type?

didacog commented 7 years ago

Hi @uxtuahgp

Are you using UEFI in your VMs? Good to know, I've never tested any UEFI VM :-P

Anyhow, UEFI should work too. DRLM does not use ISO images, just the kernel and initrd, to provide a rescue image over the network.

The problem creating the DR image file is strange, because DRLM creates the DR image in /var/lib/drlm/arch without mounting or unmounting anything yet. Maybe you were running out of space on /var?
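
A simple way to rule out the disk space theory is to ask df how much room is left where the DR images are written. This is a small sketch: the avail_kib helper is a made-up name, and the fallback to /var or / is only there so the snippet also runs on a machine without DRLM installed.

```shell
#!/usr/bin/env bash
# Print the available space, in KiB, of the filesystem holding the given path.
avail_kib() { df -Pk "$1" | awk 'NR==2 {print $4}'; }

# Path taken from this thread; fall back if it does not exist on this machine.
target=/var/lib/drlm/arch
[ -d "$target" ] || target=/var
[ -d "$target" ] || target=/

echo "Free space under $target: $(avail_kib "$target") KiB"
```

The -P flag keeps each filesystem on a single output line, so the fourth column of the second line is reliably the available space.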

Can you provide the logs of the runbackup that failed?

As I know that you cannot build drlm from sources in your sandbox, I've sent you the RPM of the latest develop build. We fixed a few small things that could cause minor issues.

Regards, Didac

uxtuahgp commented 7 years ago

Sorry, it was a side effect of my manipulations on the client. There are no problems with image creation, but each time after a boot attempt I had to reboot the DRLM host, because drlm complains about "Problem disabling NFS export":

++ report_error 'drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
++ local 'ERRMSG=drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
++ error_reporting
++ '[' no == yes ']'
++ return 1
++ Error 'drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
++ '[' drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS 'export!' aborting ... -eq drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS 'export!' aborting ... ']'
++ EXIT_CODE=1
++ VERBOSE=1
++ LogPrint 'ERROR: drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
++ Log 'ERROR: drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
++ test 1 -gt 0
+++ Stamp
+++ date '+%Y-%m-%d %H:%M:%S '
++ echo '2017-02-02 10:08:25 ERROR: drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
2017-02-02 10:08:25 ERROR: drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...
++ Print 'ERROR: drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
++ test 1
++ echo -e 'ERROR: drlm:runbackup:NFS:DISABLE:vr2-testnix-02: Problem disabling NFS export! aborting ...'
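
Since a reboot clears the problem, one thing worth checking on the DRLM host before rebooting is whether something is still mounted under the store directory. The sketch below only lists such mounts; that a leftover mount is the cause of the NFS error is an assumption, not something confirmed in this thread, and mounts_under is a made-up helper name.

```shell
#!/usr/bin/env bash
# Print mount points equal to, or below, the given prefix.
# $1 = path prefix, $2 = mounts table to read (defaults to /proc/mounts).
mounts_under() {
    awk -v p="$1" '$2 == p || index($2, p "/") == 1 {print $2}' "${2:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    echo "Mounts under /var/lib/drlm/store:"
    mounts_under /var/lib/drlm/store
else
    echo "/proc/mounts not readable; run this on the Linux DRLM host"
fi
```

An empty list would suggest the cause lies elsewhere, for example on the NFS export side.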

didacog commented 7 years ago

Ok, now this should work with the latest code in develop branch.

If you can try the latest RPM you should not have those issues anymore.

uxtuahgp commented 7 years ago

I just suspect a UEFI issue.

didacog commented 7 years ago

Could be. I never tested UEFI on VMs. On physical servers it works well, but on VMs... I don't know.

uxtuahgp commented 7 years ago

It's a Hyper-V Gen 1 VM. I don't know if it uses UEFI or not.
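
This can be checked from inside the running Linux client: the kernel exposes /sys/firmware/efi only when the system was booted through UEFI. For reference, Hyper-V Generation 1 VMs use BIOS firmware, while UEFI belongs to Generation 2. The sketch below is a minimal check; firmware_type is a made-up helper name, and its optional argument is only a root prefix to make it testable.

```shell
#!/usr/bin/env bash
# Report UEFI if <root>/sys/firmware/efi exists, BIOS otherwise.
# $1 = optional root prefix (empty means the live system).
firmware_type() {
    if [ -d "${1:-}/sys/firmware/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

echo "This system booted via: $(firmware_type)"
```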

didacog commented 7 years ago

Hi @uxtuahgp

Any news on your testing?

uxtuahgp commented 7 years ago

I've postponed these tests for a good while. Anyway, it makes no sense for virtual machines.

didacog commented 7 years ago

@uxtuahgp

IMHO it makes a lot of sense for both VMs and physical servers. You are taking an OS image that does not depend on the virtualization platform, so V2V, V2P, P2P and P2V migrations are easy and avoid the problems of platform-dependent tools, with no need for disk image format conversions. You also use the same tool for VM and physical server backups.

I guess you are saying this because you use VM snapshots as a backup solution, but snapshots are not backups and have some drawbacks. Maybe I will not convince you, but relying on snapshots is not the best approach to backups. :P

Regards,

didacog commented 7 years ago

As there is no news about this issue, I will close it. It can be re-opened if needed in the future.