Antergos / Cnchi

A modern, flexible online system installer for Antergos Linux
GNU General Public License v3.0

problem about ZFS #680

Open nabei opened 7 years ago

nabei commented 7 years ago

Problem:

cnchi 0.14.204: when I reboot my PC after installing Antergos with ZFS, modprobe zfs fails. (screenshot 001)

Some tips:

(screenshot 003)

(screenshot 002)

jjsaunier commented 7 years ago

I don't think this is related to #564; the EFI partition is now correct (FAT32).

The error is: no device specified for hibernation

This is related to:

https://wiki.archlinux.org/index.php/Power_management/Suspend_and_hibernate#Hibernation

We need a swap partition

karasu commented 7 years ago

@ProPheT777 you're partially right. There is that error indeed, but it's not fatal (system should boot without any problems).

The problem here is with the zfs module, which is not built if the ISO's kernel version is different from the version that is finally installed on the destination system.

I think we need to run dkms manually, as the pacman hook does not work well in a chrooted environment (as kernel versions differ). <- just guessing here, I need to confirm this.

Will look into it asap.

EDIT: Oh, and we didn't get this because when we test the iso, the same kernel version is installed, obviously.
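
To make the mismatch concrete, here is a minimal sketch in Python (the language Cnchi is written in). It compares the kernel the live ISO is running with the kernels that have a module directory in the installed system. The function names and the /install mount point are assumptions for illustration, not Cnchi code.

```python
import os
import platform


def installed_kernels(target_root):
    """Return the names of the directories under <target_root>/usr/lib/modules."""
    modules_dir = os.path.join(target_root, "usr", "lib", "modules")
    if not os.path.isdir(modules_dir):
        return []
    return sorted(
        name for name in os.listdir(modules_dir)
        if os.path.isdir(os.path.join(modules_dir, name))
    )


def kernel_mismatch(running, installed):
    """True when the running (live ISO) kernel is not among the installed ones."""
    return running not in installed


if __name__ == "__main__":
    running = platform.release()               # same value as `uname -r`
    installed = installed_kernels("/install")  # assumed mount point of the new system
    if kernel_mismatch(running, installed):
        print("ISO kernel %s differs from installed kernels %s" % (running, installed))
```

If the check reports a mismatch, a DKMS build that targets the running kernel would produce modules for a kernel the installed system never boots.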

jjsaunier commented 7 years ago

oh ok, nice to know!

I'm also affected by this issue (for ZFS). On the other hand, on an installation without ZFS, I got the error "no device specified for hibernation" and just a white screen with a cursor instead of Antergos. I don't know if it's related.

I'll wait for your PR to test with ZFS.

nabei commented 7 years ago

@ProPheT777
"on installation without ZFS, I got error no device specified for hibernation and just a white screen with cursor instead of antergos os." Maybe that's because your swap is too small for hibernation.

jjsaunier commented 7 years ago

I dug into it this morning: it happens because I chose to preinstall the proprietary graphics driver. Without that option checked, it works well. So it's an issue related to my hardware and not to Cnchi.

I still get the warning about hibernation but, as @karasu said, it's not fatal.

nabei commented 7 years ago

@karasu The last part of my log http://pastebin.com/8qrdFZyY

karasu commented 7 years ago

Thanks both, very useful feedback. I know exactly what's wrong thanks to you. I'll post here when I have this one solved.

2016-12-16 05:54:39 [DEBUG] install.py(1327) configure_system(): Installing zfs modules v0.6.5.8...
2016-12-16 05:54:39 [DEBUG] run_cmd.py(147) chroot_call(): Error! echo
Your kernel headers for kernel 4.8.8-2-ARCH cannot be found at
/usr/lib/modules/4.8.8-2-ARCH/build or /usr/lib/modules/4.8.8-2-ARCH/source.
2016-12-16 05:54:39 [DEBUG] run_cmd.py(147) chroot_call(): Error! echo
Your kernel headers for kernel 4.8.8-2-ARCH cannot be found at
/usr/lib/modules/4.8.8-2-ARCH/build or /usr/lib/modules/4.8.8-2-ARCH/source.
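
The DKMS error above names two paths it looks for. A small Python sketch of the same check, useful for confirming whether headers exist for a given kernel inside the installed system; the function name and the /install mount point are assumptions for illustration.

```python
import os


def kernel_headers_present(kernel_version, root="/"):
    """Check the two locations named in the DKMS error for the given kernel."""
    base = os.path.join(root, "usr", "lib", "modules", kernel_version)
    return any(
        os.path.exists(os.path.join(base, sub)) for sub in ("build", "source")
    )


if __name__ == "__main__":
    # Kernel version taken from the log above; "/install" is an assumed mount point.
    print(kernel_headers_present("4.8.8-2-ARCH", root="/install"))
```
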
nabei commented 7 years ago

Is there any news? @karasu

karasu commented 7 years ago

Nope. To be honest, I've been doing some packaging (among other things) this weekend, due to an Arch package renaming (that broke our KDE installation). I'll get back to you asap when I have something. Thanks for the reminder, though.

karasu commented 7 years ago

@nabei does it work for you?

MrClayPole commented 7 years ago

Hi, I've just tested this zfs installer and get the same error as shown in the screenshot. Let me know if you would like logs.

stratus-ss commented 7 years ago

@MrClayPole @nabei, I have some spare hardware I can try to reproduce this on. Can you tell me exactly what your hardware setups are (specs) and what options you chose in the installer so I can attempt to reproduce this?

MrClayPole commented 7 years ago

Hi stratus-ss, thanks for taking the time to look into this for me. Below is my lspci output and the options I selected during the install. Let me know if you need any more info.

00:00.0 Host bridge [0600]: Intel Corporation Skylake Host Bridge/DRAM Registers [8086:191f] (rev 07)
00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07)
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-H CSME HECI #1 [8086:a13a] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31)
00:1b.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Root Port #17 [8086:a167] (rev f1)
00:1b.2 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Root Port #19 [8086:a169] (rev f1)
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #1 [8086:a110] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #9 [8086:a118] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-H LPC Controller [8086:a145] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-H PMC [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-H HD Audio [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-H SMBus [8086:a123] (rev 31)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104 [GeForce GTX 770] [10de:1184] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1)
03:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 04)
04:00.0 Multimedia audio controller [0401]: C-Media Electronics Inc CMI8788 [Oxygen HD Audio] [13f6:8788]
05:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]

For the install I selected the following options

  1. English
  2. English (GB) United Kingdom
  3. Timezone London
  4. Keyboard English (UK)
  5. Desktop environment Gnome
  6. The following installer options: AUR, Chromium, Fonts, Firefox, Flash, Nvidia driver, LibreOffice, Steam, Firewall and SMB

This results in the same errors as shown in the first post.

I've been able to get the above installation booted by carrying out the following steps at a terminal after installation completes, before rebooting.

  1. sudo zpool import antergos_gbvf
  2. sudo mkdir /install
  3. sudo zfs set mountpoint=/install antergos_gbvf
  4. sudo zfs mount antergos_gbvf
  5. sudo arch-chroot /install
  6. pacman -Syu
  7. pacman -S zfs
  8. pacman -S spl
  9. mount /dev/sda1 /boot
  10. mount /dev/sda2 /boot/efi
  11. mkinitcpio -p linux
  12. umount /boot/efi
  13. umount /boot
  14. exit
  15. sudo zfs umount antergos_gbvf
  16. sudo zfs set mountpoint=/ antergos_gbvf
  17. sudo zpool export antergos_gbvf
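
For anyone who wants to repeat these steps, here is a hedged Python sketch that assembles them as a command list and only prints them unless told to execute. The commands and their order come from the list above, with the pool name passed to every zfs command; the function names, the dry-run default, the use of mkdir -p, and running the chroot part through a single sh -c call are my own assumptions. The pool name antergos_gbvf and the /dev/sda1 and /dev/sda2 devices are specific to this machine. It has to run as root on the live system.

```python
import shlex
import subprocess


def recovery_commands(pool, mountpoint="/install",
                      boot_dev="/dev/sda1", efi_dev="/dev/sda2"):
    """Return the workaround as a list of argv lists, in the same order as above."""
    # Everything that runs inside the chroot is joined into one shell command
    # so that the mounts made there are still visible to mkinitcpio.
    in_chroot = " && ".join([
        "pacman -Syu",
        "pacman -S zfs",
        "pacman -S spl",
        "mount %s /boot" % shlex.quote(boot_dev),
        "mount %s /boot/efi" % shlex.quote(efi_dev),
        "mkinitcpio -p linux",
        "umount /boot/efi",
        "umount /boot",
    ])
    return [
        ["zpool", "import", pool],
        ["mkdir", "-p", mountpoint],
        ["zfs", "set", "mountpoint=" + mountpoint, pool],
        ["zfs", "mount", pool],
        ["arch-chroot", mountpoint, "sh", "-c", in_chroot],
        ["zfs", "umount", pool],
        ["zfs", "set", "mountpoint=/", pool],
        ["zpool", "export", pool],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(recovery_commands("antergos_gbvf"))  # dry run: prints the plan only
```

Calling run(..., dry_run=False) would execute the plan and stop at the first command that fails.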

Hope this helps. Let me know if you need any more info.

karasu commented 7 years ago

@stratus-ss Problem always arises when the kernel version installed differs from the one in the live cd (I do not know if there are some other situations where this happens, too).

MrClayPole commented 7 years ago

@karasu Thanks for the feedback. I'm still learning the Linux boot process and how it interacts with ZFS. So, to prevent the ZFS DKMS modules from being built against the incorrect kernel, would the fix be to update the installer so it does the following before the ZFS dataset is unmounted?

  1. sudo arch-chroot /install
  2. pacman -S --noconfirm zfs spl
  3. mkinitcpio -p linux or mkinitcpio -p linux-lts
  4. exit

karasu commented 7 years ago

@MrClayPole In fact, all this is done by the installer. I think that the problem is with the dkms pacman hook, but yes, maybe I can modify Cnchi to redo the zfs and spl installation and rerun mkinitcpio after installation. I need time to do some tests.

lots0logs commented 7 years ago

@karasu The pacman hooks still don't run correctly under chroot. We'll have to call the hook manually and be sure to provide it with the right arguments so it does what's needed.

nabei commented 7 years ago

@karasu Why does this script work well, no matter whether the kernel version is the same as the ISO's or not? https://github.com/danboid/ALEZ

karasu commented 7 years ago

It does not use dkms... https://wiki.archlinux.org/index.php/Dynamic_Kernel_Module_Support

"Dynamic Kernel Module Support (DKMS) is a program/framework that enables generating Linux kernel modules whose sources generally reside outside the kernel source tree. The concept is to have DKMS modules automatically rebuilt when a new kernel is installed.

This means that a user does not have to wait for a company, project, or package maintainer to release a new version of the module. Since the introduction of Pacman#Hooks, the rebuild of the modules is handled automatically when a kernel is upgraded."

nabei commented 7 years ago

Can't boot again after updating the system, so I have to switch from ZFS to LVM. How sad.

karasu commented 7 years ago

Before rebooting, always check that spl and zfs have been built without errors. Sorry, atm I don't know how to fix this on our end. You can try btrfs (it's what I've been using for a while without any issues).
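
One way to do that check is to read the output of dkms status for the kernel you are going to boot. Below is a minimal Python sketch that parses such output. The line formats it accepts ("module, version, kernel, arch: state" and the newer "module/version, kernel, arch: state") are what I believe dkms prints, so treat them as an assumption; the sample text and the kernel version in it are made up for illustration.

```python
import re


def built_modules(status_text, kernel_version):
    """Return the module names reported as installed for kernel_version."""
    installed = set()
    for line in status_text.splitlines():
        if ":" not in line:
            continue
        fields, state = line.rsplit(":", 1)
        # Accept both "name, version, kernel, arch" and "name/version, kernel, arch".
        parts = [part.strip() for part in re.split(r"[,/]", fields)]
        if (len(parts) >= 3 and parts[2] == kernel_version
                and state.strip().startswith("installed")):
            installed.add(parts[0])
    return installed


def modules_ready(status_text, kernel_version, required=("spl", "zfs")):
    """True when every required module is installed for kernel_version."""
    return set(required) <= built_modules(status_text, kernel_version)


if __name__ == "__main__":
    # Illustrative sample only; real text would come from running `dkms status`.
    sample = (
        "spl, 0.6.5.8, 4.8.13-1-ARCH, x86_64: installed\n"
        "zfs, 0.6.5.8: added\n"
    )
    print(modules_ready(sample, "4.8.13-1-ARCH"))
```

A module that is only "added" (or built for a different kernel) does not count, which is the situation that leaves modprobe zfs failing at boot.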

stratus-ss commented 7 years ago

I would advise against btrfs. From what I understand it can be quite the adventure keeping your data safe.

As to the ZFS issue, I have experienced the same problem on CentOS as well as Ubuntu 14.04. There just seems to be a problem rebuilding the ZFS modules from time to time and the solution ranges from the simple to uninstalling old kernels and all the zfs bits and reinstalling.

This is not an Arch specific problem

karasu commented 7 years ago

btrfs using RAID0 or RAID1 works like a charm. RAID5 is known to have big issues, yes.

I've modified the installer so it tries to rebuild the spl and zfs modules and install them in order. I tested it and it worked for me (using a different kernel version in the iso than the one installed).
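
For readers following along, the idea can be sketched roughly as below. This is not the actual Cnchi change: the function name, the /install chroot directory and the use of arch-chroot are assumptions. The dkms install call with -m, -v and -k targets an explicit kernel version instead of the running one, and spl comes first because zfs depends on it.

```python
def dkms_rebuild_commands(module_version, kernel_version, chroot_dir="/install"):
    """Build the commands that install spl, then zfs, for one specific kernel."""
    commands = []
    for module in ("spl", "zfs"):  # order matters: zfs builds against spl
        commands.append([
            "arch-chroot", chroot_dir,
            "dkms", "install",
            "-m", module, "-v", module_version, "-k", kernel_version,
        ])
    # Regenerate the initramfs so it picks up the freshly installed modules.
    commands.append(["arch-chroot", chroot_dir, "mkinitcpio", "-p", "linux"])
    return commands


if __name__ == "__main__":
    # Module version from the log earlier in this thread. The kernel version is
    # only a placeholder: pass the kernel actually installed on the target.
    for argv in dkms_rebuild_commands("0.6.5.8", "4.8.8-2-ARCH"):
        print(" ".join(argv))
```

The sketch only prints the commands; executing them needs root and a mounted target system.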

Once we have tested it as it needs to be tested, we'll activate the zfs option again.

difranco commented 7 years ago

For what it's worth I'll also advise against btrfs since it's been silently corrupting my data and has produced hundreds of thousands of errors detected by its own scrub which it can't repair with its own fsck which is why I'm now looking into antergos on ZFS.

With the latest cnchi I'm getting a failure to find the ZFS pool after rebooting immediately after install. I tried both not specifying a pool name and specifying one and it doesn't work in either case. Any idea where I should start to troubleshoot this? Is it a known issue?

karasu commented 7 years ago

"With the latest cnchi I'm getting a failure to find the ZFS pool after rebooting immediately after install. I tried both not specifying a pool name and specifying one and it doesn't work in either case. Any idea where I should start to troubleshoot this? Is it a known issue?"

Not that I know of, and I do not recall anybody reporting this issue.

  1. Are you installing using UEFI or legacy BIOS?
  2. Could you share your /tmp/cnchi.log ?
  3. Could you share your /boot/grub/grub.cfg ?

difranco commented 7 years ago

This is under UEFI. I have needed to install another OS to use the system in the meantime, so I won't be able to provide the files soon, but if I get a chance I will repeat the install and provide them.