Cannot import root pool in 1.0.1

lnicola commented 7 years ago

Probably the same issue reported on AUR.

I have two pools, one for the root fs and one for storage. When I drop to the shell on boot, if I run zpool import -a, the storage pool gets imported fine, while for the root pool I get an error saying that the host id has changed. I can import it with -f, but it fails again on the next boot.

Also, after the force import, I can boot with the standard initcpio with no issues (that is, it doesn't complain about the host id like I was afraid it might).

I think I've had issues in the past where I had one host id in the running system and another one in initcpio. Possibly related to https://github.com/archzfs/archzfs/commit/0760a006abf9c52fbfd0ea4d07189eb54efc5f43.

dasJ commented 7 years ago

After booting, do you have any core dump in your journal?

lnicola commented 7 years ago

Yes, sorry I missed it. It's crashing with SIGSEGV somewhere around a NULL pointer: https://i.imgur.com/9XPQ0qS.jpg. Not quite sure why the faulting address is -4, while the instruction seems to look 8 bytes before rdi. Could rdi be 4?

  4172a0:       53                      push   %rbx
  4172a1:       48 83 ec 10             sub    $0x10,%rsp
  4172a5:       48 8b 05 1c ca 28 00    mov    0x28ca1c(%rip),%rax        # 0x6a3cc8
  4172ac:       48 85 c0                test   %rax,%rax
  4172af:       0f 85 8b 00 00 00       jne    0x417340
  4172b5:       48 85 ff                test   %rdi,%rdi
  4172b8:       74 7e                   je     0x417338
  4172ba:       48 8b 47 f8             mov    -0x8(%rdi),%rax # <------ HERE
  4172be:       48 8d 77 f0             lea    -0x10(%rdi),%rsi
  4172c2:       a8 02                   test   $0x2,%al
  4172c4:       75 32                   jne    0x4172f8
  4172c6:       64 48 83 3c 25 c8 ff    cmpq   $0x0,%fs:0xffffffffffffffc8
  4172cd:       ff ff 00 
  4172d0:       74 7e                   je     0x417350
  4172d2:       a8 04                   test   $0x4,%al
  4172d4:       48 8d 3d 05 a5 28 00    lea    0x28a505(%rip),%rdi        # 0x6a17e0
  4172db:       74 0c                   je     0x4172e9
  4172dd:       48 89 f0                mov    %rsi,%rax
  4172e0:       48 25 00 00 00 fc       and    $0xfffffffffc000000,%rax
  4172e6:       48 8b 38                mov    (%rax),%rdi
  4172e9:       48 83 c4 10             add    $0x10,%rsp
  4172ed:       31 d2                   xor    %edx,%edx
  4172ef:       5b                      pop    %rbx
  4172f0:       e9 bb c4 ff ff          jmpq   0x4137b0
  4172f5:       0f 1f 00                nopl   (%rax)
  4172f8:       8b 15 76 a4 28 00       mov    0x28a476(%rip),%edx        # 0x6a1774
  4172fe:       85 d2                   test   %edx,%edx
  417300:       75 26                   jne    0x417328
  417302:       48 3b 05 47 a4 28 00    cmp    0x28a447(%rip),%rax        # 0x6a1750
  417309:       76 1d                   jbe    0x417328
  41730b:       48 3d 00 00 00 02       cmp    $0x2000000,%rax
  417311:       77 15                   ja     0x417328
  417313:       48 83 e0 f8             and    $0xfffffffffffffff8,%rax
  417317:       48 89 05 32 a4 28 00    mov    %rax,0x28a432(%rip)        # 0x6a1750
  41731e:       48 01 c0                add    %rax,%rax
  417321:       48 89 05 18 a4 28 00    mov    %rax,0x28a418(%rip)        # 0x6a1740
  417328:       48 83 c4 10             add    $0x10,%rsp
  41732c:       48 89 f7                mov    %rsi,%rdi
  41732f:       5b                      pop    %rbx
  417330:       e9 bb ae ff ff          jmpq   0x4121f0
  417335:       0f 1f 00                nopl   (%rax)
  417338:       48 83 c4 10             add    $0x10,%rsp
  41733c:       5b                      pop    %rbx
  41733d:       c3                      retq

lnicola commented 7 years ago

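The crashing function is free, presumably called with an invalid pointer very close to NULL. A real NULL would have taken the early return at 4172b8, so rdi must be nonzero (most likely 4, given the faulting address of -4).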

lnicola commented 7 years ago

This is what I have in the kernel command line: root=zfs:bike/zroot. I could try debugging this, but I'll have to change the code a bit or cherry-pick https://github.com/dasJ/sd-zfs/commit/5013a286e8c1ea80fff322717438ee8af1da3fc4.

dasJ commented 7 years ago

Wow, I never expected that level of detail. I think I might have a fix for that, but I need to test it further.

dasJ commented 7 years ago

@lnicola Can you test 1.0.2? It's pushed to the AUR

lnicola commented 7 years ago

Yes, it's working now.

dasJ / sd-zfs

Cannot import root pool in 1.0.1 #22