gmzang / maczfs

Automatically exported from code.google.com/p/maczfs

creating new symlinks in ZFS volume crashes OSX 10.9 Mavericks #120

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

1. cd into a ZFS filesystem, then run:

mkdir foo && ln -s foo foobar

What is the expected output? What do you see instead?

Expected: "foobar" should point at "foo"

Actual: hard system crash & reboot, "foobar" never created

What version of the product are you using? On what operating system?

MacZFS: Reproduces both under 74.1.0 and 74.3.0 (hoping to fix the issue under 
74.1.0, I upgraded to 74.3.0 using stanback's Mavericks-compatible installer 
from issue #119)

OS X: 10.9 "Mavericks", freshly upgraded from 10.8 "Mountain Lion".

I've been running MacZFS since 2010 on Snow Leopard (then Lion, then Mountain 
Lion) without any problems at all.  This is my first issue.

Please provide any additional information below.

In case it was somehow related to filesystem issues, I created a new zfs 
filesystem (on the same pool) and the problem still recurred within the 
filesystem.

zpool scrub reports no errors.

Original issue reported on code.google.com by j...@hart.fm on 28 Oct 2013 at 10:24

GoogleCodeExporter commented 8 years ago
I know this report is a bit old, but do you have a panic report you can attach? 
I don't have Mavericks and therefore can't test this on my machine.

Original comment by googlelogin@bjoern-kahl.de on 22 Nov 2013 at 9:43

GoogleCodeExporter commented 8 years ago
Sure, should have done that from the start.

Here are two - one triggered by ln, and one by rsync when copying a symlink 
(which is how the bug first hit me).

lmk if you need anything else.

thanks!

Original comment by j...@hart.fm on 22 Nov 2013 at 11:31

Attachments:

GoogleCodeExporter commented 8 years ago
I'm seeing the same issue: kernel panic triggered while creating a symlink on a 
ZFS dataset. Some diagnostic info from my system is included below.

Thanks,
Ritesh 

[ritesh@chandra : ~]$ strings /usr/sbin/zfs | grep -e VERSION
279:@(#)PROGRAM:zfs  PROJECT:maczfs_74-3-2-0-ge6eed38  VERSION:74.3.2  BUILT:2013-11-30_23.25_+0100

[ritesh@chandra : ~]$ pkgutil --pkgs | grep -e zfs
1607:org.maczfs.zfs.106.pkg

[ritesh@chandra : ~]$ sudo less /var/log/system.log
Dec 22 17:07:40 chandra.localdomain com.apple.kextd[12]: WARNING - Invalid signature -67062 0xFFFFFFFFFFFEFA0A for kext "/System/Library/Extensions/zfs.kext"
Dec 22 17:07:40 chandra kernel[0]: ZFS kernel module loaded
Dec 22 17:07:40 chandra kernel[0]: - - -
Dec 22 17:07:40 chandra kernel[0]: This is MacZFS 74.3.2
Dec 22 17:07:40 chandra kernel[0]: - - -
Dec 22 17:07:40 chandra kernel[0]: zfs_context_init: footprint.maximum=1073741824, footprint.target=102711296
Dec 22 17:07:40 chandra kernel[0]: kobj_open_file: "/etc/zfs/zpool.cache", err 0 from vnode_open
Dec 22 17:07:40 chandra kernel[0]: zfs_module_start: memory footprint 9633280 (kalloc 9633280, kernel 0)
Dec 22 17:07:41 chandra.localdomain mds[51]: (Normal) Volume: volume:0x7fc161876000 ********** Bootstrapped Creating a default store:0 SpotLoc:(null) SpotVerLoc:(null) occlude:0 /zfsPool/ArrayComm
Dec 22 17:07:42 chandra kernel[0]: hfs: mounted Recovery HD on device disk0s3
Dec 22 17:07:42 chandra.localdomain mds[51]: (Normal) Volume: volume:0x7fc1610c1000 ********** Bootstrapped Creating a default store:0 SpotLoc:(null) SpotVerLoc:(null) occlude:0 /Volumes/Recovery HD
Dec 22 17:07:42 chandra.localdomain fseventsd[61]: check_vol_last_mod_time:XXX failed to get mount time (25; &mount_time == 0x1031843d8)
Dec 22 17:07:42 chandra.localdomain fseventsd[61]: log dir: /zfsPool/.fseventsd getting new uuid: 6D092799-5025-4EEB-9CC4-B5DD2384505F

kernel panic log is attached.

Thanks,
Ritesh

Original comment by riteshs...@gmail.com on 23 Dec 2013 at 1:24

Attachments:

GoogleCodeExporter commented 8 years ago
I have followed the instructions at 
https://github.com/zfs-osx/spl/wiki/kernel_debug and tried to do some remote 
kernel debugging while the target machine is panicked after attempting a ln -s. 
Please see the attached lldb session file, which can hopefully assist the 
developers in debugging. 

Please note that I had to modify the source a little to get it to build with 
Xcode 5.0.2 on OS X 10.9.1. A git patch file generated against maczfs_74-3-2 is 
also attached 

Original comment by riteshs...@gmail.com on 27 Dec 2013 at 7:59

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks for the follow up and the two attachments.

Kernel backtrace:

The panic apparently happens due to calling vnode_mount() with a NULL pointer. 
However, the really interesting question is why the "symlink" function in 
"vfs_syscalls.c" branches into mount.  I'll have to check the kernel sources to 
investigate this further.

Patch:

Thanks for the work, but unfortunately the XCode build has been outdated and 
unsupported for quite some time.  I should have removed the XCode files some 
years(!) ago instead of keeping them around for historic reasons.  I am 
actually surprised you managed to build it at all.

Sorry for the hours spent, I'll remove the XCode stuff in the next release.
The problem with XCode is, it does not allow (easily) the level of control I'd 
like to have over the build process, and it is absolutely version-control 
unfriendly (read: always produced conflicts in the past).

To build anything more recent than 72.x.y, use the GNU Make based build system. 
 I'll try to add a Wiki page with the details in the next few days.
Short version: Rename Makefile-host.sample to Makefile-host, adjust a few paths 
(or use as-is, should work at least on 10.5.x - 10.8.y), then say "make -f 
Makefile-maczfs install" which will build everything and "install" into a 
temporary folder that defaults to "/...worktree.../instbase".
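
The short version above can be sketched as a small shell function. This is only an illustration under assumptions: the function name `build_maczfs` is made up here, the argument is assumed to be the root of a MacZFS source checkout containing `Makefile-host.sample` and `Makefile-maczfs`, and the sample is copied rather than renamed so it stays available for reference.

```shell
# Hypothetical helper, not part of MacZFS: runs the Makefile-based build
# in the given source checkout. The body runs in a subshell so the
# caller's working directory is left unchanged.
build_maczfs() (
    cd "$1" || exit 1
    # Create the host-specific makefile from the sample unless one exists.
    # Edit the paths in Makefile-host by hand if the defaults do not fit.
    if [ ! -f Makefile-host ]; then
        cp Makefile-host.sample Makefile-host || exit 1
    fi
    # Build everything and "install" into the temporary folder
    # (instbase inside the worktree by default).
    make -f Makefile-maczfs install
)
```

It would be called with the path to the checkout, for example `build_maczfs ~/src/maczfs` (the path is a placeholder).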

Original comment by googlelogin@bjoern-kahl.de on 27 Dec 2013 at 11:39

GoogleCodeExporter commented 8 years ago
Thanks for the response. I'm glad you have a lead!

I have built the Makefile based build, debugged, and got the same kernel 
backtrace as in my previous report. No surprises there I guess. Actually 
putting in the time for the Xcode build was helpful since the zfs source is 
written for an older API, and things have changed since. For instance, 
the following does not exist anymore 
/System/Library/Frameworks/Kernel.framework/Headers   
and has been replaced by
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/S
DKs/MacOSX10.9.sdk/System/Library/Frameworks/Kernel.framework/Headers (phew!)
I had to comment out the definition of dprintf in 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/S
DKs/MacOSX10.9.sdk/usr/include/stdio.h
since the zfs source defines its own dprintf; things like that. 
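
Rather than hard-coding the long SDK path, the header directory can probably be derived from the active SDK root. A minimal sketch, assuming `xcrun` is installed and that the SDK keeps the `System/Library/Frameworks/Kernel.framework/Headers` layout shown above; `kernel_headers_dir` is a name invented for this example.

```shell
# Hypothetical helper: print the Kernel.framework header directory
# for a given SDK root.
kernel_headers_dir() {
    printf '%s/System/Library/Frameworks/Kernel.framework/Headers\n' "$1"
}

# Ask Xcode for the active macOS SDK root when xcrun is available.
# On systems without xcrun this part is simply skipped.
if command -v xcrun >/dev/null 2>&1; then
    sdk_root=$(xcrun --sdk macosx --show-sdk-path 2>/dev/null) || sdk_root=""
    if [ -n "$sdk_root" ]; then
        kernel_headers_dir "$sdk_root"
    fi
fi
```

The printed directory could then be used for whatever header path the build's host makefile expects, instead of editing files inside the SDK.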

Thanks again!
Ritesh

Original comment by riteshs...@gmail.com on 28 Dec 2013 at 10:40

GoogleCodeExporter commented 8 years ago
I can confirm exactly the same issue, two Lacie drives connected with 
Thunderbolt using Mavericks 10.9.1 on two brand new zpools. Creating a symlink 
leads to an instant crash.

Original comment by m...@jackhale.co.uk on 9 Jan 2014 at 2:47

GoogleCodeExporter commented 8 years ago
Thanks for the update.  Unfortunately it is still unclear what the root cause 
for the instability is. 
I have added a helper script to collect some more information about the 
installed ZFS version(s) and possible panic logs.  Can you download the 
"collect-maczfs-state.sh" script and run it in a terminal?  It will produce a 
file called "collect-mazfs-state-info.txt" in its current directory.  Can you 
attach that file here?  You can check and edit the file before uploading, if 
you want.

Original comment by googlelogin@bjoern-kahl.de on 14 Jan 2014 at 12:34

GoogleCodeExporter commented 8 years ago
Here you are.  Thank you for working on this.

Original comment by j...@hart.fm on 14 Jan 2014 at 1:06

Attachments:

GoogleCodeExporter commented 8 years ago
FYI - the "collect-maczfs-state.sh" script includes "powerstats" diagnostic 
reports that some Mavericks installs (mine included) generate frequently. (see 
https://discussions.apple.com/message/23816878#23816878).  These reports are many 
MB apiece and ought to be skipped when generating the 
collect-mazfs-state-info.txt file.

I don't see where to submit a patch, but the simplest change would be at line 
274:

274c274
< for i in x $(ls -tr "${PANICS}" ) ; do
---
> for i in x $(ls -tr "${PANICS}/*.panic" ) ; do # skip .diag reports

Original comment by j...@hart.fm on 14 Jan 2014 at 4:43

GoogleCodeExporter commented 8 years ago
Thanks for the suggestion.  I have changed the script accordingly and replaced 
the download with the new version. 
Regarding patches, you can either submit to the project mailing list, add here 
in the tracker or clone the source (here or on GitHub) and submit a pull 
request.  See also 
http://code.google.com/p/maczfs/wiki/DevelopmentOverview#Repositories

Original comment by googlelogin@bjoern-kahl.de on 14 Jan 2014 at 8:25

GoogleCodeExporter commented 8 years ago
Fixed in commit 58854f77.

Release target: 74.3.3 (February/March)

This one was difficult, as the crash happens in the VFS layer outside our code. 
The crash is the result of a slight change in the VFS API, which now requires 
the symlink operation to return a vnode pointer for the link.  In previous 
versions this was a "should" instead of a "must", and older kernels did an extra 
lookup to get the vnode instead of relying on the symlink FS implementation.

Original comment by googlelogin@bjoern-kahl.de on 9 Feb 2014 at 1:39

GoogleCodeExporter commented 8 years ago
This issue was closed by revision 58854f774922.

Original comment by googlelogin@bjoern-kahl.de on 11 Feb 2014 at 9:32

GoogleCodeExporter commented 8 years ago
Issue 124 has been merged into this issue.

Original comment by googlelogin@bjoern-kahl.de on 15 Feb 2014 at 12:10

GoogleCodeExporter commented 8 years ago
Issue 125 has been merged into this issue.

Original comment by googlelogin@bjoern-kahl.de on 15 Feb 2014 at 12:15

GoogleCodeExporter commented 8 years ago
Issue 129 has been merged into this issue.

Original comment by googlelogin@bjoern-kahl.de on 15 Feb 2014 at 12:56

GoogleCodeExporter commented 8 years ago
Great! thanks for fixing this. I will build and install revision 58854f774922 
until 74.3.3 is released.

Original comment by riteshs...@gmail.com on 15 Feb 2014 at 5:57