Open GoogleCodeExporter opened 9 years ago
Original comment by alex.ble...@gmail.com
on 24 Oct 2009 at 9:24
Original comment by alex.ble...@gmail.com
on 24 Oct 2009 at 9:28
To clarify: even if you unmount the drive via Finder first and then remove the
physical drive, the system still kernel panics? So the problem is not a failure
to unmount first, but rather that the pool is still imported until it is
exported via a 'zpool export' command. Semantics, but just making sure we are
all thinking of the same issue.
What do we want to define as "expected behavior"?
The promise of ZFS is that the on-disk state is, in theory, always consistent,
but a kernel panic is really bad in all but the most extreme cases. Users
really don't like kernel panics when they are just trying to get something
done, and doing the same thing that has always worked, like just ejecting a
disk, causes a kernel panic. And although I understand that the intent of the
panic is to avoid causing further harm to the filesystem, I think we can find a
way to determine whether the USB drive was truly removed (via IOKit, perhaps?)
or whether there actually is a hardware failure, and then decide which response
is appropriate.
This does appear to be something that we should be able to tackle before/while
moving to new ZFS code, as this mostly appears to be a Mac OS X interaction
issue. My thinking is that an eject command *should* be able to trigger a zpool
export. At a minimum, I'm thinking we should try to honor the failmode property
set on a zpool by the user.
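A minimal sketch of that idea, assuming the kernel can tell a surprise removal apart from a genuine hardware failure. The three `failmode` values (`wait`, `continue`, `panic`) are the ones the ZFS pool property accepts; the `response_t` names and `choose_response()` are hypothetical and exist only for this illustration, not in MacZFS.

```c
#include <assert.h>
#include <stdbool.h>

/* The three values the ZFS failmode pool property accepts. */
typedef enum { FAILMODE_WAIT, FAILMODE_CONTINUE, FAILMODE_PANIC } failmode_t;

/* Hypothetical responses for this sketch; these are not MacZFS names. */
typedef enum {
    RESP_TREAT_AS_REMOVED, /* drive was unplugged: drop the pool, no panic */
    RESP_SUSPEND_IO,       /* block I/O until the device comes back */
    RESP_RETURN_EIO,       /* fail new I/O requests with an error */
    RESP_PANIC             /* halt the system */
} response_t;

/*
 * Pick a response to a pool I/O failure.
 * Assumption of this sketch: a detected removal overrides failmode,
 * so unplugging a drive never panics the machine.
 */
response_t choose_response(failmode_t mode, bool device_removed)
{
    assert(mode <= FAILMODE_PANIC);

    if (device_removed)
        return RESP_TREAT_AS_REMOVED;

    switch (mode) {
    case FAILMODE_WAIT:     return RESP_SUSPEND_IO;
    case FAILMODE_CONTINUE: return RESP_RETURN_EIO;
    case FAILMODE_PANIC:    return RESP_PANIC;
    }
    return RESP_SUSPEND_IO; /* not reached for valid modes */
}
```

Letting a detected removal override `failmode=panic` is a design choice of the sketch; whether a user who explicitly set `panic` would want that is open to debate.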
Note, all usage of the word "we" above does mean I'm also looking at ways to
fix the issue and write the needed code, just looking for feedback and/or other
ideas. ;)
Original comment by jason.richard.mcneil
on 29 Oct 2009 at 5:16
There are really a few related things. Firstly, r72 didn't have a failmode for
the pool; I think it got added later. Prior to that, the effect of a pool
failure was to do a halt. This might make sense in an environment where there
are only ZFS pools, but in a mixed-mode system (or one with network storage)
it's conceivable that a network share could be used to save dirty editors etc.
The other problem is the automounter in OSX. When you plug in a USB drive, it
gets automounted. If it's HFS (or ext etc) and you then yank it, you get a
message saying you're a naughty person, don't do it again. If it's a ZFS drive
it mounts, but when you yank it, it kills the system.
Rolling forward to a newer version of ZFS would give us the failmode, but we
may be able to grep for halt() in the meantime.
Original comment by alex.ble...@gmail.com
on 29 Oct 2009 at 10:07
Is there a reason to wait for the failmode property in 77? It seems as though
this is a major issue that will scare people away from ZFS along the lines of
Trash support. Would it be possible to just block that IO thread until the pool
reappears?
Does 74 have the same behavior if an internal SATA cable fails or comes loose?
In cases of a single or multiple drive/pool failure when the system drive is
still good, I'd think that the most commonly desired behavior would be to allow
the application / user to recover and save any work in progress to an alternate
location.
I don't really understand why a disappearing pool is any different from a power
outage as far as that specific pool is concerned. Why not just return errors
for the IO operations and then recover (possibly via a forced zpool import)
when the pool is available again (with caveats of fsync ordering problems on USB
drives) the same way one would do so after a kernel panic or a power outage?
Why have the kernel panic?
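A toy model of the "return errors, then recover" behavior asked about above. Everything here is hypothetical (`toy_pool_t` and its functions are not MacZFS structures or APIs), and returning `ENXIO` for a missing device is an assumption of the sketch. It only shows the shape of the state machine: I/O fails while the device is gone and succeeds again after a re-import.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pool state; not a MacZFS structure. */
typedef struct {
    bool available;  /* false while the backing device is gone */
    int  failed_ios; /* number of I/Os rejected while unavailable */
} toy_pool_t;

/* Returns 0 on success, or ENXIO while the device is missing. */
int toy_pool_write(toy_pool_t *pool, const void *buf, size_t len)
{
    assert(pool != NULL);
    (void)buf; /* a real implementation would issue the write here */
    (void)len;

    if (!pool->available) {
        pool->failed_ios++;
        return ENXIO;
    }
    return 0;
}

/* Called when the device disappears (cable pulled, drive powered off). */
void toy_pool_device_lost(toy_pool_t *pool)
{
    pool->available = false;
}

/* Called once the pool has been imported again. */
void toy_pool_reimport(toy_pool_t *pool)
{
    pool->available = true;
}
```

The model ignores the hard part the comment already hints at: writes that were in flight when the device vanished, and the fsync ordering caveats on USB drives.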
After putting my questions in writing, I think Alex answered them in Comment 4
already. This panic is found in MacZFS/usr/src/uts/common/fs/zfs/zio.c near
line 918. From AlBlue's repository with the Trash fix I found the following.
How could we exit here without panicking? Is there a good way to set the pool
status as "crashed" here and just return with an error?
The strange thing about this code is that the CANFAIL flag is passed and
checked, but the comments indicate that this means we cannot fail? This isn't a
"simple" bug with a fix we might be able to backport easily, is it?
    /*
     * For I/O requests that cannot fail, panic appropriately.
     */
    if (!(zio->io_flags & ZIO_FLAG_CANFAIL)) {
        char *blkbuf;

        blkbuf = kmem_alloc(BP_SPRINTF_LEN, KM_NOSLEEP);
        if (blkbuf) {
            sprintf_blkptr(blkbuf, BP_SPRINTF_LEN,
                bp ? bp : &zio->io_bp_copy);
        }
        panic("ZFS: %s (%s on %s off %llx: zio %p %s): error "
            "%d", zio->io_error == ECKSUM ?
            "bad checksum" : "I/O failure",
            zio_type_name[zio->io_type],
            vdev_description(vd),
            (u_longlong_t)zio->io_offset,
            zio, blkbuf ? blkbuf : "", zio->io_error);
    }
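On the CANFAIL question: the test in the excerpt is negated, so the panic branch runs only when `ZIO_FLAG_CANFAIL` is *not* set. That matches the comment: an I/O without the flag is one that "cannot fail". A small sketch of just that bit test follows; `SKETCH_FLAG_CANFAIL` is a placeholder value chosen for illustration (the real constant is defined in the ZFS headers), and `zio_failure_would_panic()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for illustration only; not the real ZIO_FLAG_CANFAIL value. */
#define SKETCH_FLAG_CANFAIL 0x1u

/*
 * Mirrors the condition in the excerpt: a failed I/O leads to panic()
 * only when the "can fail" bit is absent from its flags.
 */
bool zio_failure_would_panic(uint32_t io_flags)
{
    return !(io_flags & SKETCH_FLAG_CANFAIL);
}
```

So `zio_failure_would_panic(0)` is true, while any flag word that includes the CANFAIL bit takes the non-panicking path.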
Original comment by dayenter...@gmail.com
on 16 Feb 2011 at 3:21
"Is there a reason to wait for the failmode property in 77?"
Well, yes, frankly. Firstly, because it's non-trivial to significantly change
the underlying codebase (without causing problems for merging later on), and
secondly, because MacZFS_77 is, relatively speaking, just round the corner.
There are some problems which need to be ironed out but that represents a less
significant change than the kind of thing you are thinking of.
MacZFS has always had this restriction, and it hasn't caused the fear and scare
that you quote so far.
Original comment by alex.ble...@gmail.com
on 16 Feb 2011 at 9:58
"MacZFS has always had this restriction, and it hasn't caused the fear and
scare that you quote so far."
Please try to see this from the point of view of the user. "Fear" and "scare" are all
relative. When someone tries MacZFS, all it takes is a single kernel panic on
an operation as common as an external drive unexpectedly disconnecting before
they start uninstalling. ZFS is supposed to bring enhanced reliability to
storage. "Reliability" and "kernel panics" don't belong in the same sentence,
regardless of the circumstances or judgements being made to force the kernel
panic.
With the exception of the Mac Pro, all current and recent generation
Macintoshes are "integrated systems", meaning that the average user can't
realistically add internal storage. The only way we have to add large amounts
of storage (for which people would look to ZFS to prevent bit rot and do
RAID-Z, etc.) is to connect that storage externally via USB, Firewire or
Thunderbolt. Either that or move to SAN, which for many is prohibitively
expensive.
From that, one can draw the conclusion that the most common way for Mac users
to add large amounts of storage to their system is via externally connected
storage. Because none of the usable external storage connection mechanisms is
based on the concept of physically secured connectors, drives becoming
accidentally disconnected is a *far too common* phenomenon for Mac users who
would like to use ZFS. However, kernel panics are completely unacceptable. In
the end, it's the filesystem authors making a value judgement of "what's more
acceptable - filesystem problems or lost work in running apps". I don't believe
that's a decision that the filesystem should be making without any alternative
for the user. Kernels shouldn't panic, plain and simple.
Original comment by timhenr...@gmail.com
on 11 Oct 2011 at 2:43
Tim, you are perfectly right. For the Mac community this is a requirement, not
something nice to have. If ZFS is supposed to work with non-removable drives
only, then any initiative to port the project to the Mac has no point.
Original comment by sbernat...@gmail.com
on 20 Dec 2011 at 8:53
Tim and sbern,
I completely agree with both of you. I downloaded MacZFS, installed it and
used it for a while.
I wanted to eject the disk, but the Finder wouldn't allow me to because the
disk was in use. WHAT?
In use, even though nothing was being written to it and no app was running
from it.
So I decided to just yank the cord. I almost never have to do this, but I just
wanted to get it done without a restart.
To my surprise the whole system crashed.
No one will use MacZFS if this is not solved; heck, I even believe that part of
the reason Apple dropped ZFS was this very problem.
Solve the problem and I will be back.
Original comment by febbyman...@gmail.com
on 28 Jan 2012 at 8:26
Issue 102 has been merged into this issue.
Original comment by alex.ble...@gmail.com
on 7 Mar 2012 at 11:28
[deleted comment]
Having similar issues here. Created a ZFS volume on two (JBOD) drives connected
by USB2. Volumes work. Problem: system crashes when ejecting (sometimes).
Another strange behaviour: I eject the volume, pull the USB connector,
everything works. When I reconnect the USB connector, the kernel panics.
Original comment by lars.ebe...@gmail.com
on 23 Sep 2012 at 10:16
This is a major issue.
As was outlined above, 95% of Macs around are integrated systems. And we do
have to disconnect these ZFS drives anyway.
I get a kernel panic if I:
1) eject the disk using Finder
2) disconnect the cord
3) insert the USB cord -> PANIC
I can avoid the panic if I:
1) eject the disk using Finder
2) export via Terminal
3) disconnect the cord
4) insert the USB cord
It took me quite a few kernel panics to learn that, and I still suffer from it,
as my wife does not know a lot about the terminal.
Original comment by y...@yan.my
on 1 Mar 2013 at 12:07
Original issue reported on code.google.com by
alex.ble...@gmail.com
on 24 Oct 2009 at 9:14