camilleantoun / macfuse

Automatically exported from code.google.com/p/macfuse

SSHFS "disconnects" after editing file with emacs #214

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
  1. Connect to a remote machine via SSHFS; either using the GUI or
     using the following command line
       sshfs user@remote.host:/local/data /tmp/volname -oreconnect,volname=volname
     The remote system is Linux 2.6.9 with OpenSSH 3.9p1

  2. Open a new or existing file on the remote file system in emacs
     and edit it.  A standard text file is fine.

  3. Save the file.

What is the expected output? What do you see instead?

  I expect the file to be saved.  Instead emacs displays the error
  message, "operation not permitted, /tmp/volname/file"

  After that point, the mount point does not work.  Directory listings
  "disappear" from the command line and from the finder.  I.e. if I do
  a listing, there are no files listed.  I am able to unmount the file
  system with no problem (it doesn't hang).

  MacFUSE reports the following messages to syslog.  These repeat
  30-40 times.

    Jun 10 22:18:44 hostname kernel[0]: MacFUSE: starting (version 0.4.0, Jun  5 2007, 20:59:23)
    Jun 10 22:19:52 hostname kernel[0]: MacFUSE: OUCH! daemon did not give fh (type=1, err=-1)
    Jun 10 22:19:52 hostname kernel[0]: MacFUSE: OUCH! daemon did not give fh (type=1, err=-1)
    Jun 10 22:19:52 hostname kernel[0]: MacFUSE: OUCH! daemon did not give fh (type=1, err=-1)

What version of the product are you using? On what operating system?
  MacFUSE 0.4.0
  SSHFS 0.3.0
  Darwin hostname.local 8.9.0 Darwin Kernel Version 8.9.0: Thu Feb 22 20:54:07 PST 2007; root:xnu-792.17.14~1/RELEASE_PPC Power Macintosh powerpc

Please provide any additional information below.

  This error does not occur if I use "vi" or TextEdit as an editor.

  This problem was present in MacFUSE 0.2.0, 0.3.0 and now 0.4.0.

Original issue reported on code.google.com by cbmarkwa...@gmail.com on 11 Jun 2007 at 3:16

GoogleCodeExporter commented 8 years ago
Which emacs are you using--the one that's bundled with Mac OS X (/usr/bin/emacs)?

What's the "ls -l" output for the remote file (that is, "ls -l /tmp/volname/file")?

What's the "ls -l" output for /local/data on the remote machine?

What are your umask settings on the local (Mac OS X) and the remote (Linux) machines?

Original comment by si...@gmail.com on 11 Jun 2007 at 3:35

GoogleCodeExporter commented 8 years ago
Emacs: fink emacs 21.2.1 (X Windows)

umask: 002 (on both sides)

After mounting, file listing running from Mac OS X side:
  > ls -la /tmp/volume/dir1/dir2/dir3/

  total 9568
  drwxrwsr-x   1 user  700     4096 Jun 10 23:21 .
  drwxrwsr-x   1 user  700     4096 Apr 22 02:50 ..
  -rw-rw-r--   1 user  700     7967 Apr 22 05:01 file1
  -rw-rw-r--   1 user  700     2585 Jan 22 14:25 file2
  ... and so on ...

File listing executed on Linux (remote) side:
  > cd /local/data/dir1/dir2/dir3/
  > ls -la
  total 4892
  drwxrwsr-x    14 user group    4096 Jun 10 23:21 .
  drwxrwsr-x    13 user group    4096 Apr 22 02:50 ..
  -rw-rw-r--     1 user group    7967 Apr 22 05:01 file1
  -rw-rw-r--     1 user group    2585 Jan 22 14:25 file2
  ... and so on ...

Original comment by cbmarkwa...@gmail.com on 11 Jun 2007 at 5:07

GoogleCodeExporter commented 8 years ago
I should say that while emacs is editing the file (say it is called "newfile"),
it correctly creates the symlink lock-file ".#newfile -> user@hostname.local.4527".

When attempting to save the file, it *does* save the file (or the auto-save file
depending on how long I pause).  Right after that time, the mount becomes
"disconnected."

The sshfs and sftp processes continue to run despite all this.

Original comment by cbmarkwa...@gmail.com on 11 Jun 2007 at 5:14

GoogleCodeExporter commented 8 years ago
Can you try /usr/bin/emacs and see if that has the same behavior?

Original comment by si...@gmail.com on 11 Jun 2007 at 5:26

GoogleCodeExporter commented 8 years ago
/usr/bin/emacs (21.2.1) functions OK.  

It doesn't have X-windows though, which is why I use the Fink version.

Original comment by cbmarkwa...@gmail.com on 11 Jun 2007 at 5:38

GoogleCodeExporter commented 8 years ago
If I try Fink emacs-X11 with sshfs connected to a Mac OS X remote machine, it works fine. If the remote system is Linux (2.6.18.5 kernel, OpenSSH_4.5p1, OpenSSL 0.9.8a), then Fink emacs-X11 saves the file but claims that an error occurred--as you reported. However, in my case, the mount doesn't "disconnect" or hang otherwise--things on the volume are accessible as before.

Anyway, I will look at this more when I get time--can't make this a priority issue right now. Meanwhile, if you could narrow this down within emacs (what exactly emacs is doing that causes this, and what exactly is the error that emacs thinks has occurred), that will be helpful.

It is quite likely that this issue/behavior has the same cause as issue 114.

Original comment by si...@gmail.com on 11 Jun 2007 at 6:09

GoogleCodeExporter commented 8 years ago
I just did a "ktrace" of the offending emacs process (fink emacs-X11).
It looks like emacs sets an "alarm" signal which has a very short duration,
shorter than the time it takes the remote sshfs to respond.  This
results in an endless cycle of restarted system calls, each one being
interrupted by an alarm.

I can't tell if this is emacs behavior or libc, but I have some evidence that
it is related to how emacs interacts with X-windows. Reference:
http://osdir.com/ml/emacs.bugs/2002-10/msg00070.html

If it's a network timing difference, that might explain why it works differently
for you and me.

From what I can see, each syscall gets interrupted by an alarm before it can
complete, which causes a set of rapid-fire requests to the remote sshfs process.

Original comment by cbmarkwa...@gmail.com on 11 Jun 2007 at 7:33

GoogleCodeExporter commented 8 years ago
Thanks--that's likely to be useful information.

Original comment by si...@gmail.com on 11 Jun 2007 at 7:58

GoogleCodeExporter commented 8 years ago
Try this:

Take the following code, and compile it as a dynamic shared library.

== cut here ==
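// libsetitimer.c
//
// Interposes setitimer() so that every timer armed by the host program
// (emacs here) first expires one second later than requested.

#include <stdio.h>
#include <string.h>
#include <sys/time.h>

// Layout dyld expects for each entry in the __DATA,__interpose section.
typedef struct interpose_s {
    void *new_func;
    void *orig_func;
} interpose_t;

int my_setitimer(int which, const struct itimerval *value,
                 struct itimerval *ovalue);

// Ask dyld to route the program's calls to setitimer() through my_setitimer().
static const interpose_t interposers[] \
    __attribute__ ((section("__DATA, __interpose"))) = {
        { (void *)my_setitimer,  (void *)setitimer  },
    };

int my_setitimer(int which, const struct itimerval *value,
                 struct itimerval *ovalue)
{
    // A zero it_value disarms the timer, so only lengthen a timer that is
    // actually being armed; otherwise a "cancel" would turn into a 1 s alarm.
    if (value && (value->it_value.tv_sec || value->it_value.tv_usec)) {
        struct itimerval new_value;
        memcpy((void *)&new_value, (void *)value, sizeof(struct itimerval));
        new_value.it_value.tv_sec += 1;
        // Calls made from inside the interposing library are not themselves
        // interposed, so this reaches the real setitimer().
        return setitimer(which, &new_value, ovalue);
    }

    return setitimer(which, value, ovalue);
}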
== cut here ==

This intercepts and reimplements setitimer(), adding 1 second to the value specified in the incoming setitimer() call from emacs.

To compile, do something like:

$ gcc -Wall -dynamiclib -o /tmp/libsetitimer.dylib libsetitimer.c

To cause it to be used in a precompiled version of emacs, run your emacs command something like:

$ DYLD_INSERT_LIBRARIES=/tmp/libsetitimer.dylib /sw/bin/emacs ...

See if it changes the behavior. Experiment with different tweakings of it_value if necessary.

Original comment by si...@gmail.com on 11 Jun 2007 at 8:32

GoogleCodeExporter commented 8 years ago
This workaround does indeed work.  Thanks!   I've shimmed it into my emacs for now.

Question: sshfs does have an "-o intr" option which allows operations to be
interrupted.  I did not set this option, so why were the emacs open() file system
calls being interrupted by a SIGALRM?

Another question:  I see that around Jun-Nov 2006, the original FUSE was enhanced to
handle interruptions of the user process more robustly.  Did these changes make it
into MacFUSE?  (version 2.6.0; release notes below)
  http://sourceforge.net/project/shownotes.php?release_id=457591&group_id=121684

Thanks for your help!

Original comment by cbmarkwa...@gmail.com on 11 Jun 2007 at 4:20

GoogleCodeExporter commented 8 years ago
> Another question:  I see that around Jun-Nov 2006, the original FUSE was enhanced to
> handle interruptions of the user process more robustly.  Did these changes make it
> into MacFUSE?  (version 2.6.0; release notes below)

The changes can't just "make it into" MacFUSE because MacFUSE (the kernel portion) is an
OS X specific implementation that shares nothing with the Linux implementation. In
our context here, FUSE is an API--a specification. Linux FUSE is one implementation,
MacFUSE is another. MacFUSE will have to have its own implementation of the
FUSE_INTERRUPT message, which it doesn't support yet.

Original comment by si...@gmail.com on 11 Jun 2007 at 7:53

GoogleCodeExporter commented 8 years ago
Thanks, I understand.

So what happens now in the case that the user-space daemon is busy and a signal
arrives to the client program?  If the signal is ignored, then the daemon would not
be interrupted, and data should be returned ok.  So that suggests that signals *are*
being intercepted before the daemon can return its reply to the client.  Seems like
that could disturb the semantics of open(...,O_CREAT) and write(), which is in fact
what is happening here. The perversity of the emacs-X11 signal model has only
magnified the issue.

I tried browsing the MacFUSE kernel code, but I couldn't orient myself enough to
figure out what happens.

Thanks again for your timely help.

Original comment by cbmarkwa...@gmail.com on 11 Jun 2007 at 9:38

GoogleCodeExporter commented 8 years ago
In real life, not having FUSE_INTERRUPT shouldn't bite applications much. I do intend to implement it--just need to find more time.

Original comment by si...@gmail.com on 12 Jun 2007 at 5:32

GoogleCodeExporter commented 8 years ago
Thanks, but my question was, if MacFUSE doesn't support "interrupts", then why is the
syscall interrupted at all?

Original comment by cbmarkwa...@gmail.com on 12 Jun 2007 at 9:21

GoogleCodeExporter commented 8 years ago
What I said was that MacFUSE doesn't support the FUSE_INTERRUPT *message* of the FUSE API, which means it doesn't support passing interruption notification up to the user-space daemon. You do want system call interruption to be still possible (to avoid nasty hangs) and it works out fine in typical cases.

Original comment by si...@gmail.com on 12 Jun 2007 at 10:51

GoogleCodeExporter commented 8 years ago
For reference: another report in issue 236.

Original comment by si...@gmail.com on 8 Jul 2007 at 8:45

GoogleCodeExporter commented 8 years ago
I got this from emacs developer YAMAMOTO Mitsuharu:

"In the Carbon port, the SIGALRM duration is 2
seconds by default and it's not too frequent.  If some file operation
takes much more time than that period, then it is desirable that it
works with signals so users/applications can interrupt the long
operation, IMO.

You can set [the duration] via `polling-period' if you need."

He asked if there was anything specific that emacs could do differently to make it
work better with MacFUSE.  If anybody has ideas, post here and I'll route them back.

-Bill

Original comment by flowe...@gmail.com on 27 Jul 2007 at 2:33

GoogleCodeExporter commented 8 years ago
I believe that making the duration of the SIGALRM timeout longer than 100 msec is
definitely going to help this issue.  While the 100 msec duration is a bit
exorbitant, Emacs is doing the "right thing" by retrying file operations when they
are interrupted by a SIGALRM.  MacFUSE appears to be getting confused because it
allows stateful file operations to be interrupted after the state change has already
occurred.

Original comment by cbmarkwa...@gmail.com on 28 Jul 2007 at 9:00
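
The "retry when interrupted" behaviour mentioned above is usually written as a loop around the system call. What follows is a minimal sketch under stated assumptions, not the code emacs actually uses: `open_retrying` and its `max_retries` parameter are hypothetical names introduced here for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Calls open() again for as long as it fails with EINTR.
 * max_retries bounds the loop, so a timer that fires faster than the call
 * can complete cannot keep the caller spinning forever; when the retries
 * are used up, -1 is returned with errno still set to EINTR. */
int open_retrying(const char *path, int flags, mode_t mode, int max_retries)
{
    int fd;
    int retries = 0;

    do {
        fd = open(path, flags, mode);
    } while (fd == -1 && errno == EINTR && retries++ < max_retries);

    return fd;
}
```

A loop like this is only safe when an interrupted attempt leaves nothing behind. If a call such as `open(path, O_CREAT | O_EXCL, ...)` were reported as interrupted after the file had already been created, the retry would fail with `EEXIST` even though the first attempt did the work, which is the kind of inconsistency the comment above describes.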

GoogleCodeExporter commented 8 years ago
i have pretty much the same problem, so i tried the libsetitimer fix... but i got
stuck because it says:

tcsh: DYLD_INSERT_LIBRARIES=/tmp/libsetitimer.dylib: Command not found.

am i missing something? thanks, and sorry i'm so ignorant.

Original comment by michael....@gmail.com on 8 May 2008 at 8:11