fivdi / onoff

GPIO access and interrupt detection with Node.js
MIT License
1.24k stars 124 forks source link

Timing issue when setting direction #20

Closed pascal-fb-martin closed 9 years ago

pascal-fb-martin commented 10 years ago

The README.md file talks at long about gpio export requiring root access.

On my 2014-01-07-wheezy-raspbian image (Rapberry Pi B), /sys/class/gpio/export is indeed owned by root, but gives access to group gpio as well (and user pi is member of the gpio group).

When I start my node.js application as user pi, the onoff export works, but setting the direction fails with EACCES. When I look at /sys/class/gpio, I see the gpio4 directory matching my pin 4. I can set the direction from the shell, and then set the output. Never been root.

I experimented with a script, and I found that I have to sleep for at least 0.2 second before I can access the direction file. I tried to wait until the direction file was created, with no success: the script still needs to wait after that.

Then I tried parallelism: export all 8 pins, wait and then set all 8 directions. It worked if it waits at least 0.4 second.

Do you have ever noticed such a peculiar thing?

fivdi commented 10 years ago

No, I haven't see this happen. Does it also happen if sudo is used to run the node.js application as root? Are you using the quick2wire gpio-admin utility?

I'll try to reproduce the issue with the 2014-01-07-wheezy-raspbian image.

I suppose the readme does talk a bit too much about requiring root access :)

pascal-fb-martin commented 10 years ago

I did not use any of the utilities listed. I looked at the permissions, tried by hand (no sudo), saw that it worked and then launched the app. I started trying to understand what was the permission problem: since I was tired of using the shell history, I wrote a script.. and then it failed there too.

I modified my script to use no delay at all: it worked as root, it failed as user pi. Very repeatable. (My original script, with the delay, still works as user pi.)

My goal is to start my application at boot time. I guess that an easy workaround is to enable all 8 GPIO in the init.d script and then launch the application.. Not sure why I would need these utilities, btw.

fivdi commented 10 years ago

So, I installed 2014-01-07-wheezy-raspbian today and it comes with GPIO related changes when compared to the older version of Raspbian that I was using up till now. I don't know when things changed, but they have.

Here is the output from a few command on the old version of Raspbian when logged on as user pi:

pi@raspberrypi /sys/class/gpio $ uname -a
Linux raspberrypi 3.6.11+ #538 PREEMPT Fri Aug 30 20:42:08 BST 2013 armv6l GNU/Linux
pi@raspberrypi /sys/class/gpio $ ls -la
total 0
drwxr-xr-x  2 root root    0 Mar 26 21:32 .
drwxr-xr-x 37 root root    0 Nov 23 10:59 ..
--w-------  1 root root 4096 Mar 26 21:32 export
lrwxrwxrwx  1 root root    0 Mar 26 21:32 gpiochip0 -> ../../devices/virtual/gpio/gpiochip0
--w-------  1 root root 4096 Mar 26 21:32 unexport
pi@raspberrypi /sys/class/gpio $ echo 23 > export 
-bash: export: Permission denied
pi@raspberrypi /sys/class/gpio $ 

Note that the owner and group for all files is root. In addition, user pi can't successfully export GPIO #23. This is why the readme talks at long about gpio export requiring root access :)

Here is the output from the same commands on 2014-01-07-wheezy-raspbian:

pi@raspberrypi /sys/class/gpio $ uname -a
Linux raspberrypi 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l GNU/Linux
pi@raspberrypi /sys/class/gpio $ ls -la
total 0
drwxrwx---  2 root gpio    0 Jan  1  1970 .
drwxr-xr-x 39 root root    0 Mar 26 21:30 ..
-rwxrwx---  1 root gpio 4096 Jan  1  1970 export
lrwxrwxrwx  1 root gpio    0 Jan  1  1970 gpiochip0 -> ../../devices/virtual/gpio/gpiochip0
-rwxrwx---  1 root gpio 4096 Jan  1  1970 unexport
pi@raspberrypi /sys/class/gpio $ echo 23 > export 
pi@raspberrypi /sys/class/gpio $ 

The owner of all files is still root, but the group has changed to gpio for /sys/class/gpio and its content. Also, the user pi can now successfully export GPIO 23.

Below is another interesting sequence of commands and the corresponding output. The commands unexport GPIO 23, export GPIO 23, and list the contents of /sys/class/gpio/gpio23 twice. The output of the two ls commands are quite different. The first time ls is executed, the group for all files in /sys/class/gpio/gpio23 is root, the second time it's executed, the group has changed to gpio. The permissions for the files have also been modified. This explains why you're getting the EACCES errors. It's taking quite a while for the the GPIO export to complete it's work in the background.

pi@raspberrypi /sys/class/gpio $ echo 23 > unexport; echo 23 > export; ls -la ./gpio23/; ls -la ./gpio23/
total 0
drwxr-xr-x 3 root root    0 Mar 26 21:27 .
drwxrwx--- 4 root gpio    0 Mar 26 21:27 ..
-rw-r--r-- 1 root root 4096 Mar 26 21:27 active_low
-rw-r--r-- 1 root root 4096 Mar 26 21:27 direction
-rw-r--r-- 1 root root 4096 Mar 26 21:27 edge
drwxr-xr-x 2 root root    0 Mar 26 21:27 power
lrwxrwxrwx 1 root root    0 Mar 26 21:27 subsystem -> ../../../../class/gpio
-rw-r--r-- 1 root root 4096 Mar 26 21:27 uevent
-rw-r--r-- 1 root root 4096 Mar 26 21:27 value
total 0
drwxrwx--- 3 root gpio    0 Mar 26 21:27 .
drwxrwx--- 4 root gpio    0 Mar 26 21:27 ..
-rwxrwx--- 1 root gpio 4096 Mar 26 21:27 active_low
-rwxrwx--- 1 root gpio 4096 Mar 26 21:27 direction
-rwxrwx--- 1 root gpio 4096 Mar 26 21:27 edge
drwxrwx--- 2 root gpio    0 Mar 26 21:27 power
lrwxrwxrwx 1 root gpio    0 Mar 26 21:27 subsystem -> ../../../../class/gpio
-rwxrwx--- 1 root gpio 4096 Mar 26 21:27 uevent
-rwxrwx--- 1 root gpio 4096 Mar 26 21:27 value

If the onoff Gpio constructor detects that the GPIO value file already exists, it assumes that the corresponding GPIO has already been exported any will not attempt to export it again. It will still attempt to set the direction and edge. See here and here. This means that in your scenario it should be possible to export the pins using, for example, a bash script. Then, after a short delay, onoff could be used for everything else.

I'm not sure if onoff needs to be modified to handle this issue. That's something I still have to figure out.

pascal-fb-martin commented 10 years ago

The repeated ls is a really good idea! Here is my guess: there is a background process that changes the access when it detects that the kernel created the files. That would explain the timing.

It seems that raspbian tried to resolve the annoying root issue, but they might have tested their fix manually and not thought through the timing issue.

If that is true, that would be a raspbian bug. I am not sure if onoff can do something about it, indeed. Trying to workaround bugs may backfire in other circumstances.

Regarding the possible workaround: yes, I already have the script written. Another possibility is to call the onoff Gpio twice if the first fails. I am trying a somewhat complicated scheme if it fails: setup an interval that retry the setup and write the value to whatever the application tried to write during the wait.

Because I know what my application is doing, I should be fine with the inherent limitations of that workaround. The benefit is: no script to remember running, no side effect when raspbian gets fixed.

pascal-fb-martin commented 10 years ago

Yep, the trick worked for me: need to remember the last value the program tried to set and add a 'ready' flag set to true only after a successful "new Gpio()", this for each pin.

Works if pins are not exported, not triggered when pins were already exported.

I will search around to see if there is already a report to raspbian. If not, I may generate one (or will you?).

Thank you for your help.

pascal-fb-martin commented 10 years ago

I saw your submission #553 to Raspberrypi/linux. Thanks.

fivdi commented 10 years ago

No problem :)

fivdi commented 10 years ago

The replies to the raspberypi/linux issue are here. onoff will be modified to handle this issue in the not to distant future. It very likely to be braking change. Currently an onoff Gpio object can be used as soon as it's constructed. After the modification it will be necessary to wait until the Gpio is ready for usage.

fivdi commented 10 years ago

Initial exploration indicates that the amount of time required for the permissions and group of the GPIO files to change to the correct values is highly dependent on the current system load. If the system load is low, the values will change within 0.4s, most of the time, but not all the time. If the current system load is high, 10s may not be enough!

The permissions also appear to flip back and forward from incorrect to correct and then back to incorrect before finally settling down to the correct values.

If GPIOs are constantly exported and unexported (using a test program) the memory footprint of one of the udevd processes increases a fair bit and the rate at which pins can be exported and imported decreases. Perhaps it's necessary to wait for unexport operations to complete also.

pascal-fb-martin commented 10 years ago

Reverting the permissions seems like a Raspbian bug to me.

This being said, my own technique should still work: the code retries until onoff has been successful. Once the file is open, the permission changes should not matter.

I don't care why it fails, I just retry anyway. You could do better and filter the relevant error codes. On Apr 7, 2014 12:52 PM, "Brian Cooke" notifications@github.com wrote:

Initial exploration indicates that the amount of time required for the permissions and group of the GPIO files to change to the correct values is highly dependent on the current system load. If the system load is low, the values will change within 0.4s, most of the time, but not all the time. If the current system load is high, 10s may not be enough!

The permissions also appear to flip back and forward from incorrect to correct and then back to incorrect before finally settling down to the correct values.

If GPIOs are constantly exported and unexported (using a test program) the memory footprint of one of the udevd processes increases a fair bit and the rate at which pins can be exported and imported decreases. Perhaps it's necessary to wait for unexport operations to complete also.

Reply to this email directly or view it on GitHubhttps://github.com/fivdi/onoff/issues/20#issuecomment-39775342 .

psi-4ward commented 9 years ago

nothing new here?

For me i created an async Constructor function:

Gpio.constructAsync = function() {
  // find the callback
  var cb = arguments[arguments.length-1];
  if(typeof cb != 'function') throw new Error('You need to provide a callback as last argument');

  // get the arguments
  var args = [].slice.call(arguments, 0, arguments.length-1);
  var cnt = 0;

  // new dynamic constructor
  function F(args) {
    return Gpio.apply(this, args);
  }
  F.prototype = Gpio.prototype;

  (function construct() {
    try {
      cb(null, new F(args));
    } catch (e) {
      cnt++;
      if(e.code !== 'EACCES' || cnt > 10) throw e;
      setTimeout(construct, 50);
    }
  })();
}
fivdi commented 9 years ago

Yes, something like that would resolve the issue.

jwatt commented 9 years ago

Brian, any chance you could work around the Raspbian brokenness in onoff for now so that all onoff consumers don't have to? Something along the lines of the following would probably be sufficient:

function callSyncUntilNoExceptionWithTimeout(func, timeout) {
  var start = Date.now();
  for (;;) {
    try {
      return func();
    } catch (e) {
      if (Date.now() - start >= timeout) {
        throw e;
      }
    }
  }
}

// Use callSyncUntilNoExceptionWithTimeout to work around embarrassing Raspbian race condition
// https://github.com/raspberrypi/linux/issues/553
// in order to write to the 'direction' file.
var self = this;
callSyncUntilNoExceptionWithTimeout(function() {
  return fs.writeFileSync(self.gpioPath + 'direction', direction);
}, 2000);
psi-4ward commented 9 years ago

You block my whole App dude

jwatt commented 9 years ago

onoff's Gpio constructor is already blocking (implemented using lots of fs.*Sync() calls), so if you don't want your process to be blocked on file system responsiveness you shouldn't be using it. So long as the constructor continues to do blocking initialization I think it should be consistent and block on the chgrp too rather than throw for that weird race edge case as it currently does. Right now my concern is to see a minimal, non-breaking fix as soon as possible to avoid more people wasting their time tripping over this.

Don't get me wrong though. I'd rather be able to avoid any blocking functions. IMO continuing to have the constructor be blocking and adding an constructAsync function would be poor API though, since I think calling out blocking functions by having "Sync" in their name is good practice. I certainly don't think that any shipping constructAsync function should be implemented in terms of the blocking constructor as in your workaround code. Ideally I'd prefer the constructor be replaced with an async function with a callback, or have the initialization step broken out from the constructor into init() and initSync() methods. All these changes are breaking change that I imagine Brian is less keen to make though and, even if he makes them, will take longer than a change that maintains the current semantics and gets things working for people again.

fivdi commented 9 years ago

I'll take a look at fixing this issue in January when I get back from vacation. Another possibility would be to have two constructor functions, an asynchronous open (like the suggestion from @psi-4ward above) and a synchronous openSync.

What I don't want to do is avoid blocking functions completely as the blocking writeSync method is ten times faster than the non-blocking write method (see benchmarks).

pascal-fb-martin commented 9 years ago

I did something similar, but less generic since this is only an initialization issue and the constructor is well known. It has worked flawlessly for months. On Dec 21, 2014 9:55 AM, "Christoph Wiechert" notifications@github.com wrote:

nothing new here?

For me i created an async Constructor function:

Gpio.constructAsync = function() { // find the callback var cb = arguments[arguments.length-1]; if(typeof cb != 'function') throw new Error('You need to provide a callback as last argument');

// get the arguments var args = [].slice.call(arguments, 0, arguments.length-1); var cnt = 0;

// new dynamic constructor function F(args) { return Gpio.apply(this, args); } F.prototype = Gpio.prototype;

(function construct() { try { cb(null, new F(args)); } catch (e) { cnt++; if(e.code !== 'EACCES' || cnt > 10) throw e; setTimeout(construct, 50); } })(); }

— Reply to this email directly or view it on GitHub https://github.com/fivdi/onoff/issues/20#issuecomment-67782252.

pascal-fb-martin commented 9 years ago

Since this is only a gpio pin initialization problem, I am not sure that performance is always an issue. My app (a sprinkler controller) opens about 8 pins and then keep them open for quite a long time (until I stop it, which seldom happens nowadays). On Dec 28, 2014 1:02 AM, "Brian Cooke" notifications@github.com wrote:

I'll take a look at fixing this issue in January when I get back from vacation. Another possibility would be to have two constructor functions, an asynchronous open (like the suggestion from @psi-4ward https://github.com/psi-4ward above) and a synchronous openSync.

What I don't want to do is avoid blocking functions completely as the blocking writeSync method is ten times faster than the non-blocking write method (see benchmarks https://github.com/fivdi/onoff#benchmarks).

— Reply to this email directly or view it on GitHub https://github.com/fivdi/onoff/issues/20#issuecomment-68204137.

fivdi commented 9 years ago

This issue should be fixed with onoff v1.0.0. Some may not like the chosen solution. It's a bit of a hack.