cheery / node-wayland

Wayland bindings for node.js

Bus error - when running the examples #1

Closed ghost closed 10 years ago

ghost commented 10 years ago

Hello,

I'm really interested in this project, but I was not able to run the example files. After a good deal of trial and error, I managed to install wayland and weston on a brand new Ubuntu 14.04 server edition. Then I used node-gyp to compile your lib and tried to run the simple-shm.coffee file, but all I get is a "Bus error" message. This happens with almost all the other example files, and I couldn't find any further log output.

Do you happen to know what could be the cause of this?

cheery commented 10 years ago

Hi netlovers.

I remember hitting that kind of error during development, but I don't remember the details right now. It no longer appeared by the time I was done, that much I am sure of. You said it happens with all the examples you tried?

Which wayland and weston versions are you running? I was running 1.4.0 during development. If yours is newer than that, it should be okay.

I need to concentrate on a different subject today, but I'll look into this problem in a few days. I have a few ideas about where to start looking. I guess this is going to result in a patch.

ghost commented 10 years ago

Hmm, now that you mention the version, it's 1.3.0. I'm not sure why I was under the impression that it is newer. I'll try to compile a newer one, and get back to you with the results.

Thanks for taking the time to respond this fast.

ghost commented 10 years ago

Okay, I did a bunch of tests. Compiled wayland on a dirty and clean system, too. But nothing changed. The new version is 1.5.90. And the only example that runs is the list_interfaces.coffee.

cheery commented 10 years ago

Thank you for studying it.

The bus errors are said to be rare. Did a quick search and found this: http://lists.freedesktop.org/archives/wayland-devel/2012-December/006686.html

Have you tried your wayland with anything else? Did you run the demos in the weston compositor?

ghost commented 10 years ago

That solution sort of solved the problem. If I run one of the examples, there is a big chance that it will run the first time, but not the second time. None of the "native" weston demos behave like this. But I noticed that when I try to run a nodejs example and it gives the Bus error message, then I try to run a native weston demo, and go back to the nodejs, it will most likely run again. So basically sometimes it works, other times not.

cheery commented 10 years ago

I didn't mean that as a solution. We're still trying to find out what's wrong here.

I think the productive approach would be to trace through and see what it's doing. Could you please take the simplest example that fails and the simplest "native" demo that makes it work, and strace them? The strace invocation looks like this:

strace coffee simple-shm.coffee 2> simple-shm.strace

Then attach simple-shm.strace here.

If it doesn't reveal what's going on, then we could still try these:

ghost commented 10 years ago

So, here it is. Whether a run worked or not was non-deterministic, and the native app's trace has quite a lot of lines. Sorry about that. I separated the different parts, though, so I hope it will at least be usable for you. I tried diffing the different runs to see if anything is out of place, but there is very little difference between the second (bus error) run and the fourth (second successful) run.

Actually, I can't upload files here, only images, so here it is instead: https://gist.github.com/netlovers/f52f83fececf587c8028

There are four strace files:
1.strace - the first run, which was successful.
2.strace - the second run, immediately after the first one, which gave the bus error.
3.strace - the trace of one of the native wayland apps.
4.strace - the second successful run, with no bus error whatsoever.

I hope the trace files will be helpful.

cheery commented 10 years ago

Thank you. It seems to SIGBUS when accessing the wayland-shared file.

I wonder what those demos could be doing or not doing that is different from node-wayland in a meaningful way?

cheery commented 10 years ago

Check the free space available for $XDG_RUNTIME_DIR

ghost commented 10 years ago

I don't think there is a limit on it.

ghost commented 10 years ago

I tried to debug this, and there is something I really don't get. I made many changes to understand what the wayland-shared file is for: first I pointed it somewhere else, then I removed the mmap_fd proxy call (yeah, C++ is not my favorite, and I tend to do lame debugging like this), and then I followed wherever the error messages sent me.

And here is the interesting part. There is the client.js in the root directory, and there is this function:

exports.mmap_fd = function(fd, size) {
    data = wl.mmap_fd(fd, size);
    data.free = function() { wl.munmap_fd(this); }
    return data;
};

I put a console.log into that function just to see if the parameters are there. And suddenly there is no Bus error. I can run everything however I want, and no error, everything is perfect, even when I log out just an empty string. I remove the console.log and the error comes back right away. I know this might sound stupid, but this is what I noticed. But I'm sure that this is not a valid fix for the problem...

ghost commented 10 years ago

Here is a better explanation: it seems that the fs.truncate call (inside the examples) has not yet finished when wl.mmap_fd is called. Maybe the correct way would be to put all the data manipulation inside truncate's callback, or to change it to the sync version.
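The two options mentioned above can be sketched roughly as follows. This is only an illustration, not code from node-wayland: the scratch file, the `openScratchFile` and `cleanup` helpers, and the `SIZE` constant are made up for the sketch, and it uses `fs.ftruncate` / `fs.ftruncateSync` on a file descriptor on the assumption that a current node.js is in use (the examples in the thread call `fs.truncate`). The `wl.mmap_fd` call is left as a comment because it needs the native module.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const SIZE = 4096; // arbitrary buffer size for this sketch

// Hypothetical helper: open a scratch file standing in for the shared buffer file.
function openScratchFile() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'shm-sketch-'));
  const file = path.join(dir, 'buffer');
  return { dir: dir, file: file, fd: fs.openSync(file, 'w+') };
}

// Hypothetical helper: close and remove the scratch file again.
function cleanup(scratch) {
  fs.closeSync(scratch.fd);
  fs.unlinkSync(scratch.file);
  fs.rmdirSync(scratch.dir);
}

// Option 1: synchronous resize. The file has SIZE bytes once the call returns,
// so mapping it right afterwards is safe.
const a = openScratchFile();
fs.ftruncateSync(a.fd, SIZE);
const sizeAfterSync = fs.fstatSync(a.fd).size;
// wl.mmap_fd(a.fd, SIZE) would go here.
cleanup(a);

// Option 2: asynchronous resize. Everything that depends on the new size
// has to live inside the callback.
const b = openScratchFile();
fs.ftruncate(b.fd, SIZE, function (err) {
  if (err) throw err;
  // wl.mmap_fd(b.fd, SIZE) would go here.
  console.log('async resize done, size =', fs.fstatSync(b.fd).size);
  cleanup(b);
});
// Code placed here runs before the callback fires, so the file may still be
// empty at this point. Mapping it here is the kind of race described above.
```

Either variant guarantees that the file has its final size before it is mapped; the synchronous one is simply the smaller change for short example scripts.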

cheery commented 10 years ago

In that case this is a bug in the examples. I pushed a patch which changes every fs.truncate to fs.truncateSync.

Let me know if it resolves the problem and if we can close this one.

It seems that I produced this bug myself. I was caught out by the asynchronicity of the node.js filesystem API. It could have been prevented by implementing asynchronous behavior with green threads instead of callbacks. It may have worked on my platform because of system differences, or it may have been broken recently by a behavior change in the node.js API (truncate might not have been async 3 months ago). I'll try to be more careful with the async behavior of node.js from now on.

ghost commented 10 years ago

Yeah, they work fine now. Thank you!