Closed briansorahan closed 9 years ago
It would be very helpful if you could narrow it down a bit more. Is it the node version, the libzmq version, the OS?
FWIW, here is a backtrace from lldb (crash.js is the program in OP's gist):
Brians-MacBook-Air:trystero-zeromq brian$ lldb node crash.js
Current executable set to 'node' (x86_64).
(lldb) b V8::Dispose
Breakpoint 1: where = node`v8::V8::Dispose(), address = 0x00000001001300d0
(lldb) r
Process 72362 launched: '/Users/brian/.nvm/v0.10.33/bin/node' (x86_64)
MAX_SOCKETS=1023
creating sockets
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,libc++abi.dylib: terminating with uncaught exception of type std::runtime_error
Process 72362 stopped
* thread #1: tid = 0x3f38c1, 0x00007fff8ac4b866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff8ac4b866 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff8ac4b866: jae 0x7fff8ac4b870 ; __pthread_kill + 20
0x7fff8ac4b868: movq %rax, %rdi
0x7fff8ac4b86b: jmpq 0x7fff8ac48175 ; cerror_nocancel
0x7fff8ac4b870: ret
(lldb) bt
* thread #1: tid = 0x3f38c1, 0x00007fff8ac4b866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff8ac4b866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff92e9f35c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff97de3b1a libsystem_c.dylib`abort + 125
frame #3: 0x00007fff92a66f31 libc++abi.dylib`abort_message + 257
frame #4: 0x00007fff92a8c952 libc++abi.dylib`default_terminate_handler() + 264
frame #5: 0x00007fff923ff322 libobjc.A.dylib`_objc_terminate() + 124
frame #6: 0x00007fff92a8a1d1 libc++abi.dylib`std::__terminate(void (*)()) + 8
frame #7: 0x00007fff92a89c5b libc++abi.dylib`__cxa_throw + 124
frame #8: 0x0000000103149d52 zmq.node`Socket(this=0x0000000100b28c40, context=<unavailable>, type=<unavailable>) + 364 at binding.cc:487
frame #9: 0x00000001031467e1 zmq.node`zmq::Socket::New(args=0x00007fff5fbfdc28) + 391 at binding.cc:352
frame #10: 0x00000001001567fc node`v8::internal::Builtin_HandleApiCallConstruct(v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>, v8::internal::Isolate*) + 588
Thanks, that is useful. It seems to be crashing here due to a thrown exception. This is because zmq_getsockopt returns -1 for some reason (http://api.zeromq.org/master:zmq-getsockopt), but I don't see which of EINVAL, ETERM, EFAULT, or EINTR was the cause, although that should be part of the exception message. I don't know why this happens either.
My initial guess is that this is OS X's fault: you are hitting the maximum number of open file handles. It seems to default to 256, which is very low. Try increasing it and see if that helps.
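To check which limit a process actually runs under, the POSIX `getrlimit` call can be queried from inside the program. The following is a minimal sketch, not code from zeromq.node; the helper names `read_nofile_limit` and `print_nofile_limit` are made up for illustration.

```cpp
#include <sys/resource.h>  // getrlimit, struct rlimit, RLIMIT_NOFILE (POSIX)
#include <cstdio>

// Hypothetical helper: reads the open-file limits of the calling process.
// Returns true on success and fills `out` with the soft and hard limits.
bool read_nofile_limit(struct rlimit &out) {
    return getrlimit(RLIMIT_NOFILE, &out) == 0;
}

// Hypothetical helper: prints both limits, or an error if they cannot be read.
void print_nofile_limit() {
    struct rlimit lim;
    if (!read_nofile_limit(lim)) {
        std::perror("getrlimit");
        return;
    }
    // rlim_t is an unsigned integer type; cast for a portable printf format.
    std::printf("soft=%llu hard=%llu\n",
                static_cast<unsigned long long>(lim.rlim_cur),
                static_cast<unsigned long long>(lim.rlim_max));
}
```

Calling `print_nofile_limit()` at the start of a test program shows the soft limit that socket creation will run into, which should match what `ulimit -n` reports in the shell that launched it.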
It does seem to be an OS X problem, but
Brians-MacBook-Air:zeromq.node brian$ ulimit -n
256
And this program can get the file descriptor of 122 zmq sockets, but for every socket after that it reports "Socket operation on non-socket", despite the fact that ZMQ_MAX_SOCKETS reports 1024.
After doing ulimit -n 1024, the above program starts misbehaving at the 506th call to zmq_getsockopt.
This seems to just be an annoying issue with Mac, and I just found in the zmq tuning guide that they recommend doing ulimit -n 1200, however
Brians-MacBook-Air:zeromq.node brian$ sudo ulimit -n 1200
Password:
Brians-MacBook-Air:zeromq.node brian$ ulimit -n
1024
I'll close, and possibly discuss upstream.
Those two links I posted claim to show the necessary steps. Just doing ulimit -n won't cut it.
On OS X, the open file limits are governed by launchd and sysctl values.
launchd: Processes are started by launchd, which imposes resource constraints on any process it launches. These limits can be retrieved and set using the launchctl command (the default soft and hard values are 256 and unlimited, respectively). For OS X 10.7 and later, even though the default hard limit is "unlimited", you can't set the hard or soft limit to "unlimited" yourself.
sysctl: Operating system open files limits are set with sysctl. These limits can also impact running processes, so the launchd and sysctl open file limits should be set to the same values.
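Resource limits are per process and inherited by children, so a process can also try to raise its own soft limit with POSIX `setrlimit`, up to whatever hard limit it was started with. The sketch below assumes that approach; `raise_nofile_soft_limit` is a hypothetical helper, and on OS X the system-wide settings described above may still reject or cap the request.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_NOFILE (POSIX)

// Hypothetical helper: tries to raise this process's soft open-file limit
// to at least `wanted`, never exceeding the hard limit.
// Returns the soft limit in effect afterwards, or 0 if limits cannot be read.
rlim_t raise_nofile_soft_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return 0;  // could not read the limits
    }
    if (lim.rlim_cur >= wanted) {
        return lim.rlim_cur;  // already high enough, nothing to do
    }
    struct rlimit updated = lim;
    // An unprivileged process may raise the soft limit only up to the hard limit.
    updated.rlim_cur = (wanted < lim.rlim_max) ? wanted : lim.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &updated) != 0) {
        return lim.rlim_cur;  // request rejected, the old limit still applies
    }
    return updated.rlim_cur;
}
```

A program that needs many sockets could call `raise_nofile_soft_limit(1200)` once at startup and compare the returned value with what it asked for before creating sockets.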
+1 that check sounds very reasonable
wouldn't it be a negative integer returned or would it be non-NULL?
zmq 4.x says
The zmq_socket() function shall return an opaque handle to the newly created socket if successful. Otherwise, it shall return NULL and set errno to one of the values defined below.
ya that sounds correct, since the socket is a void star you can do that
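A rough sketch of that check, written without libzmq so it can be compiled on its own: `failing_socket_stub` is a hypothetical stand-in for a `zmq_socket()` call that fails, and `require_socket` is a made-up name for the guard. A real binding would presumably build the message with `zmq_strerror(zmq_errno())` rather than `std::strerror(errno)`.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for a failing zmq_socket(): returns NULL and sets
// errno to EMFILE ("too many open files"), as when descriptors run out.
void *failing_socket_stub() {
    errno = EMFILE;
    return nullptr;
}

// Hypothetical guard: rejects a NULL handle before it is passed to any
// further socket call, and reports the reason recorded in errno.
void *require_socket(void *handle) {
    if (handle == nullptr) {
        throw std::runtime_error(std::strerror(errno));
    }
    return handle;
}
```

With a guard like this, `require_socket(failing_socket_stub())` throws at the point of creation, so the error names the real cause instead of surfacing later as a failure on a NULL socket.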
I'm curious why throw std::runtime_error(ErrorMessage()) was not showing "Socket operation on non-socket" in my terminal after the zmq_getsockopt(...ZMQ_FD) call using a NULL socket.
I would guess zmq_errno() is not returning the expected value?
zmq_errno returns 24, which is what it returns in my gist as well. Maybe the fact that I didn't get the strerror in my terminal is a Mac issue as well, since
#include <stdexcept>
int main() {
    throw new std::runtime_error("foo");
}
outputs
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error*
Abort trap: 6
I'll send a PR here very soon, but do you care if I just fprint and exit instead of throwing when zmq_socket returns NULL? I just ran the simple C++ program above on an Ubuntu 14.04 VM with libstdc++-4.8 installed through apt and it still doesn't put the string I pass to std::runtime_error in my terminal.
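One thing worth noting about the small test program above: it uses `throw new`, which throws a pointer (`std::runtime_error*`, as the output shows) rather than the exception object, so nothing that expects a `std::exception` can read its message. The sketch below throws by value and catches by reference instead; the function names are made up for illustration, and whether an uncaught exception's message reaches the terminal still depends on the C++ runtime.

```cpp
#include <stdexcept>
#include <string>

// Throws the exception object itself (by value), not a pointer to it.
void fail_by_value() {
    throw std::runtime_error("foo");
}

// Catches by const reference and returns the message carried by the exception.
// Returns an empty string if nothing was thrown.
std::string message_of_failure() {
    try {
        fail_by_value();
    } catch (const std::exception &e) {
        return e.what();
    }
    return "";
}
```

Catching the exception explicitly and printing `e.what()` makes the output independent of how the platform's terminate handler formats uncaught exceptions.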
I'd like to see this run across the test suite, so I just sent a PR to fix that.. let's wait and see what the others think
Oops, I should have added my comment on your PR. Let's reference it here: https://github.com/JustinTulloss/zeromq.node/pull/377
I'm using zmq 2.8.0 installed with npm and seeing an uninformative C++ crash.
I've created a gist with example code and posted the output in a comment.
I am unable to reproduce with