benoitc / dnssd_erlang

Erlang interface to Apple's Bonjour DNS Service Discovery implementation
Apache License 2.0
46 stars, 16 forks

Segmentation fault #5

Closed: gleber closed this issue 12 years ago

gleber commented 12 years ago

Hello. I've started an Erlang VM with a running process that does the following:

(<0.2.0>) call dnssd:start()
(<0.92.0>) call dnssd:register("monitor_agent@a","_http._tcp",7779,[{ip,"192.168.0.138"}])
(<0.92.0>) call dnssd:browse("_http._tcp")

After this, I run a simple shell script:

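while true; do
  # Publish a second _http._tcp service in the background (browse "add" event)
  avahi-publish -s monitor_agent@b _http._tcp 7778 ip=192.168.0.138 &
  sleep 3s
  # Withdraw it again (browse "remove" event)
  killall avahi-publish
  sleep 1s
  killall avahi-publish
done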

When the Erlang process receives a {dnssd, _Ref, {browse, Op, {Name, Type, Domain}}} message, it does the following:

(<0.92.0>) call dnssd:resolve_sync(<<"monitor_agent@b">>,<<"_http._tcp.">>,<<"local.">>)
(<0.92.0>) call dnssd:resolve_sync(<<"monitor_agent@b">>,<<"_http._tcp.">>,<<"local.">>,5000)
(<0.92.0>) call dnssd:resolve(<<"monitor_agent@b">>,<<"_http._tcp.">>,<<"local.">>)
[New Thread 0xb2147b70 (LWP 27558)]
(<0.92.0>) call dnssd:ensure_safe_type(<<"_http._tcp.">>)
(<0.92.0>) call dnssd:parse_type_t(<<"_http._tcp.">>)
(<0.92.0>) call dnssd:parse_type_t(<<>>,<<"http._tcp.">>)
(<0.92.0>) call dnssd:parse_type_t(<<"h">>,<<"ttp._tcp.">>)
(<0.92.0>) call dnssd:parse_type_t(<<"ht">>,<<"tp._tcp.">>)
(<0.92.0>) call dnssd:parse_type_t(<<"htt">>,<<"p._tcp.">>)
(<0.92.0>) call dnssd:parse_type_t(<<"http">>,<<"._tcp.">>)
(<0.92.0>) returned from dnssd:parse_type_t/2 -> {<<"http">>,<<"_tcp.">>}
(<0.92.0>) returned from dnssd:parse_type_t/2 -> {<<"http">>,<<"_tcp.">>}
(<0.92.0>) returned from dnssd:parse_type_t/2 -> {<<"http">>,<<"_tcp.">>}
(<0.92.0>) returned from dnssd:parse_type_t/2 -> {<<"http">>,<<"_tcp.">>}
(<0.92.0>) returned from dnssd:parse_type_t/2 -> {<<"http">>,<<"_tcp.">>}
(<0.92.0>) returned from dnssd:parse_type_t/1 -> {<<"http">>,<<"_tcp.">>}
(<0.92.0>) call dnssd:parse_type_p(<<"_tcp.">>)
(<0.92.0>) returned from dnssd:parse_type_p/1 -> {<<"tcp">>,<<".">>}
(<0.92.0>) call dnssd:valid_subtype(<<".">>)
(<0.92.0>) call dnssd:valid_subtype(<<>>)
(<0.92.0>) returned from dnssd:valid_subtype/1 -> true
(<0.92.0>) returned from dnssd:valid_subtype/1 -> true
(<0.92.0>) returned from dnssd:ensure_safe_type/1 -> <<"_http._tcp.">>
(<0.92.0>) returned from dnssd:resolve/3 -> {ok,#Ref<0.0.0.710>}

Right after this, gdb catches a segmentation fault in erl:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb46b0b70 (LWP 27508)]
0xb3173c0a in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
(gdb) backtrace
#0  0xb3173c0a in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#1  0xb3173c7a in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#2  0xb3173efb in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#3  0xb316074f in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#4  0xb3160b1a in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#5  0xb3160dc2 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#6  0xb3160f76 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#7  0xb316116c in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#8  0xb315d96a in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#9  0xb3166a14 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#10 0xb316fa86 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#11 0xb316fbf2 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#12 0xb3170817 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#13 0xb3170ff2 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#14 0xb316f7a9 in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#15 0xb31574fb in ?? () from /lib/i386-linux-gnu/libdbus-1.so.3
#16 0xb3171e39 in dbus_watch_handle () from /lib/i386-linux-gnu/libdbus-1.so.3
#17 0xb31ad337 in ?? () from /usr/lib/i386-linux-gnu/libavahi-client.so.3
#18 0xb319b10c in avahi_simple_poll_dispatch () from /usr/lib/i386-linux-gnu/libavahi-common.so.3
#19 0xb31b6cba in DNSServiceProcessResult () from /usr/lib/i386-linux-gnu/libdns_sd.so.1
#20 0xb31bfbed in ready_io (edd=0x82dba18, ev=0x22) at c_src/dnssd.c:419
#21 0x080e24d1 in erts_port_task_execute (runq=0xb7bc0900, curr_port_pp=0xb7bc3eb8)
    at beam/erl_port_task.c:856
#22 0x080db885 in schedule (p=0xb4c9583c, calls=4) at beam/erl_process.c:5511
#23 0x0815c912 in process_main () at beam/beam_emu.c:1225
#24 0x080d059c in sched_thread_func (vesdp=0xb7bc0e80) at beam/erl_process.c:3789
#25 0x081c38d5 in thr_wrapper (vtwd=0xbfffe700) at pthread/ethread.c:106
#26 0xb7f56d31 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#27 0xb7e9d0ce in clone () from /lib/i386-linux-gnu/libc.so.6

This happens highly non-deterministically. My system has the following packages installed:

$ aptitude show libavahi-core6 libavahi-core7 libavahi-compat-libdnssd1 libdbus-1-3 | egrep "(Package|Version)"
Package: libavahi-core6
Version: 0.6.25-1ubuntu6
Package: libavahi-core7
Version: 0.6.30-4ubuntu1
Package: libavahi-compat-libdnssd1
Version: 0.6.30-4ubuntu1
Package: libdbus-1-3
Version: 1.4.14-1ubuntu1

Has anyone seen this problem before?

gleber commented 12 years ago

A minute ago I got one more segfault: http://pastebin.com/d3PKTjGj

The interesting line there is:

WARNING: Unhandled message:
interface=`Dô·`Dô·hDô·hDô·pDô·pDô·°â°âDô·Dô·Dô·Dô·Dô·Dô·Dô·Dô· Dô· Dô·¨Dô·¨Dô·°Dô·°Dô·¸Dô·¸Dô·ÀDô·ÀDô·ÈDô·ÈDô·ÐDô·ÐDô·ØDô·ØDô·àDô·àDô·èDô·èDô·ðDô·ðDô·øDô·øDô·, path=°~X·««¨á¨áXDô·XDô·`Dô·`Dô·hDô·hDô·pDô·pDô·°â°âDô·Dô·Dô·Dô·Dô·Dô·Dô·Dô· Dô· Dô·¨Dô·¨Dô·°Dô·°Dô·¸Dô·¸Dô·ÀDô·ÀDô·ÈDô·ÈDô·ÐDô·ÐDô·ØDô·ØDô·àDô·àDô·èDô·èDô·ðDô·ðDô·øDô·øDô·, member=Dô·Dô·Dô·Dô·Dô·Dô·Dô·Dô· Dô· Dô·¨Dô·¨Dô·°Dô·°Dô·¸Dô·¸Dô·ÀDô·ÀDô·ÈDô·ÈDô·ÐDô·ÐDô·ØDô·ØDô·àDô·àDô·èDô·èDô·ðDô·ðDô·øDô·øDô·

gleber commented 12 years ago

It seems that the call to resolve_sync/3 is not actually needed to trigger the segmentation fault: the backtrace in my previous comment was generated with the resolve_sync/3 call commented out. At the same time, calling resolve_sync/3 on each dnssd browse "add" message seems to increase the chance of a crash; that is, the segfault happens after far fewer iterations of the test shell script.

gleber commented 12 years ago

A few more crashes. The place where they happen seems fairly random, so my wild guess would be some kind of memory corruption: http://pastebin.com/P2shBRgA http://pastebin.com/nxVa3Nkp http://pastebin.com/KTkN60PQ

gleber commented 12 years ago

Here are two more crashes:

http://pastebin.com/2pYmy1MH http://pastebin.com/BWsxwr5a

and a minimal test case: https://gist.github.com/1383723

The weird thing is that the test case doesn't crash without tracing enabled. When I run a bigger program, however, it crashes even without tracing, and the probability of a crash grows with the amount of work the processes are doing: the more messages and I/O interactions they perform, the bigger the chance of a crash. That's as close as I was able to get to it today.

andrewtj commented 12 years ago

Concurrent calls to the Avahi compatibility layer might be the issue. Does changing ERL_DRV_FLAG_USE_PORT_LOCKING to 0 in the driver entry allow your test case to pass?

gleber commented 12 years ago

It seems to fix the problem. The test ran for 5 minutes without crashing.

Thanks a lot! It may be worth adding a note about this to the README so that others don't waste time hunting for this issue like I did.

andrewtj commented 12 years ago

Good to hear. I'll push a new version with this fix in the next few days.

andrewtj commented 12 years ago

I've just tagged v0.6, which includes a test case for this issue, the driver locking change that fixes it, and another fix for the Avahi compatibility layer's lack of support for registering services with empty names.

Thanks for reporting the problem and I hope you find the app useful.