Erlang/OTP
http://erlang.org
Apache License 2.0
`peer:start_link/1` times out when the given node already exists #8930

Open xxdavid opened 6 days ago

xxdavid commented 6 days ago

Describe the bug

When I try to start a peer node under a name that is already in use, the `peer:start_link/1` call times out instead of immediately returning a clear error.

To Reproduce

$ erl -sname foo
Erlang/OTP 27 [erts-15.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1]

Eshell V15.1 (press Ctrl+G to abort, type help(). for help)
(foo@ROFL)1> peer:start_link(#{name => bar}).
{ok,<0.96.0>,bar@ROFL}
(foo@ROFL)2> peer:start_link(#{name => bar}).
** exception exit: timeout
     in function  peer:start_it/2 (peer.erl, line 929)
=ERROR REPORT==== 11-Oct-2024::14:02:56.698739 ===
** Generic server <0.96.0> terminating
** Last message in was {'EXIT',<0.94.0>,
                           {timeout,
                               [{peer,start_it,2,
                                    [{file,"peer.erl"},{line,929}]},
                                {erl_eval,do_apply,7,
                                    [{file,"erl_eval.erl"},{line,904}]},
                                {shell,exprs,7,
                                    [{file,"shell.erl"},{line,893}]},
                                {shell,eval_exprs,7,
                                    [{file,"shell.erl"},{line,849}]},
                                {shell,eval_loop,4,
                                    [{file,"shell.erl"},{line,834}]}]}}
** When Server state == {peer_state,#{name => bar},
                                    bar@ROFL,
                                    "/Users/David/.asdf/installs/erlang/27.1/erts-15.1/bin/erl",
                                    ["-sname","bar","-detached",
                                     "-peer_detached","-user","peer",
                                     "-origin",
                                     "g1h3CGZvb0BST0ZMAAAAYAAAAABnCRJw"],
                                    undefined,undefined,<<>>,running,
                                    {<0.94.0>,
                                     #Ref<0.2057912559.480509953.125513>},
                                    0,#{}}
** Reason for termination ==
** {timeout,[{peer,start_it,2,[{file,"peer.erl"},{line,929}]},
             {erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,904}]},
             {shell,exprs,7,[{file,"shell.erl"},{line,893}]},
             {shell,eval_exprs,7,[{file,"shell.erl"},{line,849}]},
             {shell,eval_loop,4,[{file,"shell.erl"},{line,834}]}]}
=CRASH REPORT==== 11-Oct-2024::14:02:56.699768 ===
  crasher:
    initial call: peer:init/1
    pid: <0.96.0>
    registered_name: []
    exception exit: {timeout,
                        [{peer,start_it,2,[{file,"peer.erl"},{line,929}]},
                         {erl_eval,do_apply,7,
                             [{file,"erl_eval.erl"},{line,904}]},
                         {shell,exprs,7,[{file,"shell.erl"},{line,893}]},
                         {shell,eval_exprs,7,[{file,"shell.erl"},{line,849}]},
                         {shell,eval_loop,4,[{file,"shell.erl"},{line,834}]}]}
      in function  gen_server:decode_msg/9 (gen_server.erl, line 2299)
    ancestors: [<0.94.0>,<0.93.0>,<0.76.0>,<0.71.0>,<0.75.0>,<0.70.0>,
                  kernel_sup,<0.47.0>]
    message_queue_len: 2
    messages: [{'EXIT',<0.94.0>,normal},{nodedown,bar@ROFL}]
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 10958
    stack_size: 29
    reductions: 14648
  neighbours:

Expected behavior

I would expect a clear error, similar to what slave:start_link/2 returns.

$ erl -sname foo
Erlang/OTP 27 [erts-15.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1]

Eshell V15.1 (press Ctrl+G to abort, type help(). for help)
(foo@ROFL)1> {ok, Host} = inet:gethostname().
{ok,"ROFL"}
(foo@ROFL)2> slave:start_link(Host, bar).
{ok,bar@ROFL}
(foo@ROFL)3> slave:start_link(Host, bar).
{error,{already_running,bar@ROFL}}

Affected versions

Probably all OTP versions since OTP 25. I reproduced it on OTP 27.1 and OTP 26.2.1.
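
Until this is fixed, a caller-side guard may help avoid the timeout. The following is only a minimal sketch, not part of OTP: the module name `peer_guard`, the function `start_link_checked/1`, and the `already_registered` error tuple are all hypothetical. It assumes the peer is started on the local host with the `name` option, and that the local epmd is the one the peer would register with. It asks epmd for its registered names via `net_adm:names/0` and only calls `peer:start_link/1` when the requested name is not among them.

```erlang
-module(peer_guard).
-export([start_link_checked/1]).

%% Hypothetical helper (not part of OTP): refuse to start a peer whose
%% name is already registered with the local epmd.
%% Assumes Options contains the 'name' key and targets the local host.
start_link_checked(#{name := Name} = Options) ->
    %% The name may be an atom or a string; normalise it to a string,
    %% because net_adm:names/0 reports names as strings.
    NameStr = lists:concat([Name]),
    case net_adm:names() of
        {ok, Registered} ->
            case lists:keymember(NameStr, 1, Registered) of
                true ->
                    %% Error shape chosen for this sketch only.
                    {error, {already_registered, NameStr}};
                false ->
                    peer:start_link(Options)
            end;
        {error, _Reason} ->
            %% epmd could not be queried; fall back to the plain call.
            peer:start_link(Options)
    end.
```

With the reproduction above, `peer_guard:start_link_checked(#{name => bar})` would be expected to return the error tuple on the second call rather than time out. The check is inherently racy (another node could register the name between the lookup and the start), so it reduces the chance of hitting the timeout but does not replace a proper fix in `peer`.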

max-au commented 4 days ago

This is, indeed, a bug - it should immediately crash with this Reason (as tested [here](https://github.com/erlang/otp/blob/master/lib/stdlib/test/peer_SUITE.erl#L503)):

** Reason for termination ==
** {{boot_failed,{exit_status,1}},
    [{peer,start_it,2,[{file,"peer.erl"},{line,922}]},
     {erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,904}]},

Somehow it works for the `standard_io` connection but not for normal Erlang distribution.

I'll take a look at it.