Closed Kuroneer closed 4 years ago
As an example of different commands (erl vs rebar3 shell):
$ HOME=/ erl -sname test@localhost -setcookie cookie
Erlang/OTP 22 [erts-10.5] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Eshell V10.5 (abort with ^G)
(test@localhost)1>
User switch command
--> q
$ HOME=/ erl -sname test@localhost
2019-11-29 02:06:21.155797
args: []
format: "Failed to create cookie file '/.erlang.cookie': eacces"
<omitted>
Crash dump is being written to: erl_crash.dump...done
$ cd erlang-project
$ HOME=/ rebar3 shell --sname test@localhost --setcookie cookie
===> Verifying dependencies...
===> Compiling erlang-project
=ERROR REPORT==== 29-Nov-2019::02:06:33.480090 ===
Failed to create cookie file '/.erlang.cookie': eacces
<omitted>
Erlang/OTP 22 [erts-10.5] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Eshell V10.5 (abort with ^G)
1>
User switch command
--> q
$ cd erlang-project
$ HOME=/ ERL_FLAGS=' -nocookie' rebar3 shell --sname test@localhost --setcookie cookie
===> Verifying dependencies...
===> Compiling erlang-project
Erlang/OTP 22 [erts-10.5] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Eshell V10.5 (abort with ^G)
(test@localhost)1>
User switch command
--> q
I guess this is tangentially related to https://bugs.erlang.org/browse/ERL-476?jql=text%20~%20%22HOME%22 for the artificial / path being set, but this isn't the main issue here. I would probably point out to people reading the issue that this is likely only a problem for test nodes running distributed, as prod nodes would generally use the vm.args file to set a node name and a cookie explicitly (particularly if they use the templates), so all of this would be avoided.
Setting no cookie by default would probably end up being a problem for all the folks who currently use dist mode without also specifying a cookie and instead rely on the global one: their setup would break by returning nocookie when trying to establish a connection, so I don't think defaulting to that is a safe option for now. It's probably going to break more builds than it is going to fix. Unless we had a way to modify start arguments after the node has been started, we can't do this conditionally either.
So your workaround is probably one of the few acceptable ways to go about this.
An alternative is possibly to have the cookie written to /tmp, which would require no special permissions; anyone in a container should be able to write there and by doing so you'd skip the eacces error. It requires setting HOME to point there, but would bypass most of the problem directly as far as I can tell. Another option would be to just pre-generate a cookie file in the container image. Cookies are not security anyway -- they're more to prevent accidentally connecting to a node, and most production builds I run just hardcode the name of the release as a cookie.
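For the pre-generated cookie option, a minimal shell sketch could look like the following. It assumes /tmp/erl-home as the writable location and uses a hypothetical helper name, prepare_cookie_home; neither comes from rebar3 or OTP. The file mode is restricted because the auth server rejects cookie files that are accessible by anyone other than the owner.

```shell
#!/bin/sh
# Sketch: give the Erlang VM a writable HOME with a pre-generated cookie.
# prepare_cookie_home is a hypothetical helper, not part of rebar3 or OTP.
prepare_cookie_home() {
    cookie_value="$1"                 # cookie text, e.g. the release name
    home_dir="${2:-/tmp/erl-home}"    # assumed writable location

    mkdir -p "$home_dir" || return 1
    # Remove a previous read-only cookie so the write below can succeed.
    rm -f "$home_dir/.erlang.cookie"
    printf '%s' "$cookie_value" > "$home_dir/.erlang.cookie" || return 1
    # The cookie file must be accessible by its owner only.
    chmod 400 "$home_dir/.erlang.cookie" || return 1
    # Print the directory so the caller can use it as HOME.
    printf '%s\n' "$home_dir"
}

# Usage (commented out; requires rebar3 on PATH):
# HOME="$(prepare_cookie_home my_release)" rebar3 ct
```

In a container image the same steps could run at build time, so the test command only needs HOME to point at the prepared directory.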
I'm a bit annoyed that there's no way to call set_cookie/2 on a node before the network is up. Since the cookie is taken from init args (rather than some intermediary state) at boot by the auth server (https://github.com/erlang/otp/blob/d6285b0a347b9489ce939511ee9a979acd868f71/lib/kernel/src/auth.erl#L271-L274), we can't easily work around it without patching OTP.
Actually right before posting my comment, I realized that we could maybe do something a bit devious here, but I think it would be relatively safe:
-nocookie as an argument
This would, for all intents and purposes, preserve the existing semantics with the distinction that we do not hard-crash if $HOME is not writeable. It risks causing confusion in the case where someone is on a system where $HOME is read-only: the whole thing still boots, but the individual nodes can't talk to each other because they don't have matching cookies, whereas if the filesystem was writeable, they would. There are no errors to explain why that is.
I'm not sure if it's that good of a failure mode since it's so subtle!
I'd like to point out that rebar3 does not stop running the tests, it just fails to set up the distributed mode.
I think that your steps are correct, but I'd remove 2., since if the cookie is provided, no file is read/written by erl (so the cookie would be expected to be set for the slave nodes manually).
If $HOME is not writable and no cookie is provided, a warning could be issued with something like "Could not write auto-generated cookie, either use '-nocookie' flag or explicitly set the cookie in the slave nodes"
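As a rough illustration of that check, here is a shell sketch that could run before starting the node. cookie_warning is a hypothetical helper, not something rebar3 provides, and it assumes the emulator flags are passed as arguments or through ERL_FLAGS. It deliberately ignores rebar3's own --setcookie option, since the sessions in this thread show that option does not prevent the cookie file from being written.

```shell
#!/bin/sh
# cookie_warning is a hypothetical helper; it only mirrors the warning
# text proposed above. Arguments: emulator flags that will be given to erl.
cookie_warning() {
    for arg in "$@"; do
        case "$arg" in
            -setcookie|-nocookie) return 0 ;;   # cookie handled explicitly
        esac
    done
    # Flags may also reach the emulator through ERL_FLAGS.
    case " ${ERL_FLAGS:-} " in
        *" -nocookie "*|*" -setcookie "*) return 0 ;;
    esac
    # An existing, readable cookie file does not need to be created.
    [ -r "$HOME/.erlang.cookie" ] && return 0
    # Otherwise the VM would try to create the file in HOME.
    [ -w "$HOME" ] && return 0
    echo "Could not write auto-generated cookie, either use '-nocookie' flag or explicitly set the cookie in the slave nodes" >&2
    return 1
}

# Usage (commented out): warn, then start the tests anyway.
# cookie_warning; rebar3 ct
```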
What I'm wondering is whether it's worth maintaining, or whether it's better to just acknowledge the issue and provide workarounds.
Yeah I'm wondering the same thing. I think it would make sense to keep the existing failure mode if only because it's a better representation of what might happen in prod (if you don't provide CLI args, it will die for this reason when Rebar3 is out of the picture), but it makes for a more confusing experience for our tool itself.
Ok, I'll close this issue then. Thank you!
Rationale
Using docker (with an uncommon setup) to test dist_node SUITEs, the following error is generated:
The cause of this error is that the user running the tests inside docker "does not exist", so its HOME has been artificially set to /
Environment
Unfortunately, I cannot provide the actual code I'm testing, but if requested, I'm sure I'll be able to create a project where this is seen.
Current behaviour
As explained in the first section, the issue is when creating the erlang cookie:
(Relevant lines)
Expected behaviour
My expectation would be that, having included
there would be no need of writing the cookie file and it wouldn't fail to start in distributed mode.
Triage
When erl starts the distributed mode, if no cookie configuration is passed as a CLI arg, erl creates the cookie file.
rebar3 executing without these CLI options but with dist_node configuration starts the distributed mode (creating a random cookie file) and then sets the cookie specified in the config. https://github.com/erlang/rebar3/blob/b8f8f3e5d6047feb86d755cadfdbb03c6e0512a0/src/rebar_dist_utils.erl#L53-L61
When rebar3 ct is run with a HOME without W permissions, first it tries to create the cookie file (and fails), regardless of the dist_node config.
I can easily work around this for my flow by creating a fake HOME or allowing anyone to write anything inside the docker, but I was wondering if it would be worth the trouble creating a PR to fix this issue in rebar3 itself:
One solution would be to make the rebar3 escript run with the -nocookie flag. This would avoid creating any cookie file on net_kernel:start, allowing the cookie to be set later on. If I read the code correctly, this solution would require allowing custom emulator args in rebar_prv_escriptize, and adding the flag when escriptizing/running rebar3. This solution would also mean that no cookie file would be created by rebar3, with all the implications this has.
I'm not sure that this is big enough an issue, what do you think?