erlware / relx

Sane, simple release creation for Erlang
http://erlware.github.io/relx
Apache License 2.0
696 stars 233 forks source link

Use `erlexec` directly in relx helper functions #902

Closed keynslug closed 2 years ago

keynslug commented 2 years ago

...Instead of $BINDIR/erl.

Latter might resolve to the system-wide erl script, notably when a release has been made with {include_erts, false} and {system_libs, true} (which is usual).

In such a case even the smallest discrepancy in versions between an Erlang/OTP used to assemble a release (e.g. 24.1.7) and an Erlang/OTP running it later (e.g. 24.2.0) will draw such release impossible to start. This happens because relx helper functions employ start_clean bootfile from a release, and this bootfile attempts to start core application (kernel / stdlib) versions as seen at the build time, which will not be present system-wide at the run time.

tsloughter commented 2 years ago

Thanks. To make sure I understand why this is needed: The issue is that system wide erl is a shell script that will be different env variables to point to where Erlang is which leads to it using the global system libs instead of the system libs as part of the release?

And this works fine even if {system_libs, false} is used?

keynslug commented 2 years ago

Basically, yes. The issue I stumbled upon initially was that I couldn't start a release built with {system_libs, true} under 24.1 on a system with 24.2.

And this works fine even if {system_libs, false} is used?

Can't remember if I verified that. I'll do that next week and report back.

tsloughter commented 2 years ago

hm but should it be exec erlexec? I'd say for sure yes since erl the script does that but we use dyn_erl (copied to erl, which is C code) when include_erts is true, so I need to see what it does in its code.

tsloughter commented 2 years ago

Yea, it uses execvp:

 execvp(erlexec, argv);

So my thinking is it should be exec $BINDIR/erlexec

keynslug commented 2 years ago

And this works fine even if {system_libs, false} is used?

Nope. For the sake of completeness here are the results of running bin/relname ping under Erlang/OTP 24.3.4 in a release assembled with Erlang/OTP 24.2.0 without and with this patch under different relx settings (in rebar.config):

master / patched {mode, prod} {mode, minimal}
{system_libs, true} ✅ / ✅ ❌ / ✅
{system_libs, false} ❌ / ❌ ❌ / ❌

Turns out this patch makes situation better only under {mode, minimal} and {system_libs, true}.

I believe releases without system libraries don't work in this scenario (different build-time / run-time OTP versions) because bin/relname still refer to $REL_DIR/start_clean which pins system libraries' versions at build-time. Though I don't know if this is even feasible or desirable to make releases runnable with such combination of settings in this scenario.

So my thinking is it should be exec $BINDIR/erlexec

This patch is concerned only with relx_get_nodename helper function behavior which is being called during release startup (in particular), I think there's no point execing into it?

tsloughter commented 2 years ago

Right, it should not be runnable with different system libs, the user has to have the right ones installed if they want to run a release that doesn't include system libs.

And you are right about exec, I was thinking this was running erl to start the release.

tsloughter commented 2 years ago

Ok, great. So it improves scenarios and doesn't break any existing working scenarios? I'll go ahead and merge this if I read this right.

keynslug commented 2 years ago

So it improves scenarios and doesn't break any existing working scenarios?

Yep, seems so. Feeling a little paranoid I just verified that the patch doesn't break anything under the most common scenario where build-time and run-time Erlang/OTP versions are the same. Good news it doesn't. Bad news I just found out that combination of {mode, prod} + {system_libs, false} isn't working even on main.

I suppose this is because the script currently assumes that system libraries are located in the same place as the runtime system, and under {mode, prod} this evaluates to the release's libdir where system libraries are obviously missing given {system_libs, false}.

tsloughter commented 2 years ago

So unrelated to this? Can you open an issue and I'll merge this and cut a release.