erlware / relx

Sane, simple release creation for Erlang
http://erlware.github.io/relx
Apache License 2.0
697 stars 232 forks source link

Modify VM_ARGS passing to escript (fix of issue 648) #649

Closed konyaris closed 5 years ago

juise commented 6 years ago

@konyaris looks like you fix make sense, but I just have checked it with my options and it's didn't work.

The opts: +zdbbl 8192 +W w -connect_all false -hidden -proto_dist inet_tls -ssl_dist_opt server_cacertfile /opt/aaa/spool/secure/cluster/cert.pem -ssl_dist_opt server_certfile /opt/aaa/spool/secure/node/cert.pem -ssl_dist_opt server_keyfile /opt/aaa/spool/secure/node/key.pem -ssl_dist_opt client_cacertfile /opt/aaa/spool/secure/cluster/cert.pem -ssl_dist_opt client_certfile /opt/aaa/spool/secure/node/cert.pem -ssl_dist_opt client_keyfile /opt/aaa/spool/secure/node/key.pem -rsh ssh -env ERL_LIBS /opt/aaa -env ERL_CRASH_DUMP /opt/aaa/erl_crash.dump -env AAA_ROOT /opt/aaa -env AAA_SPOOL /opt/aaa/spool -env AAA_SNAME ghead -env AAA_PARENT_SNAME ghead -env AAA_API_PORT 10001

How I check:

lrascao commented 6 years ago

@juise @konyaris can we add a test for the scenario @juise mentioned?

juise commented 6 years ago

I think @konyaris do his PR as proof of concept, the test will be redundant and pointless...

konyaris commented 6 years ago

@juise Can you try to set ERL_FLAGS environment variable in the shell where you run extended start script from? It is the best if you remove ERL_FLAGS line from extended start script before trying.

tsloughter commented 6 years ago

I'm kind of lost whats going on with the issues linked to and this PR :), but would like to get it in soon if it is needed. We'll be cutting a new rebar3 soon for OTP21 changes found in rc2 and would like to get a new relx in with it.

juise commented 6 years ago

Oh, sorry, I'm forgot about this problem, I'll try to fix it in a couple of days

tsloughter commented 6 years ago

@juise but I don't see where it is set to the name of the release.

tsloughter commented 6 years ago

Please rebase this.

tsloughter commented 6 years ago

@juise so does this work for you? I'd like to get the fix in for a new release that we want to do soon.

And can someone please explain what is going on :) @konyaris @juise @lrascao ?

This completely removes a function, escript_emulator_args, is it really not needed by anyone for functionality?

tsloughter commented 6 years ago

Actually, makes sense now.

But does it need to be export ERL_FLAGS? Is that necessary for it to be picked up by the nodetool run?

tsloughter commented 6 years ago

@juise I'd really like to get rid of that escript_emulator_args function, which this PR does, but don't want to merge this if it is going to break your use case. Can you confirm it is broken for you if we were to merge this?

Maybe it needs to be ERL_AFLAGS?

juise commented 6 years ago

@tsloughter I'm really so sorry, but I'm still didn't check this fix, in this weekend I'll do this

juise commented 6 years ago

@tsloughter @konyaris I have checked this changes and looks like ERL_FLAGS works: ping, status, start/stop commands work perfectly. But!!! remote_console doesn't work, it's looks like:

root@container:/opt/swm/0.0.8/bin# ./swm-0.0.8+build.86.refaaa404c remote_console
<--debug echo $ERL_FLAGS-->
-proto_dist inet_tls -ssl_dist_opt server_cacertfile /opt/swm/spool/secure/cluster/cert.pem -ssl_dist_opt server_certfile /opt/swm/spool/secure/node/cert.pem -ssl_dist_opt server_keyfile /opt/swm/spool/secure/node/key.pem -ssl_dist_opt client_cacertfile /opt/swm/spool/secure/cluster/cert.pem -ssl_dist_opt client_certfile /opt/swm/spool/secure/node/cert.pem -ssl_dist_opt client_keyfile /opt/swm/spool/secure/node/key.pem -rsh ssh -env ERL_LIBS /opt/swm -env ERL_CRASH_DUMP /opt/swm/erl_crash.dump -env SWM_ROOT /opt/swm -env SWM_SPOOL /opt/swm/spool -env SWM_SNAME chead -env SWM_PARENT_SNAME ghead -env SWM_API_PORT 10001
<--debug echo $ERL_FLAGS-->
Erlang/OTP 20 [erts-9.1] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:10] [kernel-poll:false]

*** ERROR: Shell process terminated! (^G to start new job) ***
tsloughter commented 6 years ago

@juise damn, so close. can you tell whats wrong with the args it is using?

tsloughter commented 6 years ago

@juise I've got to cut a release today to get ahead of the release of OTP-21, but happy to cut another one as soon as this is figured out.

juise commented 6 years ago

damn, so close. can you tell whats wrong with the args it is using?

@tsloughter, to be honestly, I don't know whats wrong with args, I've checked args from bug and then put this args into header of nodetool, and everything works fine, so maybe I have mistake in some place, I will try to find out where is problem in.

I've got to cut a release today to get ahead of the release of OTP-21, but happy to cut another one as soon as this is figured out.

ok, got it.

tsloughter commented 6 years ago

Hm? The remote console doesn't run through nodetool.

djnym commented 5 years ago

I just ran into another problem with escript_emulator_args today when I upgraded rebar3 to 3.6.2 and pulled in a newer relx, suddenly I started getting errors like

sed: couldn't open temporary file /usr/lib64/freqserver/bin/sed3f2THU: Permission denied
sed: couldn't open temporary file /usr/lib64/freqserver/bin/sedRzsK7X: Permission denied
rm: cannot remove `/usr/lib64/freqserver/bin/nodetool.prev': No such file or directory

which I did not see before. What seemed to be happening was that it's attempting to do these substitutions to files in place, and doesn't have permission. This was happening whenever I tried to invoke any nodetool commands via my extended_bin script. I see a couple issues with the approach

  1. it assumes the user invoking the commands is the same as the user which owns the directories (which it often is not if you package your releases or install them on as a different user than you run as).
  2. if you have packaged up your release as an RPM like I have this taints the RPM as it now contains scripts with different md5sums than what was installed by the RPM I hand applied this patch to my extended_bin and it seemed to fix the issues I was having. I'm able to run commands without the above errors, am able to connect a remote console, and all the other operations I expected to be able to do.

@tsloughter I think this should be merged.

djnym commented 5 years ago

Okay, so I did run into the remote_console issue and had to remove the passing of VM_ARGS for the call to relx_rm_sh in order to get it to work. I don't see the point of passing VM_ARGS to the remote console call as the remote console often does need those args anyway. In this case the vm.args contained information about binding schedulers, and all sort of stuff which was unnecessary. So I think it would be safe to remove as well.

aboroska commented 5 years ago

Note that not only the cookie but also kernel net_ticktime should be the same on the nodes

djnym commented 5 years ago

@aboroska I think those are already covered. I think the issue was that $VM_ARGS is being added to ERL_FLAGS, but was not removed from the remsh command. @juise can you try

@@ -168,7 +168,7 @@ relx_rem_sh() {
     # Setup remote shell command to control node
     exec "$BINDIR/erl" "$NAME_TYPE" "$id" -remsh "$NAME" -boot start_clean \
          -boot_var ERTS_LIB_DIR "$ERTS_LIB_DIR" \
-         -setcookie "$COOKIE" -hidden -kernel net_ticktime $TICKTIME $VM_ARGS
+         -setcookie "$COOKIE" -hidden -kernel net_ticktime $TICKTIME
 }

to see if that fixes your remote_console issue? I'm using the patch in this PR along with the above change and am able to successfully get remote shells (as well as all the other standard commands).

djnym commented 5 years ago

@tsloughter @juise now that the newest rebar3 is out, would it be possible to look into this now? I can continue to patch this in our local build's but would prefer if it worked correctly out of the box.

tsloughter commented 5 years ago

@djnym hey, yes. I sort of want to just offer an optional "remote console vm.args" setting. Then if the default way it rips out certain pieces doesn't work for someone they can create a custom file.

djnym commented 5 years ago

@tsloughter what would that look like, maybe just a different environment variable which is added. So in the patch I sent above to this PR instead of removing ${VM_ARGS}, maybe rename to ${REMOTE_CONSOLE_VM_ARGS}? Or were you thinking something more involved?

tsloughter commented 5 years ago

@djnym yea, I was thinking something in the relx config:

{remote_console_vm_args, "config/rc_vm.args"}

then the start script is modified to use that and if it isn't set by the user it can be a copy of vm.args minus the stuff that we subtract.

nalundgaard commented 5 years ago

@tsloughter if remote_console_vm_args were implemented as described, would that also resolve #656?

tsloughter commented 5 years ago

@nalundgaard yea, would mean you can control exactly which args are used.

djnym commented 5 years ago

So I was trying to upgrade past rebar3 3.6.2, and this issue bit be again because my patches failed to patch cleanly. I see another patch was accepted which requires that you have a writable tmp directory under $ROOTDIR which seems annoying, as I tend to install things in /usr/lib64 and expect them to be not writable by just anyone. Is there anyway we can get this bumped or re-evaluated in some way? @tsloughter @lrascao @tolbrino

tsloughter commented 5 years ago

@djnym yea, I do no like the tmp direcotry.

what do you think of my suggestion of supporting a separate vm args file in the config?

djnym commented 5 years ago

@tsloughter I think that would probably be fine. I'm trying to go through the trail of tickets to see why this is even necessary in the first place. The tmp directory thing also seems like a security hole. It essentially rewrites the nodetool escript and runs it. It seems quite possible you could do something nefarious with that. Do you why this stuff is even necessary? It seems to only wrap vm args for nodetool at this point and I'm not even sure why?

tsloughter commented 5 years ago

@djnym in general why vm.args has to be modified?

Or this nodetool stuff? I really have no idea regarding what this patch is removing, and would love to have it gone because of that. I didn't merge yet mostly because of that lack of understanding and not wanting to break anything.

@lrascao ?

djnym commented 5 years ago

I don't know, I think it has to do with running upgrade escripts of some sort. Here's the original change which added stuff https://github.com/erlware/relx/commit/310b7047a2dba4d4ed6ef8ef52a96456073e0c16 That appears to just look for %%!.* and replace it with the contents of VM_ARGS. That really should never have been merged, as it modifies file created during release so screws any sort of downstream packaging of these files. Then there are various changes trying to work around issues with this approach. In general I don't see why it's necessary. I believe things were working fine without these things.

tsloughter commented 5 years ago

Ok. And looks like this was working for @juise except for remote_console.

djnym commented 5 years ago

@tsloughter Yeah but then this whole tmp file thing was added recently which I think is a mistake as it rewrites nodetool and then executes it. That seems wrong. Rewriting the vm.args and sys.config was one thing but rewriting nodetool and and the upgrade escripts seems bad. And they rewrite in place or require a non-settable /tmp directory. Also, I fixed the remote console part if you look back through the comments here. I'm going to try to get a clean patch which just reverts most of these changes to use locally so I can update rebar3, I'll submit that as another PR instead of this one. Maybe we can move the discussion there as this one is long. Maybe @konyaris @juise can comment on that one?

tsloughter commented 5 years ago

@djnym oh great. Yea if you submit a PR I'll merge it and we can get new releases out soon.

djnym commented 5 years ago

@tsloughter Okay, I created https://github.com/erlware/relx/pull/688 I wasn't able to run the test locally for some reason, so we'll see if it passes travis, but essentially, it's this patch cleaned up to work with master. It removes all the file rewriting and relies on the ERL_FLAGS. The remsh issue that @juise had should be fixed by the change on line 173.

djnym commented 5 years ago

Since #688 was merged this can probably be closed. Also @tsloughter any idea on when you plan to tag and update rebar3?