Closed konyaris closed 5 years ago
@juise @konyaris can we add a test for the scenario @juise mentioned?
I think @konyaris do his PR as proof of concept, the test will be redundant and pointless...
@juise Can you try to set ERL_FLAGS environment variable in the shell where you run extended start script from? It is the best if you remove ERL_FLAGS line from extended start script before trying.
I'm kind of lost whats going on with the issues linked to and this PR :), but would like to get it in soon if it is needed. We'll be cutting a new rebar3 soon for OTP21 changes found in rc2 and would like to get a new relx in with it.
Oh, sorry, I'm forgot about this problem, I'll try to fix it in a couple of days
@juise but I don't see where it is set to the name of the release.
Please rebase this.
@juise so does this work for you? I'd like to get the fix in for a new release that we want to do soon.
And can someone please explain what is going on :) @konyaris @juise @lrascao ?
This completely removes a function, escript_emulator_args
, is it really not needed by anyone for functionality?
Actually, makes sense now.
But does it need to be export ERL_FLAGS
? Is that necessary for it to be picked up by the nodetool run?
@juise I'd really like to get rid of that escript_emulator_args
function, which this PR does, but don't want to merge this if it is going to break your use case. Can you confirm it is broken for you if we were to merge this?
Maybe it needs to be ERL_AFLAGS
?
@tsloughter I'm really so sorry, but I'm still didn't check this fix, in this weekend I'll do this
@tsloughter @konyaris I have checked this changes and looks like ERL_FLAGS works: ping, status, start/stop commands work perfectly. But!!! remote_console doesn't work, it's looks like:
root@container:/opt/swm/0.0.8/bin# ./swm-0.0.8+build.86.refaaa404c remote_console
<--debug echo $ERL_FLAGS-->
-proto_dist inet_tls -ssl_dist_opt server_cacertfile /opt/swm/spool/secure/cluster/cert.pem -ssl_dist_opt server_certfile /opt/swm/spool/secure/node/cert.pem -ssl_dist_opt server_keyfile /opt/swm/spool/secure/node/key.pem -ssl_dist_opt client_cacertfile /opt/swm/spool/secure/cluster/cert.pem -ssl_dist_opt client_certfile /opt/swm/spool/secure/node/cert.pem -ssl_dist_opt client_keyfile /opt/swm/spool/secure/node/key.pem -rsh ssh -env ERL_LIBS /opt/swm -env ERL_CRASH_DUMP /opt/swm/erl_crash.dump -env SWM_ROOT /opt/swm -env SWM_SPOOL /opt/swm/spool -env SWM_SNAME chead -env SWM_PARENT_SNAME ghead -env SWM_API_PORT 10001
<--debug echo $ERL_FLAGS-->
Erlang/OTP 20 [erts-9.1] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:10] [kernel-poll:false]
*** ERROR: Shell process terminated! (^G to start new job) ***
@juise damn, so close. can you tell whats wrong with the args it is using?
@juise I've got to cut a release today to get ahead of the release of OTP-21, but happy to cut another one as soon as this is figured out.
damn, so close. can you tell whats wrong with the args it is using?
@tsloughter, to be honestly, I don't know whats wrong with args, I've checked args from bug and then put this args into header of nodetool, and everything works fine, so maybe I have mistake in some place, I will try to find out where is problem in.
I've got to cut a release today to get ahead of the release of OTP-21, but happy to cut another one as soon as this is figured out.
ok, got it.
Hm? The remote console doesn't run through nodetool.
I just ran into another problem with escript_emulator_args today when I upgraded rebar3 to 3.6.2 and pulled in a newer relx, suddenly I started getting errors like
sed: couldn't open temporary file /usr/lib64/freqserver/bin/sed3f2THU: Permission denied
sed: couldn't open temporary file /usr/lib64/freqserver/bin/sedRzsK7X: Permission denied
rm: cannot remove `/usr/lib64/freqserver/bin/nodetool.prev': No such file or directory
which I did not see before. What seemed to be happening was that it's attempting to do these substitutions to files in place, and doesn't have permission. This was happening whenever I tried to invoke any nodetool commands via my extended_bin script. I see a couple issues with the approach
@tsloughter I think this should be merged.
Okay, so I did run into the remote_console issue and had to remove the passing of VM_ARGS for the call to relx_rm_sh in order to get it to work. I don't see the point of passing VM_ARGS to the remote console call as the remote console often does need those args anyway. In this case the vm.args contained information about binding schedulers, and all sort of stuff which was unnecessary. So I think it would be safe to remove as well.
Note that not only the cookie but also kernel
net_ticktime
should be the same on the nodes
@aboroska I think those are already covered. I think the issue was that $VM_ARGS is being added to ERL_FLAGS, but was not removed from the remsh command. @juise can you try
@@ -168,7 +168,7 @@ relx_rem_sh() {
# Setup remote shell command to control node
exec "$BINDIR/erl" "$NAME_TYPE" "$id" -remsh "$NAME" -boot start_clean \
-boot_var ERTS_LIB_DIR "$ERTS_LIB_DIR" \
- -setcookie "$COOKIE" -hidden -kernel net_ticktime $TICKTIME $VM_ARGS
+ -setcookie "$COOKIE" -hidden -kernel net_ticktime $TICKTIME
}
to see if that fixes your remote_console issue? I'm using the patch in this PR along with the above change and am able to successfully get remote shells (as well as all the other standard commands).
@tsloughter @juise now that the newest rebar3 is out, would it be possible to look into this now? I can continue to patch this in our local build's but would prefer if it worked correctly out of the box.
@djnym hey, yes. I sort of want to just offer an optional "remote console vm.args" setting. Then if the default way it rips out certain pieces doesn't work for someone they can create a custom file.
@tsloughter what would that look like, maybe just a different environment variable which is added. So in the patch I sent above to this PR instead of removing ${VM_ARGS}, maybe rename to ${REMOTE_CONSOLE_VM_ARGS}? Or were you thinking something more involved?
@djnym yea, I was thinking something in the relx config:
{remote_console_vm_args, "config/rc_vm.args"}
then the start script is modified to use that and if it isn't set by the user it can be a copy of vm.args
minus the stuff that we subtract.
@tsloughter if remote_console_vm_args
were implemented as described, would that also resolve #656?
@nalundgaard yea, would mean you can control exactly which args are used.
So I was trying to upgrade past rebar3 3.6.2, and this issue bit be again because my patches failed to patch cleanly. I see another patch was accepted which requires that you have a writable tmp directory under $ROOTDIR which seems annoying, as I tend to install things in /usr/lib64 and expect them to be not writable by just anyone. Is there anyway we can get this bumped or re-evaluated in some way? @tsloughter @lrascao @tolbrino
@djnym yea, I do no like the tmp direcotry.
what do you think of my suggestion of supporting a separate vm args file in the config?
@tsloughter I think that would probably be fine. I'm trying to go through the trail of tickets to see why this is even necessary in the first place. The tmp directory thing also seems like a security hole. It essentially rewrites the nodetool escript and runs it. It seems quite possible you could do something nefarious with that. Do you why this stuff is even necessary? It seems to only wrap vm args for nodetool at this point and I'm not even sure why?
@djnym in general why vm.args has to be modified?
Or this nodetool stuff? I really have no idea regarding what this patch is removing, and would love to have it gone because of that. I didn't merge yet mostly because of that lack of understanding and not wanting to break anything.
@lrascao ?
I don't know, I think it has to do with running upgrade escripts of some sort. Here's the original change which added stuff https://github.com/erlware/relx/commit/310b7047a2dba4d4ed6ef8ef52a96456073e0c16 That appears to just look for %%!.* and replace it with the contents of VM_ARGS. That really should never have been merged, as it modifies file created during release so screws any sort of downstream packaging of these files. Then there are various changes trying to work around issues with this approach. In general I don't see why it's necessary. I believe things were working fine without these things.
Ok. And looks like this was working for @juise except for remote_console
.
@tsloughter Yeah but then this whole tmp file thing was added recently which I think is a mistake as it rewrites nodetool and then executes it. That seems wrong. Rewriting the vm.args and sys.config was one thing but rewriting nodetool and and the upgrade escripts seems bad. And they rewrite in place or require a non-settable /tmp directory. Also, I fixed the remote console part if you look back through the comments here. I'm going to try to get a clean patch which just reverts most of these changes to use locally so I can update rebar3, I'll submit that as another PR instead of this one. Maybe we can move the discussion there as this one is long. Maybe @konyaris @juise can comment on that one?
@djnym oh great. Yea if you submit a PR I'll merge it and we can get new releases out soon.
@tsloughter Okay, I created https://github.com/erlware/relx/pull/688 I wasn't able to run the test locally for some reason, so we'll see if it passes travis, but essentially, it's this patch cleaned up to work with master. It removes all the file rewriting and relies on the ERL_FLAGS. The remsh issue that @juise had should be fixed by the change on line 173.
Since #688 was merged this can probably be closed. Also @tsloughter any idea on when you plan to tag and update rebar3?
@konyaris looks like you fix make sense, but I just have checked it with my options and it's didn't work.
The opts:
+zdbbl 8192 +W w -connect_all false -hidden -proto_dist inet_tls -ssl_dist_opt server_cacertfile /opt/aaa/spool/secure/cluster/cert.pem -ssl_dist_opt server_certfile /opt/aaa/spool/secure/node/cert.pem -ssl_dist_opt server_keyfile /opt/aaa/spool/secure/node/key.pem -ssl_dist_opt client_cacertfile /opt/aaa/spool/secure/cluster/cert.pem -ssl_dist_opt client_certfile /opt/aaa/spool/secure/node/cert.pem -ssl_dist_opt client_keyfile /opt/aaa/spool/secure/node/key.pem -rsh ssh -env ERL_LIBS /opt/aaa -env ERL_CRASH_DUMP /opt/aaa/erl_crash.dump -env AAA_ROOT /opt/aaa -env AAA_SPOOL /opt/aaa/spool -env AAA_SNAME ghead -env AAA_PARENT_SNAME ghead -env AAA_API_PORT 10001
How I check: