andreasscherbaum / ansible-lxc-ssh

Ansible connection plugin using ssh + lxc-attach
45 stars 22 forks source link

patch suggested by @AnvithLobo #38

Closed stefangweichinger closed 3 years ago

stefangweichinger commented 3 years ago

suggested by @AnvithLobo in https://github.com/andreasscherbaum/ansible-lxc-ssh/issues/37

boucman commented 3 years ago

don't you already use _play_context elsewhere to get the remote user ?

stefangweichinger commented 3 years ago

Looks good to me. Just beware that self._play_context may have it's own issues we are not aware of since Ansible recommends using self.get_option().

I only wrote the PR to make the patch merge-able. You decide which function/method to use here.

andreasscherbaum commented 3 years ago

@dmp1ce @stefangweichinger @boucman I'm sorting through all the new PRs right now, and this one is the first one. Does this solve your original problem with the variables?

dmp1ce commented 3 years ago

It probably does solve @stefangweichinger issue but it might have unintended consequences because get_option is recommended over _play_context.

stefangweichinger commented 3 years ago

I can't tell right now, the original bug in debops is https://github.com/debops/debops/issues/1731 and @boucman reported it. Could you test, @boucman ?

dmp1ce commented 3 years ago

The tests included in PR #40 ought to verify if this fix is working or not.

AnvithLobo commented 3 years ago

both the fixes work but like @dmp1ce said it might have unintended consequences.

boucman commented 3 years ago

Ok, I need to be honest with myself: I will not find the time to test anything in the near futur.

I am reasonably confident this will fix my bug since that is almost how I did my workaround. Don't wait for me at this point, or we will block on this forever :(

stefangweichinger commented 3 years ago

thanks @boucman

stefangweichinger commented 3 years ago

Could/should we ask someone from the ansible-project to have a look and decide if get_option or _play_context ?

andreasscherbaum commented 3 years ago

If @AnvithLobo says that both ways works, and no one is sure which one is the correct way to solve this, plus there might be side effects, I'm very much in favor of the suggestion from @stefangweichinger to include external experience.

I'm going to merge whatever bugfix solves this problem, if someone can confirm that it's solved the "proper" way.

dmp1ce commented 3 years ago

My vote is for #40 #41 and #42 . #42 is built on #41 and #40. I only broke them up into 3 PRs for clarity.

AnvithLobo commented 3 years ago

Yeah i would vote for @dmp1ce's PRs as we don't modify any other code how we are getting args other than just removing LXC version checking. Since anything above ubuntu 16.04 will have lxc version 2 or above installed by default i dont think this check is necessary any longer. But we do need LXD version checking in Tests which we can improve on once these PRs are merged.

andreasscherbaum commented 3 years ago

I still don't get why you say that both ways work, only one is potentially correct, but yet this should be merged as it is without any confirmation.

For the version checks: I agree that the code for v1 should be removed, it's no longer supported. I tend to disagree to remove all the checks, just to potentially add this later. But I don't have strong feelings about this. However the tests should have a version check, and this can be added in the PR before merging it. This does not need another PR just to fix the PR which is missing that.

stefangweichinger commented 3 years ago

@dmp1ce could you point us to some docs or so pointing out why _play_context is problematic? I am trying to ask someone on IRC #ansible, but as mentioned, I am no competent programmer, so my understanding is limited.

I also agree on adding/fixing the tests within this PR, as @andreasscherbaum said.

stefangweichinger commented 3 years ago

I had a conversation in the #ansible room on IRC.

@tadeboro told me a few things, I quote some lines here (and he told me to point out that he is "NOT part of the core Ansible team, I am just a developer who wrote too much ansible-related code ;)" ):

tadeboro: sgw1: get_option is what you want here.
(11:40:31) sgw1: could you point me at the "why" ? I have to bring arguments there ;-) thanks
(11:41:42) tadeboro: sgw1: You can start with https://github.com/ansible/ansible/blob/2cbfd1e350cbe1ca195d33306b5a9628667ddda8/lib/ansible/plugins/connection/__init__.py#L66-L68 ;)
(11:42:31) sgw1: I will try to, thanks. We should get on with that PR and I thought I ask "upstream" here
(11:43:35) tadeboro: Play context is not something new code should use and should move to get_option/set_options functions.
(11:44:06) sgw1: I just have to find out why the "get_option" way didn't work in the first place, the original issue was https://github.com/andreasscherbaum/ansible-lxc-ssh/issues/37
(11:44:58) sgw1: ssh.py was used as kind of a template for lxc_ssh.py ...  / https://github.com/andreasscherbaum/ansible-lxc-ssh/issues/37#issuecomment-859130298
(11:45:56) sgw1: we face some problems with the user context, that's why someone came up with using play context.
(11:46:13) tadeboro: sgw1: Ugh, the code for that connection plugin is really old and does not call set_options() anywhere.
(11:46:35) sgw1: tadeboro: you mean the code of ssh.py is really old?
(11:48:05) tadeboro: Actually, both ;) The ssh connection plugin is really old and the lxc-ssh probably copied over the old stuff ;)
(11:48:47) sgw1: Isn't ssh.py the default way of connecting and should therefore be up to date and modern?
(11:49:32) tadeboro: It is a default way and it works. But it has also been there almost from the start of the ansible, so it is definitelly not the most modern piece of code there is.
(11:49:56) tadeboro: A bit of "do not fix things that are not broken" at work here ;)
(11:51:53) sgw1: tadeboro: I see and understand. But the "copied code" fails for the lxc_ssh.py plugin, so we have to find a way somehow.
(11:53:15) tadeboro: If you do not intend to modernize the codebase, then using the _play_content is way to go. This does mean that you are adding some technical debt, but if migrating things is not an options, it will have to do.

As a pointer to some modern way he mentioned:

https://steampunk.si/blog/let-us-give-ansible-a-rest/

Maybe https://steampunk.si/blog/let-us-give-ansible-a-rest/ cames in handy? The connection plugin example there (https://github.com/xlab-steampunk/ansible-pms-plugins/blob/master/connection/connection_plugins/pms.py) only uses modern constructs and you will see that there is no play_context being used anywhere.

I will have a look at that in the afternoon.

I hope this helps to decide ....

thanks @tadeboro

EDIT: sorry for the cutNpaste, not all URLs clickable ... but the tl;dr is clear, I assume

stefangweichinger commented 3 years ago

I found https://github.com/ansible/ansible/pull/70438 upstream and in

https://github.com/ansible/ansible/pull/70443/files#diff-38cec806ea1a1ee7c3a286c7865334ecfba7ccc49e21d4e8fb8ec1b17938fda6R618

I noticed that lxc_ssh.py is missing that u-prefix. Maybe not relevant to the current issue, but worth the fix anyway, I assume.

EDIT: I had removed that "u" myself in 1bbdf7910b6a4b3eae50e2df0a1b07d31377dbb2 .. sorry

AnvithLobo commented 3 years ago

I still don't get why you say that both ways work, only one is potentially correct, but yet this should be merged as it is without any confirmation.

For the version checks: I agree that the code for v1 should be removed, it's no longer supported. I tend to disagree to remove >all the checks, just to potentially add this later. But I don't have strong feelings about this. However the tests should have a >version check, and this can be added in the PR before merging it. This does not need another PR just to fix the PR which is >missing that.

ah by both solution works I meant it solves the problem currently at hand. One is dirty fix to fix the current problem at hand without much thought into what other issues might cause in the future and the other Doesn't modify much of the code and also solves the problem. By the looks of it people in #Ansible also think this is the right way to go.

I don't have much experience with Ansible. So i couldn't predict what troubles play_context might cause hence i just suggested it as "it works for now" Fix.

About the test yes. But i think it could be improved and 17 seperate checks might be a overkill IMO.

https://github.com/andreasscherbaum/ansible-lxc-ssh/issues/37#issuecomment-864069967 Like this user suggests in case we need version checking for future use we would probably need to do that after the __init__

andreasscherbaum commented 3 years ago

So is this PR ready to be merged in? It evolved quite a bit, and in the end it's not very long now.

@stefangweichinger?

stefangweichinger commented 3 years ago

@andreasscherbaum

I have to look through all this once again, I don't have the full overview anymore right now. And I still don't fully understand some statements from others.

IMO it fixes the issue, keeps in the version check (which is nice to have just to be safe), and uses get_option and not that problematic play context. So to me it looks good. Although I am not a competent python coder at all.

I don't fully understand the tests and if there are false positives etc ... If there are, I think there should be a new and separate PR for that.

andreasscherbaum commented 3 years ago

@stefangweichinger Ok, then I wait for your feedback.

stefangweichinger commented 3 years ago

Why me? ;-)

My branch https://github.com/stefangweichinger/ansible-lxc-ssh/commits/alltogether was a try to come up with a combination of more or less all the suggested PRs at that time. IMO it should contain a working py-script.

I used it in a local test setup and it was OK for me. I don't know about other issues. I asked a customer to use it in a rollout this week.

andreasscherbaum commented 3 years ago

I'm as confused as you because of the many PRs and changes.

stefangweichinger commented 3 years ago

@andreasscherbaum jitsi in the next days?

andreasscherbaum commented 3 years ago

@stefangweichinger Yes, please!

stefangweichinger commented 3 years ago

@andreasscherbaum write to me at office @ oops co at ... maybe tmrw

andreasscherbaum commented 3 years ago

@dmp1ce @AnvithLobo We plan to have a Jitsi call early next week, and discuss which PRs to merge, in which order. If you plan to join, please drop me an email at @ la. We can discuss times.

dmp1ce commented 3 years ago

I can probably be available any day between 9am - 4pm EST. I couldn't find your email @andreasscherbaum

I am just going to suggest #40 should be merged first, because it implements valid tests. The current tests are showing false positives!

Next I will recommend #41 because it is the most simple change to get tests passing.

andreasscherbaum commented 3 years ago

@dmp1ce My email address is: my first name @ my last name dot la

And I see what GitHub did with the < and > in the previous text...

andreasscherbaum commented 3 years ago

Scheduled a meeting for Monday, 13:00 UTC (15:00 CEST). Will post the Jitsi link here shortly before the meeting. The people where I know the email address got a calendar invite.

andreasscherbaum commented 3 years ago

The meeting link is: https://meet.jit.si/lxc-ssh-merging

andreasscherbaum commented 3 years ago

Course of action (from our discussion):

Merge #40

Re-test #38, then merge it if it works Question: does #40 work with 38, and vice versa

Remove #41, #42

Re-check #39

andreasscherbaum commented 3 years ago

Ah, doesn't work as expected, as re-running the tests will use the same GITHUB_SHA and GITHUB_REF.

@stefangweichinger Can you please pull the changes from #40 into this branch, rebase, and then push this again? This should trigger the tests with the new code from #40. Thanks!

stefangweichinger commented 3 years ago

@andreasscherbaum I cherry-picked 94d2d0c628306de52cf159d5edb335064ebda891 from #40 ... / should work, right?

andreasscherbaum commented 3 years ago

@stefangweichinger If you rebase to HEAD, it should work. The previous commits are the badges you added.

stefangweichinger commented 3 years ago

@andreasscherbaum "git rebase HEAD"; git push origin master" (in my branch "remote_user") doesn't change anything anymore. So I think above state is it .. unless I misunderstand you.

andreasscherbaum commented 3 years ago

@stefangweichinger Looks good, I think.

@dmp1ce Can you please look into the GH Actions tests and see if that is what you expect from your tests? Thanks!

dmp1ce commented 3 years ago

Compare the test for this PR with the tests from the latest commit.

The tests in the latest commit fail at the "Run test" step with ":

 TASK [Gathering Facts] *********************************************************
fatal: [testhost]: FAILED! => {"msg": "Failed to connect to the host via ssh: runner@10.242.67.247: Permission denied (publickey,password).\r\n"}

In this PR run test passes!

stefangweichinger commented 3 years ago

In this PR run test passes!

Isn't that ... good?

dmp1ce commented 3 years ago

In this PR run test passes!

Isn't that ... good?

Yes! All good. This is the result I expected too. 🙂

stefangweichinger commented 3 years ago

Yes! All good. This is the result I expected too.

great. Sounds as if our plan works.

andreasscherbaum commented 3 years ago

@dmp1ce Which means I can go ahead and merge this PR here?

dmp1ce commented 3 years ago

Yes, it is fine. I had to double check locally because Github shows a much bigger diff than there actually is.

This is the actual diff from this PR.

diff --git a/lxc_ssh.py b/lxc_ssh.py
index 37a062f..04684e7 100644
--- a/lxc_ssh.py
+++ b/lxc_ssh.py
@@ -517,8 +517,8 @@ class Connection(ConnectionBase):
         self.user = self._play_context.remote_user
         self.control_path = None
         self.control_path_dir = None
-        self.lxc_version = None

+    def _set_version(self):
         # LXC v1 uses 'lxc-info', 'lxc-attach' and so on
         # LXC v2 uses just 'lxc'
         (returncode2, stdout2, stderr2) = self._exec_command("which lxc", None, False)
@@ -535,6 +535,10 @@ class Connection(ConnectionBase):
             raise AnsibleConnectionFailure("Cannot identify LXC version")
             sys.exit(1)

+    def set_options(self, *args, **kwargs):
+        super(Connection, self).set_options(*args, **kwargs)
+        self._set_version()
+
     # The connection is created by running ssh/scp/sftp from the exec_command,
     # put_file, and fetch_file methods, so we don't need to do any connection
     # management here.
@@ -690,7 +694,7 @@ class Connection(ConnectionBase):
                 to_bytes(a, errors="surrogate_or_strict")
                 for a in self._split_ssh_args(ssh_args)
             ]
-            self._add_args(b_command, b_args, "ansible.cfg set ssh_args")
+            self._add_args(b_command, b_args, u"ansible.cfg set ssh_args")

         # Now we add various arguments that have their own specific settings
         # defined in docs above.
@@ -750,6 +754,7 @@ class Connection(ConnectionBase):
             )

         self.user = self.get_option("remote_user")
+
         if self.user:
             self._add_args(
                 b_command,
stefangweichinger commented 3 years ago

:tada: