ansible / proposals

Repository for sharing and tracking progress on enhancement proposals for Ansible.

stackable connection plugins #25

Open bcoca opened 8 years ago

bcoca commented 8 years ago

Proposal: stackable connection plugins

Author: Brian Coca @bcoca

Date: 2016/07

A single 'connection' does not always get us where we want to go. This is NOT an attempt to solve ssh proxying, but to handle more complex cases like ssh + chroot or ssh + docker to access hosts.

Problems

- user: name=insidechroot state=present
  connection: ssh+chroot

Plugins should clearly state target dependencies and required 'connection data'. The 'location' of the dependencies has to be made very clear in the general docs.

mscherer commented 8 years ago

Well, you could make ssh+paramiko work, thus solving the proxy issue :)

Infiniverse commented 7 years ago

Hey @bcoca, have there been any more thoughts on this since it was proposed (~1 yr ago)? It appears that there are several attempts by people to solve this, but not generically as proposed.

We've got lots of FreeBSD jails (actually iocages) running remotely, which we would like to manage - but they need an ssh connection to the box first.

bcoca commented 7 years ago

@Infiniverse sadly there doesn't seem to be a good way around the problems above, mostly for action plugins like 'copy' or 'template'; direct actions on the target itself are much easier to deal with.

Infiniverse commented 7 years ago

@bcoca It seems a bit strange, because in principle it ought to be really easy. The connector interface appears simple enough; what it appears to be missing is the ability to run a connector at the far end of the ssh connection - i.e. use the ssh connector to copy the Jail (for instance) connector to the far end and then connect them together. Without this ability it's never going to work properly.
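
For reference, the existing local-execution connection plugins (chroot, jail, etc.) roughly follow the shape below; this is a simplified, illustrative sketch of the interface being discussed, not the actual plugin code:

    # Illustrative sketch of the connection plugin interface; details
    # (option handling, error paths, become support) are omitted.
    import subprocess

    from ansible.plugins.connection import ConnectionBase


    class Connection(ConnectionBase):
        """Run commands inside a chroot on the machine where Ansible executes it."""

        transport = 'chroot'  # what `connection: chroot` selects

        def _connect(self):
            # in the real plugin the chroot path comes from the inventory host name
            self.chroot = self._play_context.remote_addr
            self._connected = True
            return self

        def exec_command(self, cmd, in_data=None, sudoable=True):
            # wrap every command so it runs inside the chroot
            local_cmd = ['chroot', self.chroot, '/bin/sh', '-c', cmd]
            p = subprocess.Popen(local_cmd, stdin=subprocess.PIPE,
                                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            stdout, stderr = p.communicate(in_data)
            return p.returncode, stdout, stderr

        def put_file(self, in_path, out_path):
            ...  # local copy underneath self.chroot

        def fetch_file(self, in_path, out_path):
            ...  # local copy out of self.chroot

        def close(self):
            self._connected = False

The missing piece described above is a supported way to ship something like this to the far end of an existing ssh connection and drive it from the controller.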

What's ansiballz? (Ah, ok, just read up on it.)

What's the challenge in packaging up, sending and executing a connector plugin at the remote end of a connection?

bcoca commented 7 years ago

in part, as the plugins currently depend on ansible code itself, they are not designed to be independent as modules are; the other big issues are file transfer and error propagation.

Infiniverse commented 7 years ago

Ok, hmm, so until plugins can be transported like modules are, there isn't going to be a generic solution to this. How much work would it be to extend that "shippability" to plugins?

Doing a deeper search on this issue, it appears to have been upsetting folk for years and years :) And there doesn't yet appear to be a good consensus as to how to proceed.

Infiniverse commented 7 years ago

The other way to proceed is to layer connections on top of ssh, in much the same way that iocage is layered on top of jail. It might break streamlining, but that might be a price worth paying.

With at least chroot and jail, the remote location for files is about specifying a per-host prefix so that the playbooks can work with locations from the perspective of the root of the chroot/jail. Then it's just a matter of getting the execution context set up correctly before executing.

If we teach the ssh connection how to be a bit more general about how it acts, then we ought to be able to wrap it inside the chroot/jail/iocage/etc. connectors.
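
A rough sketch of that layering idea, purely hypothetical (the `inner` connection object and `jail_root` variable are illustrative names, not existing Ansible API):

    # Hypothetical wrapper: layer a jail/chroot connector on top of an
    # already-established ssh connection, translating paths against a
    # per-host jail root so playbooks address paths from the jail's view.
    import shlex


    class JailOverSsh:
        def __init__(self, inner, jail_root):
            self.inner = inner          # an established ssh connection object
            self.jail_root = jail_root  # per-host prefix, e.g. /usr/jails/web01/root

        def exec_command(self, cmd):
            # run the command inside the jail at the far end of the ssh connection
            wrapped = 'chroot %s /bin/sh -c %s' % (self.jail_root, shlex.quote(cmd))
            return self.inner.exec_command(wrapped)

        def put_file(self, in_path, out_path):
            # prefix the destination so the file lands inside the jail
            return self.inner.put_file(in_path, self.jail_root + out_path)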

Have you seen any discussion on this approach anywhere?

bcoca commented 7 years ago

my thought was adding a property to connection plugins, stackable=False|True, and redesigning them to be shippable; that is not a big issue.
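
As a purely illustrative sketch of that property (no such attribute exists in Ansible today):

    # Hypothetical: a connection plugin advertising that it may be layered
    # on top of (shipped across) another connection.
    from ansible.plugins.connection import ConnectionBase


    class Connection(ConnectionBase):
        transport = 'jail'
        stackable = True  # proposed flag, not an existing Ansible attribute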

Infiniverse commented 7 years ago


Ok - that sounds like a plan too. Do you have time to work on it? I’m happy to help (although python isn’t my primary language, I’m a reasonably competent coder).

Joe

bcoca commented 7 years ago

this is in limbo until we find a good way around the problems I listed above; pushing the plugin is the easy part, handling action plugins/chained file transfers and error propagation in a consistent and resilient way ... that is not as easy.

dw commented 6 years ago

There is an initial connection delegation ("stacking") implementation in Mitogen that covers most of the problems mentioned in this proposal, but introduces one or two of its own:

The workalike connection types require Python to be installed in the target container, unlike (I think) how the existing plugins work. For example, the "raw" module wouldn't work to permit installing Python in the container. That's actually avoidable, but it's more work :)

dagwieers commented 6 years ago

@dw Avoidable by moving to Go ? :-)

dw commented 6 years ago

It's hairy but it's doable, and kept nicely buried away out of sight :) https://github.com/dw/mitogen/issues/223

bcoca commented 6 years ago

@dw when you bypass tools like sftp for file transfer you lose a lot of safeguards, retries and verification. Did you reimplement those? Will it work with non-ssh transports (winrm)?

dw commented 6 years ago

I did a quick review of scp.c to see what functionality is missing. While Ansible prefers sftp, that's 5500 LOC to read rather than 1140 :)

That's a total of 2 known bigger bugs, and 2 new bugs of about 15 minutes each. In terms of LOC this replaces 1100 lines of memory-unsafe C with 77 lines of easily understood and memory-safe Python server (with 131 comment lines), and 23 lines of client.

Finally it's important to note the size of scp.c is due in large part to the requirement for explicit error handling in C. That is done implicitly (and for the most part, perfectly) by Python. While scp.c is an old and extremely well-audited piece of code, I'd be willing to bet a round of drinks that FileService/_get_file() already handles errors that scp fails to catch.

I found https://github.com/ansible/ansible/pull/18643. The original intention for file transfer was to use Mitogen's fakessh support. It still needs to become battle-ready to support transparent multi-hop synchronize; however, the same functionality could also support the option to fall back to scp or sftp just like today, configured using the same ansible.cfg keys as before.

I found https://github.com/ansible/ansible/pull/20187/files. It's not clear if this is handling connection retries or file transfer retries. In either case, the extension has no equivalent retry logic but it could be added easily, and since retry is not an inherent property of scp/sftp, it doesn't really fit into this comparison.

Searching git log --grep '\bscp\b' --grep '\bsftp\b' doesn't seem to reveal any further interesting bugs; what turns up relates to Ansible's integration with scp/sftp (e.g. fixing error handling) rather than issues with the tools themselves. I'd love more examples if you have any.

For me the real worry with an in-process approach is upper-bound performance, however I was surprised to discover that thanks to SSH's single-threaded crypto, the bottleneck (by a factor of ~4x on my laptop) is SSH itself rather than Python.

I have a half-hacked example of something FileService /could/ do that scp will never be able to do: channel bonding. It simply opens duplicate SSH connections and round-robins over them to decide which one to send the next chunk on. The result is multi-core crypto, and file transfer faster than scp, with nothing installed on the target. I'll check it into the examples/ directory sometime next week, but it's a tiny example of the kind of flexibility gained here that was never possible before.
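
The channel-bonding idea, reduced to an illustrative sketch (the connections and send_chunk parameters are made-up placeholders, not Mitogen's actual API):

    # Illustrative only: round-robin file chunks across several SSH connections
    # so each connection's single-threaded crypto runs on a different core.
    import itertools

    CHUNK = 128 * 1024

    def bonded_send(path, connections, send_chunk):
        """Send `path` by alternating chunks over each connection in turn."""
        rr = itertools.cycle(connections)
        with open(path, 'rb') as f:
            offset = 0
            while True:
                data = f.read(CHUNK)
                if not data:
                    break
                # the receiving end reassembles chunks by (path, offset)
                send_chunk(next(rr), path, offset, data)
                offset += len(data)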

edit: forgot one more, which is timeouts. scp times out when SSH times out if configured. This stuff isn't tested enough yet. That and a bunch more improvements are in https://github.com/dw/mitogen/issues/212

edit: no, it won't work for WinRM, but neither will scp/sftp :)

bcoca commented 6 years ago

the reason we prefer sftp is that it implements many of the safeguards I mention, which probably accounts for the difference in code size.

The fileservice limitation is something OpenSSH has had for a long time; in environments that needed the throughput I've normally just patched it. It's just not really important, since most of the time our payloads are tiny and we should not hit it for internal use; those that are using Ansible to distribute large files or a huge number of them ... well, they already had this issue and it is not the focus of this tool. It would be interesting to have a 'fast file copy' that worked over ssh, but that is a tangent at this point.

My question about other connections is not related to scp/sftp specifically but about being able to handle files across the connections, which is the main reason this proposal stalled.

dw commented 6 years ago

The two smaller bits of scp functionality were added, alongside a safeguard scp lacks: writes proceed to a uniquely named temporary file which can be synced and renamed over the target. It's no longer possible for simultaneous ansible-playbook runs writing the same file to leave a corrupted file, or for the file to become inconsistent if it is being written during a crash.
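
The safeguard described is the classic atomic-write pattern; a generic sketch of it (not the FileService code itself):

    # Generic atomic write: readers never observe a partially written file, and
    # a crash mid-write leaves either the old contents or the new, never a mix.
    import os
    import tempfile

    def atomic_write(dest_path, data):
        directory = os.path.dirname(dest_path) or '.'
        fd, tmp_path = tempfile.mkstemp(dir=directory)  # unique temp file on the same filesystem
        try:
            with os.fdopen(fd, 'wb') as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())          # ensure data is on disk before the rename
            os.replace(tmp_path, dest_path)   # atomic rename over the target
        except BaseException:
            os.unlink(tmp_path)               # don't leave the temp file behind on failure
            raise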

Simultaneous scp to a file when the contents mismatch can result in any of (depending on when the overlap began):

1) a corrupt file with holes in it (i.e. a sparse file)
2) an otherwise correct but truncated file
3) a corrupt file with data incorporated from both attempts.

And in every case with scp, the partially written file is observable to that file's users during the transfer. I can't begin to fathom how many web site playbooks exist out there where half-uploaded and truncated static assets are served to end users while ansible-playbook is running.

Without knowledge of the specific safeguards sftp has that you're referring to (they don't appear to be spelled out anywhere above), time will be made to review the sftp source and incorporate any additional checks found into FileService.

The opportunity was used to add one more optimization: removal of the final roundtrip for files under 32KB. It's not exposed to Ansible yet (no suitable interface), but small files can be written with 0 roundtrips, assuming a step exists afterwards to gather return values and verify success.

Finally it's worth noting that scp.c is absolutely littered with calls to stat() where the return value isn't checked, but control logic proceeds using the uninitialized struct st regardless. These are all failure conditions that Python handles for free that apparently have never been noticed by any scp user.

seacuriti commented 10 months ago

This is also useful for my use case where I want to automate QEMU virtual machines on a remote server to which I connect through SSH. Does anyone know if it's possible to achieve that with a custom Ansible connection plugin?

bcoca commented 10 months ago

not 'stackable', but this is how AAP creates a 'mesh' to distribute jobs remotely; it could be used for the 'ssh' + container case: https://github.com/ansible/receptor