Salt Managment Inter VM Communication

nrgaway commented 8 years ago

On 25 December 2015 at 07:40, Marek Marczykowski-Górecki marmarek@invisiblethingslab.com wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Fri, Dec 25, 2015 at 03:58:47AM -0800, Jason M wrote:
> BTW, going to play around with bombshell next week :)

That would be probably the easiest thing to do, but not sure if the most
secure.

I totally agree. I was thinking of bombshell for vm to vm communication where there could be a SaltVM for user formulas where the SaltVM would have no network access or communication to dom0. That still may be risky though, but I wanted to at least experiment with a couple of different ideas to at least eliminate some of them.

I think the way to go, would be to package states/formulas/etc
from dom0, send them to the VM using _simple_ mechanism
(qvm-copy-to-vm?), then execute states there. And the most important
part: do not parse anything coming from that VM back. This means for
example do not render the output. Maybe only exit code
("success"/"failure") and point the user where to look for the logs.

I totally agree. What about dom0 sending configurations to a SaltVM master, which could communicate with TemplateVM's? Or would you rather not introducing a SaltVM in case it gets infected.

As for the state execution, I see state.pkg:
https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.state.html#salt.modules.state.pkg

But I can't find an easy way to create such package (other than digging
deep into salt-ssh...). Any ideas?

I have not considered salt.modules.state.pkg, but will look into it as well. I was thinking of using either a modified salt-ssh or creating a simple client on the TemplateVM. The salt-master can reside in dom0 that will create specific states based on pillar configuration and tops, etc. Based on your comment, I will look into the best way of creating the required data and sending it to TemplateVM without any parsing. So, the salt-master will create some type of state package, upload it only, the templateVM will unpack and execute it and can notify Salt Master when completed with a result code.

Another thing we can consider is using git to receive formulas from web which the TemplateVM can download and install which means we don't need to package it for each distribution. The formulas can be downloaded as a git repo or salt package (one file). Both methods have no existing security other than checksums, but we could verify git repos in a similar fashion to what qubes-builder does, but adding requirement and verification on each commit as well as tag and possibly a checksum on each file and overall?

I have copied this reply to qubes-issues to allow further discussions.

marmarek commented 8 years ago

On Fri, Dec 25, 2015 at 08:44:01AM -0800, nrgaway wrote:

I totally agree. What about dom0 sending configurations to a SaltVM master, which could communicate with TemplateVM's? Or would you rather not introducing a SaltVM in case it gets infected.

If SaltVM (which is generally a good idea) would have power to control (some) VMs, it also should be protected. So process as little as possible untrusted data.

I have not considered salt.modules.state.pkg, but will look into it as well. I was thinking of using either a modified salt-ssh or creating a simple client on the TemplateVM. The salt-master can reside in dom0 that will create specific states based on pillar configuration and tops, etc. Based on your comment, I will look into the best way of creating the required data and sending it to TemplateVM without any parsing. So, the salt-master will create some type of state package, upload it only, the templateVM will unpack and execute it and can notify Salt Master when completed with a result code.

Yes, this is exactly the idea.

Another thing we can consider is using git to receive formulas from web which the TemplateVM can download and install which means we don't need to package it for each distribution. The formulas can be downloaded as a git repo or salt package (one file). Both methods have no existing security other than checksums, but we could verify git repos in a similar fashion to what qubes-builder does, but adding requirement and verification on each commit as well as tag and possibly a checksum on each file and overall?

Not sure how proceed with this. But I think for the first iteration we can skip solving this problem - leaving formula installation (in SaltVM or dom0) up to the user. Then use some mechanism (as discussed above) to transfer them to the target (Template)VM.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

nrgaway commented 8 years ago

Okay, thanks for the feedback. I won't dive too deep into this until I can present a recommendation or two for further analysis based on some of my research and experiments I will be performing over the next couple of weeks.

My main goal will be as requested; to process the minimum amount of untrusted data.

Rudd-O commented 8 years ago

I recommend not running the bombshell client in dom0. It's possible, but I would advise running it on a non-networked, domU AppVM instead. You can then use nice editors and whatnot to manage your formulas and playbooks, without contaminating dom0 with extra software.

Always treat your VM as you would treat any VM that is going to be running qubes.VMShell commands, because that is what bombshell is -- a proper stdin/stdout/stderr proxy that uses qubes.VMShell to provide for batch and interactive sessions across VMs (and even across computers, with the right by-hand network setup).

Rudd-O commented 8 years ago

Note that, when used in conjunction with the Qubes Ansible plugin, or as a salt-ssh backend, bombshell-client is almost certainly going to be a return transport for JSON or YAML data from the modules.

While it's conceivable that a malicious VM could fudge the Ansible / Salt modules copied into the VM for execution, such that those malicious'd modules would return bogus data:

reading and parsing that data back is unlikely to result in execution of anything at the calling VM, unless there's code in Ansible or Salt that will execute Python when fed JSON / YAML (not really the case),
nor are terminal codes spat out to the running terminal when executing an Ansible or Salt call, so your ANSI terminal should be safe.

Key point: at no point in the use of Ansible or salt-ssh is there any way for the target hosts to call back into the manager host.

https://github.com/ansible/ansible/tree/3b16ad973cf4db0f1790d98e821644df16ea77ca/lib/ansible/parsing/yaml

marmarek commented 8 years ago

On Fri, Dec 25, 2015 at 08:51:34PM -0800, Rudd-O wrote:

Note that, when used in conjunction with the Qubes Ansible plugin, or as a salt-ssh backend, bombshell-client is almost certainly going to be a return transport for JSON or YAML data from the modules.

While it's conceivable that a malicious VM could fudge the Ansible / Salt modules copied into the VM for execution, such that those malicious'd modules would return bogus data:

reading and parsing that data back is unlikely to result in execution of anything at the calling VM, unless there's code in Ansible or Salt that will execute Python when fed JSON / YAML (not really the case),

nor are terminal codes spat out to the running terminal when executing an Ansible or Salt call, so your ANSI terminal should be safe.

https://github.com/ansible/ansible/tree/3b16ad973cf4db0f1790d98e821644df16ea77ca/lib/ansible/parsing/yaml

That's true in theory, but we still want to be on a safe side. What is user will install some custom renderer? Or there will be (or is) a bug in the current renderer? I don't think it is totally impossible (but granted - unlikely). We are trying to have as small attack surface as possible - parsing output from a VM (even with very simple code) enlarges it.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

Rudd-O commented 8 years ago

At least Ansible explicitly removes all nonprintable characters before passing results to the renderer.

Reducing the risk is why I recommend running the bombshell and its Ansible / Salt connection modules in a non-networked domU.

nrgaway commented 8 years ago

On 26 December 2015 at 09:03, Rudd-O notifications@github.com wrote:

At least Ansible explicitly removes all nonprintable characters before passing results to the renderer.

Would it be overkill to use a DisposibleVM, maybe a special type of DisposibleVM, that either dom0 or a SaltManagementVM would send a master configuration to for one VM at a time. Basically, the /srv salt directory would be transfered to the DisposibleVM. Then SaltManagementVM triggers the DisposibleVM to contact one AppVM to update it.

In this senerio, the SaltManagementVM stay safe since it never needs to parse anything. The TemplateVM's remain safe since the DisposibleVM is starting from a known state and only ever updates one TemplateVM per run.

The DisposibleVM can then parse output from updating TemplateVM and save the logs. If TemplateVM were to infect DisposibleVM, it would not matter since it would not infect SaltManagementVM or other VMs.

Using a process like this means we would not have to write too much custom code to ensure SaltManagementVM remains safe

Any thoughts?

marmarek commented 8 years ago

On Sat, Dec 26, 2015 at 03:42:22PM -0800, nrgaway wrote:

On 26 December 2015 at 09:03, Rudd-O notifications@github.com wrote:

At least Ansible explicitly removes all nonprintable characters before passing results to the renderer.

Would it be overkill to use a DisposibleVM, maybe a special type of DisposibleVM, that either dom0 or a SaltManagementVM would send a master configuration to for one VM at a time. Basically, the /srv salt directory would be transfered to the DisposibleVM. Then SaltManagementVM triggers the DisposibleVM to contact one AppVM to update it.

In this senerio, the SaltManagementVM stay safe since it never needs to parse anything. The TemplateVM's remain safe since the DisposibleVM is starting from a known state and only ever updates one TemplateVM per run.

The DisposibleVM can then parse output from updating TemplateVM and save the logs. If TemplateVM were to infect DisposibleVM, it would not matter since it would not infect SaltManagementVM or other VMs.

Using a process like this means we would not have to write too much custom code to ensure SaltManagementVM remains safe

Any thoughts?

That's a good idea in terms of security. Some technical detail would be to properly set qrexec policies, so that DisposableVM really have access only to that one selected TemplateVM (so even if compromised by one TemplateVM, it would not have access to another). Also that DisposableVM should be carefully selected - probably based on the same template as the target VM. But generally it is a good idea.

It isn't optimal for performance or resources, but if it would be much easier to implement than the alternative discussed previously, IMO it would be good enough.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

Rudd-O commented 8 years ago

I think the dispvm is overkill, frankly. A regular VM with no network access should be more than okay.

the ansible and salt connection plugins fill a great niche that the qubes salt system cannot by itself fill -- the actual setup and configuration of the contents of the VMs -- because it would be too risky to have dom0 reach into the VMs and run commands, then parse the outputs.

marmarek commented 8 years ago

I've created major hack for salt-ssh qrexec transport (something like bombshell): https://gist.github.com/marmarek/bf01ac51bb53cb767297

Differences vs bombshell:

uses qvm-copy-to-vm for "scp" service
does not pass output from the VM back to salt-ssh, instead construct some dummy data

Usage - in some "MgmtVM" (I've tested it from a VM, but it's trivial to use the same from dom0):

Place as ssh somewhere in $PATH, then create a symlink scp->ssh.
Add target VM to /etc/salt/roster, for example like this:

d8test: 
   # target VM name
   host: d8test
   sudo: True

It is PoC only, do not use for anything else than testing. It is also ugly, sorry. There is also extensive logging to /tmp - there will be a lot of files.

General salt-ssh workflow:

Create an archive with required state files (salt_state.tgz), send it through scp
Connect through ssh (replaced by my script here)
Send a (shell one-liner) shim, which search for python executable, then execute salt-call wrapper.
That wrapper outputs a delimiter string - everything after that is interpreted by salt-ssh (calling site)
If wrapper doesn't find salt thin agent installed, it prints deploy and exit with code 11 - in such case, salt-ssh uses scp to send thin agent (salt-thin.tgz file) and retry
If wrapper doesn't find required external module(s), it prints ext_mods and exit with code 13 - salt-ssh will send required files and retry
In any other case it simply execute requested command (which may imply executing some state package, sent using scp)
Thin agent prints command output - JSON formatted data.
salt-ssh loads that data and display a summary

Conclusions:

it is (mostly) possible to use salt that way, it transfer all required files (states, pillars, modules etc)
it works without any prior provisioning in the VM - just bare qubes.FileCopy and qubes.VMShell services are enough
thanks to exit code, I can artificially output appropriate request (deploy, ext_mods)
limitations of not having feedback channel:
- no feedback whether state succeeded or failed (that could be probably carried through exit code),
- no details what was done in the target VM (salt normally provide information what was changed, and what was already in place) - this can be probably logged inside the target VM,
- and the most important: not possible to use grains from the target VM

The last one is particularly important, because a lot of states depends on it, at least to have os_family grain (distribution name), to adjust paths, etc. For example pkg.installed state (installation of some package) doesn't work without this grain. In my PoC I've hardcoded it to Debian.

I see two options here:

Keep strict requirement to not parse anything returned from a VM, and add some heuristic for a few required grains (at least os_family - for example based on template name...)
Add a simple filtering (whitelist approach) for grains and allow a few required values. If going that way, I would use python json module (hopefully it is secure enough), then extract few white-listed values. If not using full json parser, then it would be some pattern matching script - IMHO much more error prone.

I'm slightly for the second option, although it has slightly bigger attack surface. Example full input for such parser: https://gist.github.com/marmarek/4f4fa732a02ff3073ee4 Example white-listed output: https://gist.github.com/marmarek/3741e5551d33ba9aaf82

@rootkovska any strong opinion here?

marmarek commented 8 years ago

BTW @Rudd-O I was able to use qrexec-client-vm directly, without need for any serializer like yout bombshell. Probably signals are not carried through, but it doesn't matter for Salt, fortunately.

Rudd-O commented 8 years ago

The serializer is necessary to multiplex stdout and stderr in the same channel. If you don't care about stderr (I do), you don't need bombshell.

I fail to see what the problem is with parsing the returned data. It's JSON. If you don't trust what salt-ssh does with the returned data, because you feel that the returned data can be leveraged into dom0 execution (I don't see how based on your own gists) -- then you can write your own custom Salt runner which takes the data returned by the VMs verbatim and raw, then ditches all of them to a file or something. How to do this is well-documented by the SaltStack folks. Needless to say, the returned data is fundamental to using Salt state modules and their highstates.

Also grains break with the mechanism you outlined here. :-(

Note that I'm not using bombshell in dom0 at all. I use bombshell from a non-networked trusted VM, and I point it at other VMs and to dom0 (to configure hardware settings).

ag4ve commented 8 years ago

On Feb 7, 2016 11:18 PM, "Rudd-O" notifications@github.com wrote:

The serializer is necessary to multiplex stdout and stderr in the same channel. If you don't care about stderr (I do), you don't need bombshell.

I fail to see what the problem is with parsing the returned data. It's JSON.

You show me a deserialization library, I'm about 90% sure I can find you a CVE (or a commit directly related to someone else's). Do I agree something like this needs to be implemented - yes. Do I agree being flip about the risks helps anyone - nope.

We all agree an issue here is small. But I posit that with millions of eyes on JS it still took years to find JSONP. You find the same user base/time with salt and (partially due to much less complexity) and I think we'll all be ~ok with salt being a first class citizen in Qubes.

marmarek commented 8 years ago

On Sun, Feb 07, 2016 at 08:18:38PM -0800, Rudd-O wrote:

The serializer is necessary to multiplex stdout and stderr in the same channel. If you don't care about stderr (I do), you don't need bombshell.

Raw qrexec can pass stderr as a separate stream. In case of RPC services, it is redirected to syslog, but when using it from dom0, you can use raw qrexec, so you'd get stderr too. Not very helpful in your use case.

Anyway, I don't care about stderr here (as long as it is available somewhere, for debugging purposes only). Ideally I would not care about having stdout too...

I fail to see what the problem is with parsing the returned data. It's JSON. If you don't trust what salt-ssh does with the returned data, because you feel that the returned data can be leveraged into dom0 execution (I don't see how based on your own gists)

We don't want to think about it, that's the point. If salt-ssh does not receive any untrusted input, we can be sure that it won't be exploited that way. Otherwise, we'd need to analyze that risk. And at least add salt-ssh to the TCB (a software where one bug can break the whole system).

Some example attack vectors would be:

json parser (rather unlikely)
some state template would do something unexpected when get weird string in grains or other returned data; code execution is also unlikely, but some other errors are possible - like path traversal (which could lead to stealing arbitrary private.img for example)
some bug in outputter
...

-- then you can write your own custom Salt runner which takes the data returned by the VMs verbatim and raw, then ditches all of them to a file or something. How to do this is well-documented by the SaltStack folks.

I'm not sure if the "runner" is what we need here. It looks like it isn't about communication channel, but just running some commands on salt master. But I may be wrong. It looks to me that in terms of communication layer there are three options:

salt native protocol - highly interactive, so having it without feedback channel would be at least very hard if possible at all
salt-ssh - described above
proxy minions - not running minion at the remote side at all, only executing selected operations (using whatever mechanism, like SNMP) from some other minion - much less flexible than other methods

Needless to say, the returned data is fundamental to using Salt state modules and their highstates.

Fortunately it isn't true, as I've outlined above. salt-ssh collect all the needed files, creates a state package (tgz) and execute it on the minion side using state.pkg function. The only problem is that states (and pillars etc) are rendered (jinja2 templates are evaluated) on "master" side and that may need access to some grains. In examples I've tried, at least 'os' and 'os_family'.

Also grains break with the mechanism you outlined here. :-(

Yes, this is the problem we need to solve somehow.

Note that I'm not using bombshell in dom0 at all. I use bombshell from a non-networked trusted VM, and I point it at other VMs and to dom0 (to configure hardware settings).

If that "non-networked trusted VM" has ability to run any command in any VM including dom0, it doesn't matter it isn't dom0. It have exactly the same power as dom0.

As said before, in the feature we want to add an API to allow having such management VM, but with somehow limited. To not give full control over the whole system (and user data) to such VM. For example allow only to manage VMs created by that management VM. And obviously do not allow direct dom0 command execution.

Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

rootkovska commented 8 years ago

I would think that - ultimately - we are only interested in Succeed/Fail response from the VM machinery, so a single 0 or 1 bit, no? I really, really don't like parsing of all these stuff you linked to :(

marmarek commented 8 years ago

The point is salt needs some information about the target VM to send right states. At least target distribution to select appropriate modules (for example whether to use apt-get or yum). If that information isn't going to be retrieved from a VM, then user will need to provide it manually, somehow undermine the whole purpose of having management stack...

If you don't like parsing that salt-minion output directly, we can think of what informations is really needed, extract them from a VM using for example separate qrexec service, using some Simple Representation (TM), then provide it to salt the way I've described (my ugly PoC shell script does that - but instead of extracting that info, use hardcoded values in there).

rootkovska commented 8 years ago

How about creating a DispVM, with a copy of all the info from Dom0, and then let the salt in the target VM talk to the DispVM only? Then destroy the DispVM (after retrieving 0 or 1 from it)?

marmarek commented 8 years ago

That DispVM would have control over everything that salt may control, right? Or at least will have all the salt configuration - which can for example contain VPN credentials the user want to configure in some selected ProxyVM. I don't see a way how to do that, to not:

reimplement "everything", or
make that DispVM compromise (almost?) as fatal as dom0 compromise

I think the information needed to extract from a VM (using whatever mechanism) is a really small set. In my PoC it is:

            "grains": {
                "os": "Linux",
                "os_family": "Debian"
            },

I guess in practice, maybe two or three more values are needed, but that would be all.

rootkovska commented 8 years ago

Ok, if there is no easy way to create a subset with all the info for the specific VM, which I understand is not trivial on Salt, then perhaps just have the parser of the return channel in the DispVM? So, the DispVM in this picture doesn't hold any config, only is tasked with simplification of the output returned from the VM.

marmarek commented 8 years ago

It may be an option. But I think it can be also done in the managed VM itself - simply put salt-minion output through a "converter" there and return to dom0 simplified version. Dom0 would need to validate that simplified output anyway regardless it comes from the target VM or DispVM, right?

rootkovska commented 8 years ago

Yeah, right. So the point is not to run json parser and the like in Dom0. Perhaps return some very simplified strings and reconstruct JSON in Dom0 from it?

marmarek commented 8 years ago

Exactly.

marmarek commented 8 years ago

I think also about extracting the info using other mechanism - simplar to qubes.NotityTools (used by Windows Tools for now, but planning to extend also for Linux - #1637). It already covers remote OS info. That way, hopefully, we could construct the most important grains without using that JSON output in any way (not even in VM, simply redirect to /dev/null there). Will look into it later.

Rudd-O commented 8 years ago

salt-ssh is a runner. It's a custom runner. Another one could be built, conceivably. Salt-VM would be the correct name.

The problem of not having stderr in inter-VM communications was bridged by bombshell multiplexing the two channels. If it was possible to have stderr not just from dom0 to domU, then bombshell would have never been necessary.

All salt-ssh does with grains is store them. The values of grains are interpolated in user templates, though, which means that it's possible templates that use grains could be malformed if the grains have crafted data, but since all salt does with the templates is ship them right into the "minion" (node) that supplied the grains, all the "minion" can do is fuck itself up. Naturally, this theory ought to be tested -- it should be trivial to do so.

You fundamentally can't have anything like salt-ssh without receiving untrusted input from VMs to process and react upon the results of your state changes. The question is not whether you can avoid it -- the question is whether you need to either audit the code and try to exploit it, or you need to write some replacement code and hope that your code is less likely to be subverted than salt's existing code. Given that salt has been working in a master-minion configuration for years and no exploits have been found that minions can use with returndata to compromise the master, I'd say that's at least worth an audit just to be sure.

Rudd-O commented 8 years ago

Also, yes, I am aware that my non-networked trusted VM has ultimate power over dom0. It has root. That's why I have consciously decided to trust it.

I accepted that risk because (a) the tradeoff in manageability that comes with being able to run configuration recipes on my VMs and dom0 is immense (I don't have time to do bullshit by hand, to be honest), and because (b) I don't want to burden dom0 with software installs.

Rudd-O commented 8 years ago

Finally, if you're going to return simplified data and avoid JSON altogether, then you most definitely want to construct a custom runner that takes that data and "reconstructs" the JSON.

I don't really see the point though -- what are you going to strip / reconstruct, if you're not going to end up with JSON in the end? The results dictionary? That's brain dead simple, and even if it was malformed, all that happens in the dom0 is an exception (properly handled and caught) is raised. Yes, I know that JSON parsers in the past have had problems, but modern JSON parsers in Python do not instantiate executable code out of the structures that were loaded with json.loads(). That leaves the grains-contain-malicious-data-interpolated-in-templates issue, but we know that Salt renders templates locally and then farms them out for execution on the minion, which means that even the most malicious of malicious grains cannot compromise the master (dom0).

I would, however, love to be proven wrong. All that's needed is crafted grains input from a test VM, that manages to run arbitrary code in dom0. If that's possible, that shouldn't be hard to do. A runtime patch on json.load and json.loads can show tracebacks in all the dom0 code that ultimately calls upon those load operations.

marmarek commented 8 years ago

Naturally, this theory ought to be tested.

No, we don't want to trust behavior of some particular version of salt. If salt-ssh in dom0 doesn't need to parse them in any way, it shouldn't receive them in the first place. To be sure it will not be parsed. This is the whole point of having Qubes OS: trust in (for not having bugs) as little software as possible. Otherwise we could simply shutdown the whole project and switch to Windows...

And I think your theory is wrong - when ran my PoC with empty grains, salt-ssh complains about a lot of modules load failure (in __virtual__ function, if that matters for you). That was before sending any state files to the VM.

You fundamentally can't have anything like salt-ssh without receiving untrusted input from VMs to process and react upon the results of your state changes.

This is clearly wrong, given my PoC does work. All the output from VM is sent to /dev/null there. The only information assumed is distribution name (Debian).

Generally salt-ssh works totally different than standard Salt master <-> minion setup. There is a lot of code in salt-ssh dedicated to package all the required files and send them in one go, to execute using state.pkg function there. This was one of main reasons for choosing Salt.

But because most of (all?) this code for packaging states is in salt-ssh module, not some generic place, it would be non-trivial to write own salt-vm without a lot of code duplication.

Rudd-O commented 8 years ago

No, we don't want to trust behavior of some particular version of salt.

But you are already trusting that behavior as updates happen. Unless you're explicitly vetoing the versions that go in, in which case that's not a problem.

If salt-ssh in dom0 doesn't need to parse them in any way, it shouldn't receive them in the first place.

But it needs to parse them somehow. It needs to use the grains so recipes can target specific VMs. It needs to use the returndata to show to the user what the hell changed on the VMs that were modified (otherwise it's nearly impossible to develop or debug recipes). Those are the two main categories for which salt needs to parse JSON in a role similar to salt-ssh (inter-minion events and reactors don't count here).

If I were to run a crippled version of salt-ssh that would, e.g., not show me a diff of the file changes it has made, or how permissions changed on a templatized file as it got updated, or how a service got started / enabled as the recipe instructed the system to do so, then that's a useless product which I decidedly will not use.

This is clearly wrong, given my PoC does work. All the output from VM is sent to /dev/null there. The only information assumed is distribution name (Debian).

Evidently it is technically possible to execute recipes and discard their output. That's not the problem, though. I'll pose the problem as a question:

Why would anyone want to develop recipes for this hypothetical "Qubes Salt" that are undebuggable and untestable because of no meaningful, usable, straightforward output, that differs so much from the upstream product, no one would know how to use it by looking at its upstream documentation?

The answer should be clear as to how such a product would do in the market. Key feature of Salt is that it shows the operator what the hell happened, so that the operator doesn't have to go manually verify that the intended state is the actual state every time.

Incidentally, this is precisely why I run the master side of Salt and Ansible in a VM rather than in dom0 -- I don't have to worry that dom0 will be compromised, and I also don't have to give up the very things that make Salt and Ansible better than stupid shell scripts -- good, meaningful onscreen feedback as to how my playbooks and recipes are doing.

If my automation was limited to showing me 0 or non-0 output, I would do things by hand, or just not use computers anymore.

This is, of course, all separate from the question of whether json.loads can act as exec(cPickle.laods()), which really is the crux of the question (salt outputters escape ANSI sequences for you already, which would be the other vector for attack). If json.loads can't execute code, then that class of vulnerabilities doesn't exist.

All I'm seeing here is the need to audit some code that has already been shipped in dom0, really.

marmarek commented 8 years ago

No, we don't want to trust behavior of some particular version of salt. But you are already trusting that behavior as updates happen.

There is a big difference between:

in newer version some states do not work because of some missing grains or whatever else there was expected
in newer version is it possible to take over the whole system from some VM

It needs to use the returndata to show to the user what the hell changed on the VMs that were modified

It can be simply logged somewhere in the VM. For debugging purposes (if that 0/1 flag show you there was an error). You can tail -f on that file when debugging configuration.

If I were to run a crippled version of salt-ssh that would, e.g., not show me a diff of the file changes it has made, then that's a useless product which I decidedly will not use.

Of course you can do that if you want. You can also configure salt/ansible whatever way you want. But in default Qubes OS installation we want as small attack surface as possible. If management stack would enlarge it significantly (to our discretion), we'd rather not have management stack at all.

I think further discussion on this matter, here is senseless. If you want to continue, switch to mailing list.

Rudd-O commented 8 years ago

There is a big difference between:

This sort of argument can be applied to pretty much every software package shipping in Qubes, Xen, kernel, coreutils, Python, anything used in the Qubes utilities, whatever. Under your rationale, the problem isn't Salt itself -- the problem is that software updates to code that has already been reviewed happen, and those updates can bring security holes.

Philosophically speaking, your argument qualifies as what's called an "isolated demand for rigor". http://slatestarcodex.com/2014/08/14/beware-isolated-demands-for-rigor/

Note that I do really understand this: future Salt updates can introduce problems processing with minion returndata and minion grains. I do not deny this. I'm merely saying that's just a cost of doing business, just as much as it is a cost of doing business when you update libxenvchan, xen-core, or any of the myriad packages that qubes-dom0-* depend (I believe there's an icon converter in the pipeline, and that could totally be exploited if the libraries used to convert icons somehow gained a security vulnerability via updates...).

It can be simply logged somewhere in the VM. For debugging purposes

That's effectively unusable because such a user interface is that much harder to do compared to vanilla Salt. It's actually less usable than writing a dirty shell script, truly. I know this from experience because I developed bombshell doing exactly that kind of "tail across VMs" crap, and it was not pleasant, and it gave me many headaches. Avoiding that sort of terrible user experience is in fact the initial motivation for my writing of bombshell.

I predict this sort of crippled user interface will doom the product to be entirely unused, as people will naturally default to using something else entirely to automate their needs. Even shell scripts! At least with shell scripts piped through qvm-run, the user gets the chance to 2>&1 and see stdout+stderr.

Then, at this point, you might begin to hear about people getting their dom0s exploited by ANSI escape sequences from a compromised VM, because they understandably forgot -- unlike the Salt developers -- to filter them out of the output printed in dom0. After all, no one single developer who wants something automated quickly has the time to reimplement an ANSI escape filter on every one of their shell scripts.

The general rule is: that you can do something doesn't mean people are going to tolerate it.

But in default Qubes OS installation we want as small attack surface as possible. If management stack would enlarge it significantly (to our discretion), we'd rather not have management stack at all.

It's probably a good idea to avoid having any management stack then. Salt is a complex beast as it is right now, you can guarantee that it will become more complex over time, and its direction within the Qubes ecosystem doesn't seem to be defined to be user-friendly or usable like the actual SaltStack product is -- in fact, right now, it seems to be an option only for the hardest of hardcore geeks who do inter-VM tail -f for sport. That relegates it to be an extremely niche tool intended for people unwilling to write automation shell scripts but also unwilling to diagnose what the effects of their automation are.

Given those minimal benefits, it looks like stripping Salt from the Qubes TCB has the undeniable advantage that the TCB would measurably and objectively shrink.

This is all in the context of us all waiting for capability OSes like Genode to take over and provide better security by compartmentation. It would be wise to choose well what we all invest our efforts in.

marmarek commented 8 years ago

Related salt-users topic on salt-ssh data flow: https://groups.google.com/d/msg/salt-users/gDvuc-5v8Hw/dmniH1eQBAAJ

In short:

salt-ssh sends only states targeted to the given host, not all of them (as in traditional ZeroMQ based setup). This includes also additional files referenced through salt://. But not additional modules - in that case (if needed at all) all modules are sent.
States are rendered on master side and "lowstate" data is sent (in contrast to traditional ZeroMQ based setup). This means master really needs grains for more complex states.
It is highly nontrivial to move state rendering to minion side in salt-ssh architecture.

Given the above confirmation, here is what we discussed internally recently:

For every VM managed by Salt (from dom0):
1. Start target VM.
2. Have dom0 to create DispVM.
3. Send all the Salt configuration there.
4. Grant it qubes.VMShell access to that selected VM only
5. Run salt-ssh (over qrexec) from the DispVM, targeting that single VM. Do not filter return channel there - so for example all the grains will be available to salt-ssh during state rendering.
6. Collect output back to dom0 (success/failure flag, optionally logging full output to some file)
7. Destroy DispVM
8. Shutdown target VM (opt-out? only when wasn't running at the start?).
Repeat above for next VM. Multiple instances can be running at the same time, if there is enough memory.

This gives us the following properties:

(+) Full set of Salt states features does work
(+) Exploiting hypothetical bug in salt-ssh doesn't give control over the whole system
(-) Exploiting hypothetical bug in salt-ssh does leak whole Salt configuration, including the ones dedicated to other VMs
(-) Have some major performance impact (DispVM creation, memory overhead)

Configuration leak (which would require salt-ssh exploit in the first place) can be even further limited using additional steps:

Do not store sensitive data (credentials) in Salt configuration at all. Instead store them in vault VM and grant access to some qrexec service for it ("split keepassx"?)
Do not send all the pillars to the DispVM, only those targeted at particular VM. While for states it would be non-trivial, for pillars it should be doable (as long as grains are not used to target them, which is discouraged anyway).

Some implementation details/ideas:

It would be useful to have pillar external module exporting VM properties. For example to target some state to all the templates. Requires more thinking which properties to export that way (all of them? only user defined tags (#865)? a configurable list of them?). Probably worth a separate ticket.
It is easy to check if some VM have any state targeted at it at all (unless grains are used to target states...): salt-call --local --id=VMNAME state.show_lowstate. Note that resulting lowstate can be inaccurate because of missing grains. But should be enough to detect whether call all that machinery (DispVM, start VM at all etc) or not.
Consider implementing the whole dom0 part as Salt runner module - it would make easier (faster) to access above mentioned data. But it may have also some downsides. Just an idea to consider.
Copying full Salt configuration to DispVM should be as simple as copying /srv.

marmarek commented 8 years ago

One unanswered question: what template should be used for that DispVM? Having that template compromised means compromise of every salt-managed VM (probably all the templates or so). Also choosing "DispVM based on the same template as the target VM" is not good, because of configuration leak ("red-crappy-install-every-curl-pipe-bash" template based VM would learn all the Salt configuration, even that targeting "secret-project-xyz" VM). So we need "the most trusted" template here... Maybe a separate one dedicated for that purpose? In that case it would be based on some minimal template.

rootkovska commented 8 years ago

Yeah, I think some kind of minimal template would be good to have anyway (e.g. for the vault service).

Also, FWIW, attaching my original arch sktech from the meeting :) salt-dom0-disp-sketch

marmarek commented 8 years ago

Implemented in marmarek/qubes-mgmt-salt-connector repository. It depends on minor changes in:

qubes-mgmt-salt-base-topd
qubes-mgmt-salt-base-config
qubes-mgmt-salt All pushed to master branch on my account (not yet merged to qubesos). Also depends on #1808

This implementation is based on core2 API. But migration to core3 API should be easy (DispVM creation)

marmarek commented 8 years ago

Todo:

[x] wrap the code into proper classes, install using setuptools
[x] integrate with qubesctl
[x] parallel execution

@woju is it possible (and if so, is it a good idea?) to install additional ("external") subpackages into the same package as core (qubes)? If not, we need some convention on python package naming. Currently we already have qubes, qubesdb, splitgpg, imgconverter. And we need to move out remaining parts from qubes (https://github.com/QubesOS/qubes-mgmt-salt-base-overrides/tree/master/src/qubes/mgmt, probably more)

woju commented 8 years ago

Combining one python package from different distro packages is generally a bad idea. I wouldn't discount it altogether, however the need is very rare. Note that there are similar solutions like drop-in directories for plugins (vide /srv/salt/_modules), those however aren't proper python packages. In case of qubes I strongly advise against.

I also don't think we need a "naming policy" precisely because the names aren't changing anything (this wouldn't be the case if we put those modules in single package). The gain is purely cosmetic. That said, we could name all our packages as qubes*, i.e. qubessplitgpg, qubesimgconverter, and qubesmanager for that matter.

marmarek commented 8 years ago

Finally integrated into qubes-mgmt-salt repository (where qubesctl tool live). Will remove qubes-mgmt-salt-connector repository. There are two required packages to use this feature:

qubes-mgmt-salt-vm-connector package installed in default template
openssh-clients (for scp binary) installed in every managed VM (it isn't used, but salt-ssh checks for its existence)

Then, qubesctl --all state.highstate can be used to apply configuration everywhere. More details in documentation.

QubesOS / qubes-issues

Salt Managment Inter VM Communication #1541