EMS-TU-Ilmenau / chefkoch

A compute cluster cuisine for distributed scientific computing in python
Apache License 2.0

Investigate on security-aware interfacing to shell commands #52

Open ChristophWWagner opened 4 years ago

ChristophWWagner commented 4 years ago

Arbitrary code execution is dangerous: an attacker might intentionally run malicious code in places where the programmer never intended code to be executed. We should do our best to ensure that chefkoch offers as little opportunity for this as possible.

Some examples, where things like this could be introduced:

In most cases it is sufficient to avoid those functions, or to avoid calling them with arbitrary (even partially user-definable) strings, but sometimes one just can't get around using them.

In this issue, the investigation shall conclude about

Please document your findings in the discussion to this issue

pegro commented 4 years ago

Could you elaborate on what "Import (Interpretation) of YAML files" means?

I mean, "Always sanitize user inputs" is a good rule of thumb for developing any software with a user interface, regardless of what is later done with these inputs. Executing malicious code is not the only threat. There could also be problems with assembled filesystem paths, data storage interactions (like SQL), and so on. But I guess I'll have to read more on this project, since I seem to be missing a lot of context.

wiebsS commented 4 years ago

eval

How to make eval() secure depends on how we want to use it. If we only want to convert a string into a Python literal, ast.literal_eval seems like a good choice, but it is restricted to Python's literal structures (strings, numbers, lists, ...). NumPy seems to have a similar method.
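A minimal sketch of the ast.literal_eval option, assuming we only need to turn user-supplied text into plain Python data. The helper name parse_literal is hypothetical and not part of chefkoch.

```python
import ast


def parse_literal(text):
    """Convert text to a Python literal without executing any code."""
    try:
        # literal_eval accepts only literals (numbers, strings, lists,
        # dicts, tuples, ...); calls and attribute access are rejected.
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError) as exc:
        raise ValueError(f"not a plain Python literal: {text!r}") from exc
```

Something like parse_literal("[1, 2.5, 'a']") yields a list, while an expression containing a function call such as __import__('os').system(...) is refused instead of being run.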

I also found an asteval package, which supports functions from the math-module and "a large number of functions from numpy", if we want to evaluate something mathematical.

Finally, there seems to be a workaround to make eval() itself more secure: passing it a dictionary of allowed functions. That doesn't seem like the best choice, though.

wiebsS commented 4 years ago

yaml

To parse YAML files we should always use yaml.safe_load, because it can't execute arbitrary code from the YAML file. There is also a matching function for dumping data (yaml.safe_dump).

In theory there is also StrictYAML, but that wasn't optimized for speed, so it may be too slow.
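The safe_load recommendation could look roughly like this, assuming PyYAML is installed. The helper name load_config is hypothetical; only yaml.safe_load, yaml.safe_dump and yaml.YAMLError are PyYAML names.

```python
import yaml  # PyYAML


def load_config(text):
    """Parse YAML text into plain Python data (dicts, lists, scalars)."""
    try:
        # safe_load only constructs standard YAML types; tags that would
        # build arbitrary Python objects raise an error instead.
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError("invalid or unsafe YAML input") from exc


def dump_config(data):
    """Serialize plain Python data back to YAML text."""
    return yaml.safe_dump(data)
```

A document that uses a tag such as !!python/object/apply:os.system is rejected by load_config rather than executed.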

wiebsS commented 4 years ago

syscall

Making syscalls secure seems to be a more complex problem. There are things like subprocesses, but they also execute the code, so I don't think they really solve the problem.

The best course of action would probably be to isolate our program, or at least the function calls in our steps, for example in a sandboxed Python. There is a PyPy sandbox, which seems to be an older version, and a module for process isolation, which could also be an option. If we only want to support Linux, we could also use a mechanism of the OS, for example nsjail.

ChristophWWagner commented 4 years ago

@pegro To answer the YAML-related question: since YAML allows defining defaults for sections, it is possible to produce infinite recursion from either unintended or malicious user input. The corresponding point only hints that we should choose a YAML interpreter that can handle such bad behaviour.

Further, the YAML container shall support linking in other YAML files. If you would like to outsource the configuration of one section to another config file, it shall be possible to redirect it easily. For example, the configuration

configuration:
    some_option: bla
    something_else: blub

may be redirected to another YAML file foobar.yml like this:

configuration: foobar.yml

where foobar.yml in turn contains

some_option: bla
something_else: blub
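Because such a redirect turns a user-supplied string into a file path, it touches both the path concern and the recursion concern raised in this thread. A minimal sketch of a guard, assuming all linked files must live below one configuration directory; the function resolve_include and its parameters are hypothetical:

```python
from pathlib import Path


def resolve_include(base_dir, reference, seen=None):
    """Return the absolute path of a linked YAML file, or raise ValueError."""
    base = Path(base_dir).resolve()
    target = (base / reference).resolve()
    # Refuse references that leave the configuration directory,
    # e.g. "../secret.yml" or an absolute path.
    if base not in target.parents:
        raise ValueError(f"include outside config directory: {reference!r}")
    # Refuse a file that is already being expanded; following it again
    # would recurse forever.
    if seen is not None:
        if target in seen:
            raise ValueError(f"circular include: {reference!r}")
        seen.add(target)
    return target
```

The caller would pass the same seen set along while expanding nested redirects, so that foobar.yml pointing back at itself is detected.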

The syscall topic is separate from the YAML issues: it is desirable to avoid calling unsanitized shell commands from within the project at various stages. Depending on whether other packages exist, we might need to do that (interfacing with git or tar might be an example of such situations). Whenever we need to resort to some form of command-based tool interface, we should do our best to sanitize those calls.

It seems to me that the preferred way of doing this would be to encapsulate the actual syscalls and only call this capsule from within the project. Then we can handle all sanitization aspects there instead of having to deal with them at several places throughout the codebase.
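One possible shape for such a capsule, as a sketch only: a single module that checks the tool against an allowlist and starts it without a shell. The names ALLOWED_TOOLS, build_argv and run_tool are hypothetical; the allowlist content is taken from the git and tar examples above.

```python
import subprocess

# Hypothetical allowlist: the only external tools this capsule will start.
ALLOWED_TOOLS = frozenset({"git", "tar"})


def build_argv(tool, *args):
    """Validate a tool call and return it as an argument list."""
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {tool!r}")
    for arg in args:
        if not isinstance(arg, str) or "\x00" in arg:
            raise ValueError(f"invalid argument: {arg!r}")
    return [tool, *args]


def run_tool(tool, *args, timeout=60):
    """Run an allowed tool without a shell and return its stdout."""
    argv = build_argv(tool, *args)
    # A list plus shell=False means no shell ever parses the arguments,
    # so characters such as ';' or '$(...)' are passed through literally.
    result = subprocess.run(
        argv, shell=False, check=True,
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

This does not cover everything: an argument starting with "-" could still be read as an option by the tool itself, so per-tool argument checks would have to be added in the same place.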

Maybe the overall best approach would be to avoid syscalls to the largest extent possible.

ChristophWWagner commented 4 years ago

Everyone, please feel free to suggest other solutions or to shift the discussion to more concrete problems.