KhiopsML / khiops

Khiops is an AutoML suite for supervised and unsupervised learning
https://khiops.org
BSD 3-Clause Clear License
26 stars 2 forks source link

Settle on the Fate of the `--allow-run-as-root` OpenMPI Command Option #323

Closed popescu-v closed 1 month ago

popescu-v commented 3 months ago

Description

Currently, the --allow-run-as-root OpenMPI mpiexec option is hard-coded in the khiops-env scripts for native installations of the Khiops binaries on Linux operating systems (Ubuntu, Rocky).

Or, this is not recommended by the OpenMPI maintainers, mostly for security reasons:

The goal of this issue is to settle on the final approach to the usage of this option.

Questions/Ideas

Context

bruno-at-orange commented 3 months ago

why I choose to set --allow-run-as-root int khiops-env: OpenMPI strongly discourage to run mpiexec as root. It is so discouraged that the user gets this error if khiops is run as root:

mpirun has detected an attempt to run as root.

Running as root is strongly discouraged as any mistake (e.g., in defining TMPDIR) or bug can result in catastrophic damage to the OS file system, leaving your system in an unusable state.

We strongly suggest that you run mpirun as a non-root user.

You can override this protection by adding the --allow-run-as-root option to your command line. However, we reiterate our strong advice against doing so - please do so at your own risk.

I'm not sure that any data scientist will understand this message. So I decided to add the flag --allow-run-as-root. When we launch khiops, the goal is that the scripts khiops and khiops-env run everywhere without any problem. If we decide to remove this flag we have to had some documentation about this troubleshooting.

I agree, it is very bad to run applications as root, but IMHO we don't have to manage that at the khiops level. Besides, If we use mpich or intelMPI or if we launch khiops with 1 proc, there's nothing to stop us from launching khiops as root.

And finally: what is the real risk of running khiops as root? (khiops never destroys files or directories it has not created itself)

sgouache commented 3 months ago

Guys, let me clarify what this mysterious "--allow-run-as-root" is actually doing. If the user launching the mpirun command is root, it will propagate this user id to the launched sub-processes. If the user is a normal user, the children will be launched with his user id. Nothing more, nothing less. Demo - I have a folder owned by root that nobody can read except root:

(base) nmms4680@yd-cnd7506qkx:~/workspace/test$ tree . . ├── rootdir [error opening dir] └── userdir

2 directories, 0 files

(base) nmms4680@yd-cnd7506qkx:~/workspace/test$ sudo tree . ├── rootdir │   └── testfile └── userdir

2 directories, 1 file

Now if i do

(base) nmms4680@yd-cnd7506qkx:~/workspace/test$ mpirun -n 1 -q --allow-run-as-root tree . . ├── rootdir [error opening dir] └── userdir

2 directories, 0 files

If I run the same command as root:

(base) nmms4680@yd-cnd7506qkx:~/workspace/test$ sudo mpirun -n 1 -q --allow-run-as-root tree . . ├── rootdir │   └── testfile └── userdir

2 directories, 1 file

CQFD like we say in french.

So for "normal" users e.g. datascientists there is no risk at all with this option. For strange users who like to play around as root, does it make sense for khiops to try to educate them?

sgouache commented 3 months ago

To be clear: it is almost always a bad idea to run anything decently complicated as root. As the openmpi authors stated in https://github.com/open-mpi/ompi/issues/4451#issuecomment-416238568 we can never guarantee that our software is completely bug-free.

In this case the main problem is that we are silently hiding the fact that we are propagating elevated privileges via mpirun, which is not maintained by us.

To put ourselves on the safe side I'd do the following:

The above would protect the user from unknowingly launching the program as root and hide the implementation details (OpenMPI) which the user doesn't really care about.

folmos-at-orange commented 2 months ago

My 2 cents: I think that running with --allow-as-root is taking an avoidable,and potentially unbounded, risk, since as @sgouache said we cannot guarantee that our software is 100% bug free.

It is clear that for UX is not ideal, as the message is cryptic. But, the users that will get this message are in Linux (and potentially within Docker). So I think we can assume that they are advanced ones.

I would simply run the application without --allow-as-root and document in the site that in case of obtaining the message there are two ways of resolve: either changing the mpiexec parameters or set the related env variables. This should suffice for advanced users. This has the advantage of not having anything to maintain. The option of SG of not Khiops running as root with opt-out is ok for me as well, but I would prefer less maintainance.

Finally, as for the argument "but mpich let us do it", not because before we were at risk, doesn't mean we should continue to be. In that sense SG would also lift the risk in that case.

Whatever the case I vouch for not running with --allow-as-root by default.

marcboulle commented 2 months ago

Bilan des options Khiops dédiées MPI

Ne pourrait pas se contenter uniquement de l'option existante KHIOPS_MPI_COMMAND?

Eh oui, Khiops n'est pas 100% bug free, mais on peut le lancer directement sans MPI (cf. coclustering) , avec MPICH, OpenMPI, sur desktop, via docker.... On ne va pas rappeler à chaque fois à l'utilisateur que lancer en root, c'est risqué?

popescu-v commented 2 months ago

I also agree with @sgouache that, ideally, Khiops itself should refrain, by default, from being run as root, and that the effects of this on the concrete MPI backend (or its lack thereof, for sequential runs) should be, by default, transparent to the end-user.

But, meanwhile, here are my 2 cents:

lucaurelien commented 2 months ago

This discussion has brought to light important security and user experience considerations. UX must remain a priority. However, the fact that Khiops propagates its privileges to OpenMPI, a third-party library, amplifies the risks associated with running processes as root. Indeed, bugs or vulnerabilities in a component external to our could have serious consequences for the whole system, impacting our users.

The proposal to introduce a Khiops-specific flag strikes a balance between security and usability. It empowers users to make informed choices acknowledging the associated risks, especially while OpenMPI naturally inherits the same root privileges as Khiops (what we do to prevent cryptic errors difficult for Khiops users to interpret). Plus, this choice can be made without consulting the doc as we would display a clear message if user executes Khiops in root mode.

Therefore, I vote for the following solution:

  1. Remove the default --allow-run-as-root flag from the khiops-env scripts.
  2. Introduce a new Khiops flag (e.g., KHIOPS_ALLOW_RUN_AS_ROOT) that users can explicitly set to enable root execution.
  3. When the KHIOPS_ALLOW_RUN_AS_ROOT flag is set, include the --allow-run-as-root flag in the mpi command.
  4. If Khiops is run as root without the flag, display a clear warning message explaining the potential risks and how to enable root execution using the new flag. The message should emphasize the privilege escalation to OpenMPI and the importance of informed consent.

The kind of questions that helped me make this decision:

popescu-v commented 2 months ago

OK for me, with the small caveat that, instead of adding the --allow-run-as-root option to the MPI command, thus modifying it, I'd rather set the 2 environment variables OMPI_ALLOW_RUN_AS_ROOT=1 and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 if the user sets KHIOPS_ALLOW_RUN_AS_ROOT=true and OpenMPI is used. If MPICH is used instead of OpenMPI (e.g. in Conda environments), then setting KHIOPS_ALLOW_RUN_AS_ROOT would have no effect.

Indeed, setting the 2 OMPI_* environment variables mentioned above would have the advantage that the MPI command itself would not be altered automatically by this configuration. Thus, the user could still fully override the MPI command (as it is now the case) via KHIOPS_MPI_COMMAND independently of running MPI as root or not.

folmos-at-orange commented 2 months ago

I agree with @popescu-v.

Another problem with KHIOPS_ALLOW_RUN_AS_ROOT is that it doesn't scale: If we encounter another security/UX issue we would have to add KHIOPS_ENABLE_PROBLEMATIC_FLAG.

I argue that it suffices to document the solutions

The solution has zero development cost and little documentation cost.

sgouache commented 2 months ago

I have to propose what I believe will offer a more consistent UX:

The benefits of the above solution:

Drawbacks:

lucaurelien commented 1 month ago

I close the discussion. A new related issue #344 is created to implement the changes.