Closed roerohan closed 4 years ago
The input test cases being leaked is actually not a vulnerability, the user's code is always allowed to read input test cases, cause it has to get the input. Whether its a vulnerability that you can read other input cases than the one you are executing against is debatable (I think its out of scope for this library), but I think then each case should be executed by re running the same container and mounting a different volume for each case. (And the executor can actually run against multiple test cases in parallel using this approach. The current approach with sequentially checking each test case is terrible anyway)
The real issues are:
docker run
with the same params. This should be enough to prevent fork bombs, or any other high CPU using processes to lag the host. (These params will need to be played with, we don't want to limit CPU so much that we end up increasing execution time) (https://docs.docker.com/config/containers/resource_constraints/)The input test cases being leaked is actually not a vulnerability, the user's code is always allowed to read input test cases, cause it has to get the input.
@thebongy What I meant was: if you're running code for a sample test case, you should not be able to read the hidden test cases.
As you mentioned, this can probably be resolved by re-running the same container by mounting a single test case at a time so that you can't leak other test cases.
However, there is still the issue of being able to fork bomb
or spawn a reverse shell
which needs to be fixed, which can possibly be fixed through what you mentioned.
The real issues are:
- Why are output cases written to files and mounted inside the container? The output checking seems to be happening outside the container anyway, having just the input files is enough. The user being able to read outputs of test cases, is a vulnerability.
- We have to block outbound connections from the container, that's where the real leak can happen. And this is straightforward, just need to add another boolean parameter on create container , to change the network mode to internal (https://docs.docker.com/engine/reference/commandline/network_create/#network-internal-mode)
- Lastly, the resource constraints (CPU/MEM) is something maybe that can be passed to code executor as a param, and it has to pass --cpus and --memory while executing
docker run
with the same params. This should be enough to prevent fork bombs, or any other high CPU using processes to lag the host. (These params will need to be played with, we don't want to limit CPU so much that we end up increasing execution time) (https://docs.docker.com/config/containers/resource_constraints/)
Hey @roerohan, I'll take up a part of this for now. I will remove the code where it writes the expected output in the container ( it is not needed anyway, I must have forgotten to remove it ). I will also implement running different test cases in different containers so the user won't be able to leak other input cases.
I think then each case should be executed by re running the same container and mounting a different volume for each case. (And the executor can actually run against multiple test cases in parallel using this approach.
@thebongy I didn't quite get this, If we are to run all the test cases parallelly in the same container, all the test cases needs to be present in the container. Then again, the user will be able to leak the other input test cases. Did you mean running those test cases in different containers parallelly?
@alias-rahil I think that's what he meant. You mount a single case in a container and run the containers parallelly.
Anyway, I'll assign this issue to you, for now.
@thebongy I didn't quite get this, If we are to run all the test cases parallelly in the same container, all the test cases needs to be present in the container. Then again, the user will be able to leak the other input test cases. Did you mean running those test cases in different containers parallelly?
@alias-rahil
Yeah, multiple containers. You can consider using docker run
directly instead of a createContainer followed by a start(), that reduces the overhead of talking with the docker api from 2 calls/ test case (create+start), to just 1 call (run) per test case
Yeah, multiple containers. You can consider using
docker run
directly instead of a createContainer followed by a start(), that reduces the overhead of talking with the docker api from 2 calls/ test case (create+start), to just 1 call (run) per test case
Cool, will do this.
Test Cases may be leaked through RCE
Since the
input
is stored as a file inside the docker image, there is a possibility that someone could leak the test cases. For example, they could do the following in Python:This can be fixed by modifying the permissions inside the Docker containers. An alternative could be to take input through
stdin
. This is a critical bug and needs to be fixed.However, a user might still be able to spawn a reverse shell inside the container or run a fork bomb. A feasible way to prevent this would be to block all outgoing network traffic and using something like nsjail inside the containers. This too is a security risk of critical severity.