By this I mean running multiple isolated scripts in parallel, for example:
```bash
python script1.py &
python script2.py &
# ...
wait
```
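For reference, the same launch-and-wait pattern looks roughly like this in Python. This is a minimal sketch using only the standard `subprocess` module; `run_in_parallel` is a hypothetical helper name and the script paths are placeholders. Each script runs as its own OS process, so the main state they share in this setup is the local Docker daemon, and with it the daemon-wide container namespace.

```python
import subprocess
import sys


def run_in_parallel(scripts: list[str]) -> list[int]:
    """Hypothetical helper: start every script, then wait for all of them.

    Equivalent to `python a.py & python b.py & wait` in bash.
    """
    # Popen returns immediately, so all scripts start before we block.
    procs = [subprocess.Popen([sys.executable, script]) for script in scripts]
    # Wait for each process to exit and collect its return code, in order.
    return [proc.wait() for proc in procs]
```

Usage would be `run_in_parallel(["script1.py", "script2.py"])`, which returns one exit code per script.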
Currently, when I try to do this, I get an error like the one below:
```
───────────────────────────────────────────────────────────────────── Entering Experiment llm-math-judge with id: llm-math-judge_1726789456 ──────────────────────────────────────────────────────────────────────
[16:44:16] Launching task nemo-run for experiment llm-math-judge experiment.py:601
[16:44:21] Error running task nemo-run: 409 Client Error for http+docker://localhost/v1.46/containers/create?name=nemo-run-0: Conflict ("Conflict. The container name "/nemo-run-0" is already experiment.py:622
in use by container "7591568f4b184e6134be9b92f4434c06242ca96d86654346854feb627028686a". You have to remove (or rename) that container to be able to reuse that name.")
Traceback (most recent call last): experiment.py:623
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/api/client.py", line 275, in _raise_for_status
    response.raise_for_status()
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: http+docker://localhost/v1.46/containers/create?name=nemo-run-0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/run/experiment.py", line 616, in run
    job.launch(wait=wait, runner=self._runner)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/run/job.py", line 340, in launch
    handle, status = launch(
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/run/torchx_backend/launcher.py", line 99, in launch
    app_handle = runner.run(
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/run/torchx_backend/runner.py", line 87, in run
    handle = self.schedule(dryrun_info)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/run/torchx_backend/runner.py", line 102, in schedule
    app_id = sched.schedule(dryrun_info)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/run/torchx_backend/schedulers/docker.py", line 109, in schedule
    req.run(client=client)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/core/execution/docker.py", line 328, in run
    container_details.append(container.run(client=client, id=self.id))
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/nemo_run/core/execution/docker.py", line 269, in run
    return client.containers.run(
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/models/containers.py", line 876, in run
    container = self.create(image=image, command=command,
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/models/containers.py", line 935, in create
    resp = self.client.api.create_container(**create_kwargs)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/api/container.py", line 440, in create_container
    return self.create_container_from_config(config, name, platform)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/api/container.py", line 457, in create_container_from_config
    return self._result(res, True)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/api/client.py", line 281, in _result
    self._raise_for_status(response)
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/api/client.py", line 277, in _raise_for_status
    raise create_api_error_from_http_exception(e) from e
  File "/home/igitman/anaconda3/envs/base-env/lib/python3.10/site-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation) from e
docker.errors.APIError: 409 Client Error for http+docker://localhost/v1.46/containers/create?name=nemo-run-0: Conflict ("Conflict. The container name "/nemo-run-0" is already in
use by container "7591568f4b184e6134be9b92f4434c06242ca96d86654346854feb627028686a". You have to remove (or rename) that container to be able to reuse that name.")
```
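The 409 comes from Docker itself: container names must be unique per daemon. Judging from the log, the container name appears to be built only from the task name (`nemo-run`) and an index (`0`), so a second experiment started while the first is still running asks for the same name. One way around this would be to put a per-run token into the name. The helper below is a hypothetical sketch of such a naming scheme, not part of nemo-run's API. It defaults to a random token because a timestamp-based id like `llm-math-judge_1726789456` can repeat when two scripts start within the same second.

```python
import uuid
from typing import Optional


def unique_container_name(task_name: str, index: int, run_token: Optional[str] = None) -> str:
    """Hypothetical helper: build a container name that differs between runs.

    A fixed name such as "nemo-run-0" makes Docker reject the second
    concurrent create call with 409 Conflict.
    """
    # Use the caller's token if given, else a short random hex string.
    token = run_token if run_token is not None else uuid.uuid4().hex[:8]
    # Hyphens and hex digits are valid characters in Docker container names.
    return f"{task_name}-{token}-{index}"
```

For example, `unique_container_name("nemo-run", 0, run_token="abc")` gives `"nemo-run-abc-0"`, while two calls without a token give different names, so two parallel scripts would no longer ask Docker for the same container name.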