aws-deepracer-community / deepracer-for-cloud

Creates an AWS DeepRacing training environment which can be deployed in the cloud, or locally on Ubuntu Linux, Windows or Mac.
MIT No Attribution

error using DR_DOCKER_STYLE:compose for local training #123

Open 1-ashraful-islam opened 1 year ago

1-ashraful-islam commented 1 year ago

If the system.env file is modified to set DR_DOCKER_STYLE=compose, then running source bin/activate.sh results in an error when using local training. It seems that Docker tries to start multiple minio containers, which produces a "port in use" error.

Should it be changed to docker stack deploy or is there any other setting that needs to be changed?

larsll commented 1 year ago

The workaround (and the default) is DR_DOCKER_STYLE=swarm, which uses docker stack deploy.

Changing to compose mode has some implications that the current init.sh does not handle well, so init.sh must be updated.
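For reference, a minimal sketch of switching the style by editing the env file directly. The function name `set_docker_style` is hypothetical and not part of the repository, and the sketch assumes that `swarm` and `compose` (the two values mentioned in this thread) are the only ones to accept. It only rewrites the `DR_DOCKER_STYLE=` line; it does none of the additional setup that init.sh may perform.

```shell
# Hypothetical helper (not part of deepracer-for-cloud).
# Sets DR_DOCKER_STYLE in an env file, replacing an existing line or appending one.
# Usage: set_docker_style <env-file> <swarm|compose>
set_docker_style() {
    env_file=$1
    style=$2

    # Only accept the two styles discussed in this thread (assumption).
    case "$style" in
        swarm|compose) ;;
        *) echo "unsupported style: $style" >&2; return 1 ;;
    esac

    if grep -q '^DR_DOCKER_STYLE=' "$env_file"; then
        # Edit through a temp file; `sed -i` behaves differently on GNU and BSD.
        tmp=$(mktemp) || return 1
        sed "s/^DR_DOCKER_STYLE=.*/DR_DOCKER_STYLE=$style/" "$env_file" > "$tmp" \
            && cat "$tmp" > "$env_file"
        rm -f "$tmp"
    else
        printf 'DR_DOCKER_STYLE=%s\n' "$style" >> "$env_file"
    fi
}
```

Calling `set_docker_style system.env compose` and then re-running `source bin/activate.sh` would pick up the new value, assuming activate.sh reads system.env as described in this issue.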

larsll commented 1 year ago

Created #126 as a way to better manage the configuration using the init.sh.

Running init.sh with -s compose in a clean container now gives me a working system. I would need more information about what goes wrong in your setup.

1-ashraful-islam commented 1 year ago

If I set DR_DOCKER_STYLE=compose and then try to activate the environment with source bin/activate.sh then I get the following error:

Starting s3_minio_1 ... error

ERROR: for s3_minio_1  Cannot start service minio: container c9413dc26b2be8126379dc69ebed5c811f7c1f93a8b8111a1adedb475067bd23: endpoint join on GW Network failed: driver failed programming external connectivity on endpoint gateway_fde258ef812b (050a1ba826f6e6f57c4d2d5b6c66b173db390b426f7b3ee82dfdd2c1ab2d0041): Error starting userland proxy: listen tcp4 0.0.0.0:9001: bind: address already in use

ERROR: for minio  Cannot start service minio: container c9413dc26b2be8126379dc69ebed5c811f7c1f93a8b8111a1adedb475067bd23: endpoint join on GW Network failed: driver failed programming external connectivity on endpoint gateway_fde258ef812b (050a1ba826f6e6f57c4d2d5b6c66b173db390b426f7b3ee82dfdd2c1ab2d0041): Error starting userland proxy: listen tcp4 0.0.0.0:9001: bind: address already in use
ERROR: Encountered errors while bringing up the project.

I am using WSL2.

Update: I pulled the latest version of the repo and changed the style to compose in the env file. Running the source command still produces the same error. The minio container seems to be restarted automatically by Docker since I switched to systemd in WSL. If I quickly stop the minio container and then run the source command, it executes without error.
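A rough sketch of how one might check what is holding the port named in the error. Both function names are hypothetical. The first extracts the port number from an "address already in use" line like the one above; the second lists running containers that publish that port, using the `publish` filter of `docker ps`. A port published by a swarm service may not appear there, in which case the PORTS column of `docker service ls` is the place to look.

```shell
# Hypothetical helpers for diagnosing the "address already in use" error.

# Print the host port found in a Docker bind error line (prints nothing if none).
# Usage: port_from_bind_error "<error text>"
port_from_bind_error() {
    printf '%s\n' "$1" |
        sed -n 's/.*listen tcp[46]* [^ ]*:\([0-9][0-9]*\): bind: address already in use.*/\1/p'
}

# List running containers that publish the given port.
# Usage: containers_on_port <port>
containers_on_port() {
    docker ps --filter "publish=$1" --format '{{.ID}} {{.Names}} {{.Status}}'
}
```

For the error in this issue, `port_from_bind_error` yields 9001, so `containers_on_port 9001` would show any container already bound to it.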

1-ashraful-islam commented 1 year ago

Update: After pulling the latest repo and removing the Docker service named s3_minio, I no longer get errors. The service kept restarting the minio container in the background, which caused the previous issue.

However, if I switch back to swarm, the Docker service is recreated. The issue then reappears when I switch to compose again, unless I manually remove the minio service.
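The manual removal can be sketched as a small shell function to run before switching to compose. The function name `remove_swarm_service` is hypothetical; the service name s3_minio is the one reported above. It relies on `docker service inspect` exiting non-zero when the service does not exist (it also does so when the node is not a swarm manager, in which case nothing is removed).

```shell
# Hypothetical helper: remove a leftover swarm service so that it stops
# restarting its container in the background.
# Usage: remove_swarm_service [service-name]   (default: s3_minio)
remove_swarm_service() {
    service=${1:-s3_minio}

    if docker service inspect "$service" >/dev/null 2>&1; then
        # The service exists: remove it, which also stops its tasks.
        docker service rm "$service"
    else
        echo "no swarm service named $service; nothing to remove"
    fi
}
```

Running `remove_swarm_service` and then `source bin/activate.sh` with DR_DOCKER_STYLE=compose should avoid the port conflict, assuming the leftover service was the only thing bound to the port.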

larsll commented 1 year ago

OK - so the 'feature request' here would be to clean up the swarm service s3_minio.

larsll commented 6 months ago

Feature moved into #149