aws-deepracer-community / deepracer-for-cloud

Creates an AWS DeepRacing training environment which can be deployed in the cloud, or locally on Ubuntu Linux, Windows or Mac.
MIT No Attribution

Stuck at "DoorMan: installing SIGINT, SIGTERM" #137

Closed ShabirK21 closed 10 months ago

ShabirK21 commented 1 year ago

system.env

DR_CLOUD=local
DR_AWS_APP_REGION=us-east-1
DR_UPLOAD_S3_PROFILE=default
DR_UPLOAD_S3_BUCKET=not-defined
DR_UPLOAD_S3_ROLE=to-be-defined
DR_LOCAL_S3_BUCKET=bucket
DR_LOCAL_S3_PROFILE=minio
DR_GUI_ENABLE=False
DR_KINESIS_STREAM_NAME=
DR_CAMERA_MAIN_ENABLE=True
DR_CAMERA_SUB_ENABLE=False
DR_CAMERA_KVS_ENABLE=True
DR_SAGEMAKER_IMAGE=5.1.0-gpu
DR_ROBOMAKER_IMAGE=5.1.0-cpu-avx2
DR_MINIO_IMAGE=latest
DR_ANALYSIS_IMAGE=cpu
DR_COACH_IMAGE=5.1.0
DR_WORKERS=1
DR_ROBOMAKER_MOUNT_LOGS=False
DR_CLOUD_WATCH_ENABLE=False
DR_DOCKER_STYLE=swarm
DR_HOST_X=False
DR_WEBVIEWER_PORT=8100

When I run "dr-start-training", SageMaker starts, but the output gets stuck at "DoorMan: installing SIGINT, SIGTERM" and training never starts. I have tried restarting Docker and my laptop. I am running this on Arch Linux.

larsll commented 1 year ago

When this happens, it is typically Robomaker that is not starting up (potentially due to a wrong configuration). Run dr-stop-training, then dr-start-training, and then dr-logs-robomaker to have a closer look at it.
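As a rough sketch of that sequence, assuming the deepracer-for-cloud environment has already been activated in the current shell so that the dr-* commands are available. The wrapper name restart_and_inspect is hypothetical and only there to guard against a shell where the commands are missing. If dr-start-training keeps following a log in the foreground, interrupt it or run dr-logs-robomaker from a second terminal instead.

```shell
# Hypothetical wrapper (not part of deepracer-for-cloud): restart training
# and then show the Robomaker logs, but only if all three commands exist.
restart_and_inspect() {
  for cmd in dr-stop-training dr-start-training dr-logs-robomaker; do
    if ! type "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      return 1
    fi
  done
  dr-stop-training    # tear down the stuck training run
  dr-start-training   # start a fresh run
  dr-logs-robomaker   # inspect what Robomaker is doing
}
```

Calling restart_and_inspect in a shell where the environment has not been activated prints which command is missing and returns a non-zero status instead of failing halfway through.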

ShabirK21 commented 1 year ago

09/07/2023 10:52:40 passing arg to libvncserver: -rfbport
09/07/2023 10:52:40 passing arg to libvncserver: 5900
09/07/2023 10:52:40 x11vnc version: 0.9.13 lastmod: 2011-08-10 pid: 62
09/07/2023 10:52:40
09/07/2023 10:52:40 wait_for_client: WAIT:0
09/07/2023 10:52:40
09/07/2023 10:52:40 initialize_screen: fb_depth/fb_bpp/fb_Bpl 24/32/2560
09/07/2023 10:52:40
09/07/2023 10:52:40 Listening for VNC connections on TCP port 5900
09/07/2023 10:52:40 Listening for VNC connections on TCP6 port 5900
09/07/2023 10:52:40 listen6: bind: Address already in use
09/07/2023 10:52:40 Not listening on IPv6 interface.
09/07/2023 10:52:40

The VNC desktop is: 06c24141a045:0
PORT=5900
JWM: warning: /etc/jwm/system.jwmrc[6]: invalid include: /etc/jwm/debian-menu
IP: 10.0.1.9 172.19.0.6 10.0.0.6 (06c24141a045)
10:52:42 INFO:[DeepRacerNodeMonitor]: NodeMonitor started running
10:52:42 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:43 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:44 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:45 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:46 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:47 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:48 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:49 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
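The NodeMonitor error in the log above repeats once per second, which suggests the ROS master never came up. A minimal sketch for checking how often it occurs, assuming the Robomaker output has been saved to a file (the file name robomaker.log and the helper count_master_errors are hypothetical, not part of deepracer-for-cloud):

```shell
# Hypothetical helper: count log lines that report the ROS master as unreachable.
# $1 is the path to a saved Robomaker log file.
count_master_errors() {
  grep -c 'Master node could be dead' "$1"
}

# Example call, assuming the log was saved as robomaker.log:
# count_master_errors robomaker.log
```

A count that keeps growing between two snapshots of the log would indicate that Robomaker is still waiting for the master node rather than recovering.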

larsll commented 10 months ago

I am not able to decipher what is going wrong here. Arch Linux is not supported, so that could be the reason. I suggest you reach out to the community at http://join.deepracing.io/ for support in setting things up and debugging.