aws-deepracer-community / deepracer-core

A repository binding together everything needed for DeepRacer local.

Update robomaker is failing #27

Closed spoecker closed 5 years ago

spoecker commented 5 years ago

I updated both Docker images today so that I could use the new world. Now I have the following problem. After one training period, SageMaker is at: saved intermediate frozen graph: rl-deepracer-sagemaker/model/model_0.pb

RoboMaker is stuck at: reward: 123456

for several minutes and then shows this error: Could not connect to the endpoint URL: "https://robomaker.us-east-1.amazonaws.com/cancelSimulationJob"

With the old track, everything was working.

crr0004 commented 5 years ago

Looks like a regression of issue #26 caused by the recent update. I forgot to resync the Python files in rl_coach/src/markov.

Can you pull the most recent commit and try again?
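A quick way to spot this kind of drift is to compare two copies of the same Python package file by file. The sketch below is only an illustration using the standard library: the function names `file_digest` and `find_out_of_sync` and both directory arguments are hypothetical, and nothing here is taken from the repository's own tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_out_of_sync(source_dir: str, target_dir: str) -> dict:
    """Compare the *.py files under two directory trees.

    Returns a dict with three sorted lists of relative paths:
    'missing' (only in source), 'extra' (only in target),
    and 'changed' (present in both, but with different contents).
    """
    source = Path(source_dir)
    target = Path(target_dir)
    src_files = {p.relative_to(source).as_posix() for p in source.rglob("*.py")}
    dst_files = {p.relative_to(target).as_posix() for p in target.rglob("*.py")}
    changed = sorted(
        rel for rel in src_files & dst_files
        if file_digest(source / rel) != file_digest(target / rel)
    )
    return {
        "missing": sorted(src_files - dst_files),
        "extra": sorted(dst_files - src_files),
        "changed": changed,
    }
```

Calling `find_out_of_sync` with the upstream copy as the first argument and the local copy as the second would list the files that still need to be resynced.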

spoecker commented 5 years ago

I tried; sadly, the result is the same. [Screenshot 2019-07-09 at 16 31 03]

crr0004 commented 5 years ago

Can you post the full log from the terminal at the bottom there? It will have the stack trace showing where this error comes from, which will give me a good idea of what is causing the issue.

spoecker commented 5 years ago

(sagemaker_venv) [CORP\spoecker@a-3962e11qoanik rl_coach]$ python rl_deepracer_coach_robomaker.py
Looking for config file: /home/spoecker/.sagemaker/config.yaml
Model checkpoints and other metadata will be stored at: s3://bucket/rl-deepracer-sagemaker
Uploading to s3://bucket/rl-deepracer-sagemaker
WARNING:sagemaker:Parameter image_name is specified, toolkit, toolkit_version, framework are going to be ignored when choosing the image.
s3.ServiceResource()
Using provided s3_client
INFO:sagemaker:Creating training-job with name: rl-deepracer-sagemaker
Starting training job
Using /home/spoecker/Desktop/Deepracer/robo/container for container temp files
Using /home/spoecker/Desktop/Deepracer/robo/container for container temp files
Trying to launch image: crr0004/sagemaker-rl-tensorflow:console
Creating tmprbhlxguc_algo-1-0bx09_1 ... done
Attaching to tmprbhlxguc_algo-1-0bx09_1
algo-1-0bx09_1 | $1 is train
algo-1-0bx09_1 | In train start.sh
algo-1-0bx09_1 | Current host is "algo-1-0bx09"
algo-1-0bx09_1 | Compiling changehostname.c
algo-1-0bx09_1 | Done Compiling changehostname.c
algo-1-0bx09_1 | 23:C 15 Jul 2019 03:06:47.323 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
algo-1-0bx09_1 | 23:C 15 Jul 2019 03:06:47.323 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=23, just started
algo-1-0bx09_1 | 23:C 15 Jul 2019 03:06:47.323 # Configuration loaded
algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
algo-1-0bx09_1 | [Redis ASCII-art startup banner: Redis 5.0.5 (00000000/0) 64 bit, Running in standalone mode, Port: 6379, PID: 23, http://redis.io]
algo-1-0bx09_1 | algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128. algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 # Server initialized algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled. algo-1-0bx09_1 | 23:M 15 Jul 2019 03:06:47.324 * Ready to accept connections algo-1-0bx09_1 | 15/07/2019 03:06:47 passing arg to libvncserver: -rfbport algo-1-0bx09_1 | 15/07/2019 03:06:47 passing arg to libvncserver: 5800 algo-1-0bx09_1 | 15/07/2019 03:06:47 x11vnc version: 0.9.13 lastmod: 2011-08-10 pid: 24 algo-1-0bx09_1 | 15/07/2019 03:06:47 algo-1-0bx09_1 | 15/07/2019 03:06:47 wait_for_client: WAIT:0 algo-1-0bx09_1 | 15/07/2019 03:06:47 algo-1-0bx09_1 | 15/07/2019 03:06:47 initialize_screen: fb_depth/fb_bpp/fb_Bpl 24/32/2560 algo-1-0bx09_1 | 15/07/2019 03:06:47 algo-1-0bx09_1 | 15/07/2019 03:06:47 Listening for VNC connections on TCP port 5800 algo-1-0bx09_1 | 15/07/2019 03:06:47 Listening for VNC connections on TCP6 port 5900 algo-1-0bx09_1 | 15/07/2019 03:06:47 Listening also on IPv6 port 5800 (socket 6) algo-1-0bx09_1 | 15/07/2019 03:06:47 algo-1-0bx09_1 | algo-1-0bx09_1 | The VNC desktop is: e196104c793c:5800 algo-1-0bx09_1 | 15/07/2019 03:06:47 possible alias: e196104c793c::5800 
algo-1-0bx09_1 | PORT=5800 algo-1-0bx09_1 | 2019-07-15 03:06:48,984 sagemaker-containers INFO Imported framework sagemaker_tensorflow_container.training algo-1-0bx09_1 | 2019-07-15 03:06:48,990 sagemaker-containers INFO No GPUs detected (normal if no gpus installed) algo-1-0bx09_1 | 2019-07-15 03:06:49,047 sagemaker-containers INFO No GPUs detected (normal if no gpus installed) algo-1-0bx09_1 | 2019-07-15 03:06:49,065 sagemaker-containers INFO No GPUs detected (normal if no gpus installed) algo-1-0bx09_1 | 2019-07-15 03:06:49,079 sagemaker-containers INFO Invoking user script algo-1-0bx09_1 | algo-1-0bx09_1 | Training Env: algo-1-0bx09_1 | algo-1-0bx09_1 | { algo-1-0bx09_1 | "additional_framework_parameters": { algo-1-0bx09_1 | "sagemaker_estimator": "RLEstimator" algo-1-0bx09_1 | }, algo-1-0bx09_1 | "channel_input_dirs": {}, algo-1-0bx09_1 | "current_host": "algo-1-0bx09", algo-1-0bx09_1 | "framework_module": "sagemaker_tensorflow_container.training:main", algo-1-0bx09_1 | "hosts": [ algo-1-0bx09_1 | "algo-1-0bx09" algo-1-0bx09_1 | ], algo-1-0bx09_1 | "hyperparameters": { algo-1-0bx09_1 | "s3_bucket": "bucket", algo-1-0bx09_1 | "s3_prefix": "rl-deepracer-sagemaker", algo-1-0bx09_1 | "aws_region": "us-east-1", algo-1-0bx09_1 | "model_metadata_s3_key": "s3://bucket/custom_files/model_metadata.json", algo-1-0bx09_1 | "RLCOACH_PRESET": "deepracer", algo-1-0bx09_1 | "loss_type": "mean squared error" algo-1-0bx09_1 | }, algo-1-0bx09_1 | "input_config_dir": "/opt/ml/input/config", algo-1-0bx09_1 | "input_data_config": {}, algo-1-0bx09_1 | "input_dir": "/opt/ml/input", algo-1-0bx09_1 | "is_master": true, algo-1-0bx09_1 | "job_name": "rl-deepracer-sagemaker", algo-1-0bx09_1 | "log_level": 20, algo-1-0bx09_1 | "master_hostname": "algo-1-0bx09", algo-1-0bx09_1 | "model_dir": "/opt/ml/model", algo-1-0bx09_1 | "module_dir": "s3://bucket/rl-deepracer-sagemaker/source/sourcedir.tar.gz", algo-1-0bx09_1 | "module_name": "training_worker", algo-1-0bx09_1 | "network_interface_name": 
"eth0", algo-1-0bx09_1 | "num_cpus": 8, algo-1-0bx09_1 | "num_gpus": 0, algo-1-0bx09_1 | "output_data_dir": "/opt/ml/output/data", algo-1-0bx09_1 | "output_dir": "/opt/ml/output", algo-1-0bx09_1 | "output_intermediate_dir": "/opt/ml/output/intermediate", algo-1-0bx09_1 | "resource_config": { algo-1-0bx09_1 | "current_host": "algo-1-0bx09", algo-1-0bx09_1 | "hosts": [ algo-1-0bx09_1 | "algo-1-0bx09" algo-1-0bx09_1 | ] algo-1-0bx09_1 | }, algo-1-0bx09_1 | "user_entry_point": "training_worker.py" algo-1-0bx09_1 | } algo-1-0bx09_1 | algo-1-0bx09_1 | Environment variables: algo-1-0bx09_1 | algo-1-0bx09_1 | SM_HOSTS=["algo-1-0bx09"] algo-1-0bx09_1 | SM_NETWORK_INTERFACE_NAME=eth0 algo-1-0bx09_1 | SM_HPS={"RLCOACH_PRESET":"deepracer","aws_region":"us-east-1","loss_type":"mean squared error","model_metadata_s3_key":"s3://bucket/custom_files/model_metadata.json","s3_bucket":"bucket","s3_prefix":"rl-deepracer-sagemaker"} algo-1-0bx09_1 | SM_USER_ENTRY_POINT=training_worker.py algo-1-0bx09_1 | SM_FRAMEWORK_PARAMS={"sagemaker_estimator":"RLEstimator"} algo-1-0bx09_1 | SM_RESOURCE_CONFIG={"current_host":"algo-1-0bx09","hosts":["algo-1-0bx09"]} algo-1-0bx09_1 | SM_INPUT_DATA_CONFIG={} algo-1-0bx09_1 | SM_OUTPUT_DATA_DIR=/opt/ml/output/data algo-1-0bx09_1 | SM_CHANNELS=[] algo-1-0bx09_1 | SM_CURRENT_HOST=algo-1-0bx09 algo-1-0bx09_1 | SM_MODULE_NAME=training_worker algo-1-0bx09_1 | SM_LOG_LEVEL=20 algo-1-0bx09_1 | SM_FRAMEWORK_MODULE=sagemaker_tensorflow_container.training:main algo-1-0bx09_1 | SM_INPUT_DIR=/opt/ml/input algo-1-0bx09_1 | SM_INPUT_CONFIG_DIR=/opt/ml/input/config algo-1-0bx09_1 | SM_OUTPUT_DIR=/opt/ml/output algo-1-0bx09_1 | SM_NUM_CPUS=8 algo-1-0bx09_1 | SM_NUM_GPUS=0 algo-1-0bx09_1 | SM_MODEL_DIR=/opt/ml/model algo-1-0bx09_1 | SM_MODULE_DIR=s3://bucket/rl-deepracer-sagemaker/source/sourcedir.tar.gz algo-1-0bx09_1 | 
SM_TRAINING_ENV={"additional_framework_parameters":{"sagemaker_estimator":"RLEstimator"},"channel_input_dirs":{},"current_host":"algo-1-0bx09","framework_module":"sagemaker_tensorflow_container.training:main","hosts":["algo-1-0bx09"],"hyperparameters":{"RLCOACH_PRESET":"deepracer","aws_region":"us-east-1","loss_type":"mean squared error","model_metadata_s3_key":"s3://bucket/custom_files/model_metadata.json","s3_bucket":"bucket","s3_prefix":"rl-deepracer-sagemaker"},"input_config_dir":"/opt/ml/input/config","input_data_config":{},"input_dir":"/opt/ml/input","is_master":true,"job_name":"rl-deepracer-sagemaker","log_level":20,"master_hostname":"algo-1-0bx09","model_dir":"/opt/ml/model","module_dir":"s3://bucket/rl-deepracer-sagemaker/source/sourcedir.tar.gz","module_name":"training_worker","network_interface_name":"eth0","num_cpus":8,"num_gpus":0,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1-0bx09","hosts":["algo-1-0bx09"]},"user_entry_point":"training_worker.py"} algo-1-0bx09_1 | SM_USER_ARGS=["--RLCOACH_PRESET","deepracer","--aws_region","us-east-1","--loss_type","mean squared error","--model_metadata_s3_key","s3://bucket/custom_files/model_metadata.json","--s3_bucket","bucket","--s3_prefix","rl-deepracer-sagemaker"] algo-1-0bx09_1 | SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate algo-1-0bx09_1 | SM_HP_S3_BUCKET=bucket algo-1-0bx09_1 | SM_HP_S3_PREFIX=rl-deepracer-sagemaker algo-1-0bx09_1 | SM_HP_AWS_REGION=us-east-1 algo-1-0bx09_1 | SM_HP_MODEL_METADATA_S3_KEY=s3://bucket/custom_files/model_metadata.json algo-1-0bx09_1 | SM_HP_RLCOACH_PRESET=deepracer algo-1-0bx09_1 | SM_HP_LOSS_TYPE=mean squared error algo-1-0bx09_1 | algo-1-0bx09_1 | Invoking script with the following command: algo-1-0bx09_1 | algo-1-0bx09_1 | /usr/bin/python3.6 training_worker.py --RLCOACH_PRESET deepracer --aws_region us-east-1 --loss_type mean squared error 
--model_metadata_s3_key s3://bucket/custom_files/model_metadata.json --s3_bucket bucket --s3_prefix rl-deepracer-sagemaker algo-1-0bx09_1 | algo-1-0bx09_1 | algo-1-0bx09_1 | Initializing SageS3Client... algo-1-0bx09_1 | Successfully downloaded model metadata from custom_files/model_metadata.json. algo-1-0bx09_1 | Using the following hyper-parameters algo-1-0bx09_1 | { algo-1-0bx09_1 | "batch_size": 64, algo-1-0bx09_1 | "beta_entropy": 0.01, algo-1-0bx09_1 | "discount_factor": 0.999, algo-1-0bx09_1 | "e_greedy_value": 0.05, algo-1-0bx09_1 | "epsilon_steps": 10000, algo-1-0bx09_1 | "exploration_type": "categorical", algo-1-0bx09_1 | "loss_type": "mean squared error", algo-1-0bx09_1 | "lr": 0.0003, algo-1-0bx09_1 | "num_episodes_between_training": 20, algo-1-0bx09_1 | "num_epochs": 10, algo-1-0bx09_1 | "stack_size": 1, algo-1-0bx09_1 | "term_cond_avg_score": 100000.0, algo-1-0bx09_1 | "term_cond_max_episodes": 100000 algo-1-0bx09_1 | } algo-1-0bx09_1 | Uploaded hyperparameters.json to S3 algo-1-0bx09_1 | Uploaded IP address information to S3: 172.18.0.3 algo-1-0bx09_1 | ## Creating graph - name: BasicRLGraphManager algo-1-0bx09_1 | Loaded action space from file: [{'steering_angle': -25, 'speed': 3.0, 'index': 0}, {'steering_angle': -25, 'speed': 6, 'index': 1}, {'steering_angle': -12.5, 'speed': 3, 'index': 2}, {'steering_angle': -12.5, 'speed': 6, 'index': 3}, {'steering_angle': 0, 'speed': 3, 'index': 4}, {'steering_angle': 0, 'speed': 6, 'index': 5}, {'steering_angle': 12.5, 'speed': 3, 'index': 6}, {'steering_angle': 12.5, 'speed': 6, 'index': 7}, {'steering_angle': 25, 'speed': 3, 'index': 8}, {'steering_angle': 25, 'speed': 6, 'index': 9}] algo-1-0bx09_1 | ## Creating agent - name: agent algo-1-0bx09_1 | Checkpoint> Saving in path=['./checkpoint/0_Step-0.ckpt'] algo-1-0bx09_1 | Uploaded 3 files for checkpoint 0 algo-1-0bx09_1 | INFO:tensorflow:Froze 11 variables. algo-1-0bx09_1 | INFO:tensorflow:Converted 11 variables to const ops. 
algo-1-0bx09_1 | saved intermediate frozen graph: rl-deepracer-sagemaker/model/model_0.pb
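The log above shows the hyperparameters being handed to the training container as a JSON object in the `SM_HPS` environment variable. As a minimal sketch of how a script could read them back (the helper name `load_hyperparameters` is hypothetical, and the example value is copied from the log rather than read from a live container):

```python
import json
import os


def load_hyperparameters(environ=None) -> dict:
    """Decode the JSON object stored in the SM_HPS environment variable.

    Returns an empty dict when the variable is absent.
    """
    environ = os.environ if environ is None else environ
    return json.loads(environ.get("SM_HPS", "{}"))


# Value copied from the log above; a real container would use os.environ.
example_env = {
    "SM_HPS": '{"RLCOACH_PRESET":"deepracer","aws_region":"us-east-1",'
              '"loss_type":"mean squared error",'
              '"model_metadata_s3_key":"s3://bucket/custom_files/model_metadata.json",'
              '"s3_bucket":"bucket","s3_prefix":"rl-deepracer-sagemaker"}'
}
hps = load_hyperparameters(example_env)
print(hps["aws_region"], hps["s3_prefix"])
```

Printing the decoded values is a cheap way to confirm which region and S3 prefix the job was actually started with.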

spoecker commented 5 years ago

[CORP\spoecker@a-3962e11qoanik deepracer]$ docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console rm: cannot remove 'build': No such file or directory rm: cannot remove 'install': No such file or directory Starting >>> deepracer_simulation [0.293s] WARNING:colcon.colcon_ros.prefix_path.catkin:The path '/opt/ros/kinetic' in the environment variable CMAKE_PREFIX_PATH seems to be a catkin workspace but it doesn't contain any 'local_setup.*' files. Maybe the catkin version is not up-to-date? Starting >>> sagemaker_rl_agent Finished <<< sagemaker_rl_agent [0.79s]
Finished <<< deepracer_simulation [4.40s]

Summary: 2 packages finished [4.55s] 15/07/2019 03:06:51 passing arg to libvncserver: -rfbport 15/07/2019 03:06:51 passing arg to libvncserver: 5900 15/07/2019 03:06:51 x11vnc version: 0.9.13 lastmod: 2011-08-10 pid: 793 15/07/2019 03:06:51 15/07/2019 03:06:51 wait_for_client: WAIT:0 15/07/2019 03:06:51 15/07/2019 03:06:51 initialize_screen: fb_depth/fb_bpp/fb_Bpl 24/32/2560 15/07/2019 03:06:51 15/07/2019 03:06:51 Listening for VNC connections on TCP port 5900 15/07/2019 03:06:51 Listening for VNC connections on TCP6 port 5900 15/07/2019 03:06:51 listen6: bind: Address already in use 15/07/2019 03:06:51 Not listening on IPv6 interface. 15/07/2019 03:06:51

The VNC desktop is: 58d5a9fd7994:0 PORT=5900 ... logging to /root/.ros/log/97b300e8-a6ad-11e9-a42a-0242ac120002/roslaunch-58d5a9fd7994-794.log Checking log directory for disk usage. This may take awhile. Press Ctrl-C to interrupt Done checking log file disk usage. Usage is <1GB.

[ INFO] [1563160011.773430852]: rviz version 1.12.17 [ INFO] [1563160011.773487118]: compiled against Qt version 5.5.1 [ INFO] [1563160011.773507267]: compiled against OGRE version 1.9.0 (Ghadamon) started roslaunch server http://58d5a9fd7994:40725/

SUMMARY

PARAMETERS

NODES /racecar/ controller_manager (controller_manager/spawner) / agent (deepracer_simulation/run_rollout_rl_agent.sh) better_odom (topic_tools/relay) car_reset_node (deepracer_simulation/car_node.py) gazebo (gazebo_ros/gzserver) gazebo_gui (gazebo_ros/gzclient) racecar_spawn (gazebo_ros/spawn_model) robot_state_publisher (robot_state_publisher/robot_state_publisher)

auto-starting new master process[master]: started with pid [843] ROS_MASTER_URI=http://localhost:11311

setting /run_id to 97b300e8-a6ad-11e9-a42a-0242ac120002 process[rosout-1]: started with pid [856] started core service [/rosout] IP: 172.18.0.2 (58d5a9fd7994) process[gazebo-2]: started with pid [872] process[gazebo_gui-3]: started with pid [879] process[racecar_spawn-4]: started with pid [884] process[racecar/controller_manager-5]: started with pid [892] process[robot_state_publisher-6]: started with pid [894] process[car_reset_node-7]: started with pid [895] process[better_odom-8]: started with pid [897] process[agent-9]: started with pid [905]

SIM_TRACE_LOG:0,1,4.4375,0.5318,-0.0566,-0.44,3.00,0,86887.7977,False,True,0.6404,1,21.88,1563160016.5610707

Creating agent - name: agent

2019-07-15 03:07:04.701005: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Loading checkpoint: ./checkpoint/0_Step-0.ckpt

SIM_TRACE_LOG:0,0,4.4374,0.5318,-0.0566,0.00,0.00,0,0.0100,False,True,0.6400,1,21.88,1563160028.8222458

SIM_TRACE_LOG:0,1,4.4376,0.5318,-0.0563,-0.44,6.00,1,93501.1930,False,True,0.6408,1,21.88,1563160028.9398441

SIM_TRACE_LOG:0,2,4.4470,0.5307,-0.0607,0.00,3.00,4,89572.6359,False,True,0.6841,1,21.88,1563160028.9919877

SIM_TRACE_LOG:0,3,4.4645,0.5286,-0.0676,-0.44,6.00,1,99915.4623,False,True,0.7646,1,21.88,1563160029.0647988

SIM_TRACE_LOG:0,4,4.4996,0.5234,-0.0854,0.00,3.00,4,103377.3954,False,True,0.9266,2,21.88,1563160029.1309657

SIM_TRACE_LOG:0,5,4.5545,0.5148,-0.1080,-0.22,6.00,3,103597.1877,False,True,1.1803,2,21.88,1563160029.2354603

SIM_TRACE_LOG:0,6,4.6381,0.4991,-0.1433,-0.44,3.00,0,75404.3269,False,True,1.5672,3,21.88,1563160029.322305

SIM_TRACE_LOG:0,7,4.6935,0.4845,-0.1809,0.00,3.00,4,52164.1297,False,True,1.8253,3,21.88,1563160029.3778887

SIM_TRACE_LOG:0,8,4.7517,0.4684,-0.2118,-0.22,6.00,3,41922.7395,False,True,2.0969,4,21.88,1563160029.4541104

SIM_TRACE_LOG:0,9,4.8074,0.4504,-0.2464,0.44,6.00,9,18319.2985,False,True,2.3573,4,21.88,1563160029.502825

SIM_TRACE_LOG:0,10,4.8889,0.4215,-0.2913,0.22,3.00,6,2306.2885,False,True,2.7398,5,21.88,1563160029.5639482

SIM_TRACE_LOG:0,11,4.9708,0.3891,-0.3330,-0.22,3.00,2,2104.0866,False,True,3.1235,5,21.88,1563160029.6450453

SIM_TRACE_LOG:0,12,5.0249,0.3644,-0.3697,0.22,3.00,6,1943.0658,False,True,3.3791,6,21.88,1563160029.7375066

SIM_TRACE_LOG:0,13,5.0864,0.3358,-0.3956,0.00,3.00,4,1753.6179,False,True,3.6667,6,21.88,1563160029.807551

SIM_TRACE_LOG:0,14,5.1514,0.3027,-0.4246,-0.22,3.00,2,1531.0547,False,True,3.9747,7,21.88,1563160029.8644927

SIM_TRACE_LOG:0,15,5.1915,0.2809,-0.4451,0.00,3.00,4,1380.1086,False,True,4.1595,7,21.88,1563160029.9266427

SIM_TRACE_LOG:0,16,5.2388,0.2551,-0.4627,0.00,6.00,5,7226.9994,False,True,4.3838,7,21.88,1563160029.998425

SIM_TRACE_LOG:0,17,5.3053,0.2188,-0.4827,0.22,3.00,6,948.5265,False,True,4.6988,8,21.88,1563160030.076893

SIM_TRACE_LOG:0,18,5.3876,0.1742,-0.4898,-0.22,6.00,3,0.0100,False,False,5.0793,9,21.88,1563160030.1630979

SIM_TRACE_LOG:0,19,5.4402,0.1430,-0.5067,-0.22,6.00,3,0.0100,False,False,5.3205,9,21.88,1563160030.21294

SIM_TRACE_LOG:0,20,5.5262,0.0874,-0.5485,0.00,3.00,4,0.0100,False,False,5.7275,10,21.88,1563160030.2679574

SIM_TRACE_LOG:0,21,5.5571,0.0883,-0.7908,0.00,3.00,4,0.0100,False,False,5.8681,10,21.88,1563160030.3390045

SIM_TRACE_LOG:0,22,5.5968,0.0797,-1.1486,0.00,3.00,4,0.0100,False,False,6.0411,10,21.88,1563160030.408615

SIM_TRACE_LOG:0,23,5.6209,0.0635,-1.4066,-0.22,6.00,3,0.0100,False,False,6.1550,10,21.88,1563160030.4816608

SIM_TRACE_LOG:0,24,5.6379,0.0475,-1.6411,-0.22,3.00,2,0.0100,False,False,6.2361,11,21.88,1563160030.5628703

SIM_TRACE_LOG:0,25,5.6392,0.0490,-1.6598,0.44,3.00,8,0.0100,False,False,6.2420,11,21.88,1563160030.64669

SIM_TRACE_LOG:0,26,5.6391,0.0488,-1.6572,-0.22,3.00,2,0.0100,False,False,6.2420,11,21.88,1563160030.6922758

SIM_TRACE_LOG:0,27,5.6387,0.0482,-1.6492,0.44,6.00,9,0.0100,False,False,6.2420,11,21.88,1563160030.7776704

SIM_TRACE_LOG:0,28,5.6386,0.0479,-1.6456,0.44,6.00,9,0.0100,False,False,6.2420,11,21.88,1563160030.8289824

SIM_TRACE_LOG:0,29,5.6382,0.0471,-1.6369,0.22,3.00,6,0.0100,False,False,6.2420,11,21.88,1563160030.9031272

SIM_TRACE_LOG:0,30,5.6382,0.0468,-1.6330,0.22,6.00,7,0.0100,False,False,6.2420,11,21.88,1563160030.9813893

SIM_TRACE_LOG:0,31,5.6379,0.0464,-1.6280,0.22,6.00,7,0.0100,False,False,6.2420,11,21.88,1563160031.0461774

SIM_TRACE_LOG:0,32,5.6376,0.0460,-1.6230,-0.22,6.00,3,0.0100,False,False,6.2420,11,21.88,1563160031.1192632

SIM_TRACE_LOG:0,33,5.6373,0.0459,-1.6195,-0.44,3.00,0,0.0100,False,False,6.2420,11,21.88,1563160031.1935537

SIM_TRACE_LOG:0,34,5.6374,0.0458,-1.6205,0.44,3.00,8,0.0100,False,False,6.2420,11,21.88,1563160031.254932

SIM_TRACE_LOG:0,35,5.6374,0.0458,-1.6207,-0.22,3.00,2,0.0100,False,False,6.2420,11,21.88,1563160031.3329365

SIM_TRACE_LOG:0,36,5.6374,0.0458,-1.6206,0.22,6.00,7,0.0100,False,False,6.2420,11,21.88,1563160031.4067478

SIM_TRACE_LOG:0,37,5.6374,0.0457,-1.6205,-0.44,3.00,0,0.0100,False,False,6.2420,11,21.88,1563160031.4996529

SIM_TRACE_LOG:0,38,5.6373,0.0458,-1.6209,0.00,6.00,5,0.0100,False,False,6.2420,11,21.88,1563160031.544607

SIM_TRACE_LOG:0,39,5.6373,0.0458,-1.6206,0.44,6.00,9,0.0100,False,False,6.2420,11,21.88,1563160031.6019826

SIM_TRACE_LOG:0,40,5.6373,0.0458,-1.6206,-0.22,6.00,3,0.0100,False,False,6.2420,11,21.88,1563160031.6752388

SIM_TRACE_LOG:0,41,5.6373,0.0458,-1.6203,0.44,6.00,9,0.0100,False,False,6.2420,11,21.88,1563160031.75576

SIM_TRACE_LOG:0,42,5.6374,0.0458,-1.6206,0.44,6.00,9,0.0100,False,False,6.2420,11,21.88,1563160031.835736

SIM_TRACE_LOG:0,43,5.6374,0.0458,-1.6206,-0.22,3.00,2,0.0100,False,False,6.2420,11,21.88,1563160031.903566

SIM_TRACE_LOG:0,44,5.6374,0.0457,-1.6206,-0.22,3.00,2,0.0100,False,False,6.2420,11,21.88,1563160032.0048857

SIM_TRACE_LOG:0,45,5.6375,0.0459,-1.6194,0.00,3.00,4,0.0100,False,False,6.2420,11,21.88,1563160032.0623138

SIM_TRACE_LOG:0,46,5.6375,0.0458,-1.6208,0.00,6.00,5,0.0100,False,False,6.2420,11,21.88,1563160032.1338956

SIM_TRACE_LOG:0,47,5.6375,0.0457,-1.6206,0.00,6.00,5,0.0100,False,False,6.2420,11,21.88,1563160032.186223

SIM_TRACE_LOG:0,48,5.6374,0.0457,-1.6206,0.00,3.00,4,0.0100,False,False,6.2420,11,21.88,1563160032.2735958

SIM_TRACE_LOG:0,49,5.6374,0.0458,-1.6205,-0.44,3.00,0,0.0100,False,False,6.2420,11,21.88,1563160032.4049213

SIM_TRACE_LOG:0,50,5.6373,0.0457,-1.6209,0.22,6.00,7,0.0100,False,False,6.2420,11,21.88,1563160032.4572916

SIM_TRACE_LOG:0,51,5.6374,0.0459,-1.6209,0.44,3.00,8,0.0100,False,False,6.2420,11,21.88,1563160032.5045547

SIM_TRACE_LOG:0,52,5.6374,0.0458,-1.6204,0.00,3.00,4,0.0100,False,False,6.2420,11,21.88,1563160032.5468073

SIM_TRACE_LOG:0,53,5.6373,0.0458,-1.6207,-0.22,3.00,2,0.0100,False,False,6.2420,11,21.88,1563160032.6250565

SIM_TRACE_LOG:0,54,5.6373,0.0457,-1.6205,0.22,6.00,7,0.0100,False,False,6.2420,11,21.88,1563160032.7050037

SIM_TRACE_LOG:0,55,5.6373,0.0458,-1.6208,-0.22,6.00,3,0.0100,False,False,6.2420,11,21.88,1563160032.758985

SIM_TRACE_LOG:0,56,5.6372,0.0458,-1.6206,-0.44,6.00,1,0.0100,False,False,6.2420,11,21.88,1563160032.8285449

SIM_TRACE_LOG:0,57,5.6373,0.0458,-1.6203,0.22,3.00,6,0.0100,False,False,6.2420,11,21.88,1563160032.889745

SIM_TRACE_LOG:0,58,5.6373,0.0458,-1.6207,-0.22,6.00,3,0.0100,False,False,6.2420,11,21.88,1563160032.972706

SIM_TRACE_LOG:0,59,5.6373,0.0457,-1.6205,0.22,6.00,7,0.0100,False,False,6.2420,11,21.88,1563160033.095248

SIM_TRACE_LOG:0,60,5.6373,0.0458,-1.6206,-0.44,3.00,0,0.0000,True,False,6.2420,11,21.88,1563160033.151458

reward: 696968.5468506046
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/botocore/httpsession.py", line 262, in send
    chunked=self._chunked(request.headers),
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/retry.py", line 344, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 344, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 843, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connection.py", line 316, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f54626aff60>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 303, in <module>
    main()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 298, in main
    memory_backend_params = memory_backend_params
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 169, in rollout_worker
    graph_manager.act(EnvironmentEpisodes(num_steps=act_steps))
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/graph_managers/graph_manager.py", line 443, in act
    result = self.top_level_manager.step(None)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/level_manager.py", line 230, in step
    env_response = self.environment.step(action_info.action)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/environments/environment.py", line 299, in step
    self._take_action(action)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/environments/gym_environment.py", line 448, in _take_action
    self.state, self.reward, self.done, self.info = self.env.step(action)
  File "/usr/local/lib/python3.5/dist-packages/gym/wrappers/time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/environments/deepracer_racetrack_env.py", line 566, in step
    return super().step([self.steering_angle, self.speed])
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/environments/deepracer_racetrack_env.py", line 271, in step
    self.infer_reward_state(self.steering_angle, self.speed)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/environments/deepracer_racetrack_env.py", line 437, in infer_reward_state
    self.finish_episode(current_progress)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/environments/deepracer_racetrack_env.py", line 473, in finish_episode
    self.cancel_simulation_job()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/environments/deepracer_racetrack_env.py", line 518, in cancel_simulation_job
    job=self.simulation_job_arn
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 648, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 667, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 231, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/usr/local/lib/python3.5/dist-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, kwargs)
  File "/usr/local/lib/python3.5/dist-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.5/dist-packages/botocore/hooks.py", line 211, in _emit
    response = handler(kwargs)
  File "/usr/local/lib/python3.5/dist-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/usr/local/lib/python3.5/dist-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/usr/local/lib/python3.5/dist-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/usr/local/lib/python3.5/dist-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/usr/local/lib/python3.5/dist-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/usr/local/lib/python3.5/dist-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 244, in _send
    return self.http_session.send(request)
  File "/usr/local/lib/python3.5/dist-packages/botocore/httpsession.py", line 282, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://robomaker.us-east-1.amazonaws.com/cancelSimulationJob"
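The SIM_TRACE_LOG lines in the log above are comma-separated records, so they can be loaded into a small structure for inspection. In that trace, the twelfth field stays at 6.2420 from step 25 onward, and the last line (step 60) is the first to report True in the tenth field. The sketch below parses one such line; the field names in `TraceStep` are assumptions inferred from the values and are not confirmed anywhere in this thread.

```python
from typing import NamedTuple

PREFIX = "SIM_TRACE_LOG:"


class TraceStep(NamedTuple):
    # Field names are assumptions inferred from the logged values.
    episode: int
    step: int
    x: float
    y: float
    yaw: float
    steering: float
    speed: float
    action: int
    reward: float
    done: bool
    on_track: bool
    progress: float
    closest_waypoint: int
    track_length: float
    timestamp: float


def parse_trace_line(line: str) -> TraceStep:
    """Parse one 'SIM_TRACE_LOG:...' line into a TraceStep."""
    if not line.startswith(PREFIX):
        raise ValueError("not a SIM_TRACE_LOG line")
    f = line[len(PREFIX):].strip().split(",")
    if len(f) != 15:
        raise ValueError("expected 15 fields, got %d" % len(f))
    return TraceStep(
        int(f[0]), int(f[1]), float(f[2]), float(f[3]), float(f[4]),
        float(f[5]), float(f[6]), int(f[7]), float(f[8]),
        f[9] == "True", f[10] == "True",
        float(f[11]), int(f[12]), float(f[13]), float(f[14]),
    )


# Last trace line from the log above.
last = parse_trace_line(
    "SIM_TRACE_LOG:0,60,5.6373,0.0458,-1.6206,-0.44,3.00,0,0.0000,True,False,"
    "6.2420,11,21.88,1563160033.151458"
)
print(last.step, last.done, last.reward)
```

Parsing every trace line this way makes it easy to see at which step the tenth field flips to True, which in this log is immediately before the traceback.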

crr0004 commented 5 years ago

So it looks like it is still trying to call AWS to cancel the sim job, which results in a connection error (name resolution fails inside the container).
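One way to avoid this class of failure in a local setup is to skip the remote cancel call when there is no real simulation job to cancel. The sketch below is an assumption for illustration only and is not the fix that went into the repository: the function signature, the `client` argument, and the `FakeClient` class are hypothetical, and only the `cancel_simulation_job(job=...)` call shape is taken from the traceback.

```python
from typing import Optional


def cancel_simulation_job(client, job_arn: Optional[str]) -> bool:
    """Cancel the simulation job only if there is one to cancel.

    Returns True when a cancel request was sent, False when it was skipped.
    """
    if not job_arn:
        # Local run: no AWS RoboMaker job exists, so there is nothing to cancel.
        return False
    client.cancel_simulation_job(job=job_arn)
    return True


class FakeClient:
    """Stand-in for an SDK client; records calls instead of using the network."""

    def __init__(self):
        self.cancelled = []

    def cancel_simulation_job(self, job):
        self.cancelled.append(job)


client = FakeClient()
print(cancel_simulation_job(client, None))           # skipped, no network call
print(cancel_simulation_job(client, "arn:example"))  # forwarded to the client
```

Keeping the decision in one small function means the end-of-episode code path never touches the network when the job identifier is unset.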

Can you run

docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console cat /app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py

And post the results here?

spoecker commented 5 years ago

I have been running docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console cat /app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py for 24 hours now. Nothing is happening; there is no console output.

crr0004 commented 5 years ago

Hmm. Odd. I was just trying to see if there was anything in that file that is off. I will double-check the command.

crr0004 commented 5 years ago

Can you try docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console "colcon build; cat /app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py" instead? It's meant to return something immediately. If it hangs, something is wrong.
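When a command is expected to return right away, running it with a time limit turns a silent hang into an explicit result instead of a 24-hour wait. This is a minimal sketch using the standard library; `run_with_timeout` is a hypothetical helper, and a short Python one-liner stands in for the docker run command so that the example works without Docker.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s: float):
    """Run cmd and return (finished, output).

    finished is False if the command did not exit within timeout_s seconds;
    in that case the child process is killed and output is empty.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, result.stdout


# Placeholder command; the docker invocation would be passed as a list instead.
finished, output = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
print(finished, output.strip())
```

A False result here would correspond to the "if it hangs, something is wrong" case, without anyone having to watch the terminal.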

crr0004 commented 5 years ago

Can you also run docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console "colcon build; cat /app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/environments/deepracer_racetrack_env.py"?

crr0004 commented 5 years ago

Okay, it looks like it was a multiple regression from the file syncing. The image was missing the code, and the code was also wrong. Can you try pulling the image and the repo and running again?

spoecker commented 5 years ago

I pulled the images and the repo and ran it again.

...
algo-1-i2cxi_1 | Uploaded hyperparameters.json to S3
algo-1-i2cxi_1 | Uploaded IP address information to S3: 172.18.0.4
algo-1-i2cxi_1 | ## Creating graph - name: BasicRLGraphManager
algo-1-i2cxi_1 | Loaded action space from file: [{'steering_angle': -25, 'speed': 3.0, 'index': 0}, {'steering_angle': -25, 'speed': 6, 'index': 1}, {'steering_angle': -12.5, 'speed': 3, 'index': 2}, {'steering_angle': -12.5, 'speed': 6, 'index': 3}, {'steering_angle': 0, 'speed': 3, 'index': 4}, {'steering_angle': 0, 'speed': 6, 'index': 5}, {'steering_angle': 12.5, 'speed': 3, 'index': 6}, {'steering_angle': 12.5, 'speed': 6, 'index': 7}, {'steering_angle': 25, 'speed': 3, 'index': 8}, {'steering_angle': 25, 'speed': 6, 'index': 9}]
algo-1-i2cxi_1 | ## Creating agent - name: agent
algo-1-i2cxi_1 | Checkpoint> Saving in path=['./checkpoint/0_Step-0.ckpt']
algo-1-i2cxi_1 | Uploaded 3 files for checkpoint 0
algo-1-i2cxi_1 | INFO:tensorflow:Froze 11 variables.
algo-1-i2cxi_1 | INFO:tensorflow:Converted 11 variables to const ops.
algo-1-i2cxi_1 | saved intermediate frozen graph: rl-deepracer-sagemaker/model/model_0.pb
algo-1-i2cxi_1 | Training> Name=main_level/agent, Worker=0, Episode=1, Total reward=2079263.68, Steps=42, Training iteration=0

...
[ INFO] [1563508383.752479434, 0.380000000]: Physics dynamic reconfigure ready.
[ INFO] [1563508383.801276497, 0.422000000]: Physics dynamic reconfigure ready.
[INFO] [1563508383.934626, 0.552000]: Loading controller: right_rear_wheel_velocity_controller

Creating graph - name: BasicRLGraphManager

/usr/local/lib/python3.5/dist-packages/gym/envs/registration.py:14: PkgResourcesDeprecationWarning: Parameters to load are deprecated. Call .resolve and .require separately.
result = entry_point.load(False)
[INFO] [1563508384.388128, 1.003000]: Loading controller: left_front_wheel_velocity_controller
[INFO] [1563508384.709887, 1.323000]: Loading controller: right_front_wheel_velocity_controller
[INFO] [1563508384.810976, 1.409000]: Loading controller: left_steering_hinge_position_controller
Loaded action space from file: [{'speed': 3.0, 'index': 0, 'steering_angle': -25}, {'speed': 6, 'index': 1, 'steering_angle': -25}, {'speed': 3, 'index': 2, 'steering_angle': -12.5}, {'speed': 6, 'index': 3, 'steering_angle': -12.5}, {'speed': 3, 'index': 4, 'steering_angle': 0}, {'speed': 6, 'index': 5, 'steering_angle': 0}, {'speed': 3, 'index': 6, 'steering_angle': 12.5}, {'speed': 6, 'index': 7, 'steering_angle': 12.5}, {'speed': 3, 'index': 8, 'steering_angle': 25}, {'speed': 6, 'index': 9, 'steering_angle': 25}]
[INFO] [1563508385.060141, 1.637000]: Loading controller: right_steering_hinge_position_controller
[INFO] [1563508385.232757, 1.811000]: Loading controller: joint_state_controller
[INFO] [1563508385.271250, 1.844000]: Controller Spawner: Loaded controllers: left_rear_wheel_velocity_controller, right_rear_wheel_velocity_controller, left_front_wheel_velocity_controller, right_front_wheel_velocity_controller, left_steering_hinge_position_controller, right_steering_hinge_position_controller, joint_state_controller
[INFO] [1563508385.280768, 1.856000]: Started controllers: left_rear_wheel_velocity_controller, right_rear_wheel_velocity_controller, left_front_wheel_velocity_controller, right_front_wheel_velocity_controller, left_steering_hinge_position_controller, right_steering_hinge_position_controller, joint_state_controller
SIM_TRACE_LOG:0,0,4.4374,0.5318,-0.0564,0.00,0.00,0,0.0100,False,True,0.6399,1,21.88,1563508385.4749124

SIM_TRACE_LOG:0,1,4.4374,0.5318,-0.0564,-0.44,3.00,0,86799.3431,False,True,0.6400,1,21.88,1563508385.5607224

Creating agent - name: agent

2019-07-19 03:53:14.005197: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Loading checkpoint: ./checkpoint/0_Step-0.ckpt

SIM_TRACE_LOG:0,0,4.4374,0.5318,-0.0565,0.00,0.00,0,0.0100,False,True,0.6401,1,21.88,1563508398.1386523

SIM_TRACE_LOG:0,1,4.4382,0.5318,-0.0568,0.44,6.00,9,93798.2608,False,True,0.6437,1,21.88,1563508398.2581823

SIM_TRACE_LOG:0,2,4.4464,0.5314,-0.0559,-0.22,3.00,2,86822.5664,False,True,0.6815,1,21.88,1563508398.2943811

SIM_TRACE_LOG:0,3,4.4588,0.5310,-0.0536,-0.22,6.00,3,92995.6198,False,True,0.7377,1,21.88,1563508398.3413289

SIM_TRACE_LOG:0,4,4.4887,0.5297,-0.0529,0.00,3.00,4,85067.8961,False,True,0.8747,1,21.88,1563508398.4227524

SIM_TRACE_LOG:0,5,4.5367,0.5277,-0.0491,-0.44,3.00,0,82521.0044,False,True,1.0938,2,21.88,1563508398.5069866

SIM_TRACE_LOG:0,6,4.6109,0.5247,-0.0456,-0.44,3.00,0,74820.1111,False,True,1.4327,2,21.88,1563508398.5985994

SIM_TRACE_LOG:0,7,4.6710,0.5199,-0.0564,0.22,3.00,6,81336.4011,False,True,1.7079,3,21.88,1563508398.6519997

SIM_TRACE_LOG:0,8,4.7155,0.5159,-0.0654,0.22,3.00,6,88473.2405,False,True,1.9124,3,21.88,1563508398.6983905

SIM_TRACE_LOG:0,9,4.7783,0.5102,-0.0743,0.44,6.00,9,99996.1063,False,True,2.2007,4,21.88,1563508398.7886

SIM_TRACE_LOG:0,10,4.8622,0.5047,-0.0694,0.44,6.00,9,99680.9729,False,True,2.5852,4,21.88,1563508398.871944

SIM_TRACE_LOG:0,11,4.9583,0.5024,-0.0456,-0.22,6.00,3,89752.4556,False,True,3.0240,5,21.88,1563508398.9617953

SIM_TRACE_LOG:0,12,5.0775,0.5016,-0.0230,0.44,6.00,9,78668.7662,False,True,3.5676,6,21.88,1563508399.0140648

SIM_TRACE_LOG:0,13,5.1657,0.5016,-0.0116,0.00,6.00,5,73132.6373,False,True,3.9698,7,21.88,1563508399.0872471

SIM_TRACE_LOG:0,14,5.3063,0.5013,-0.0043,-0.44,3.00,0,64695.0845,False,True,4.6116,8,21.88,1563508399.169118

SIM_TRACE_LOG:0,15,5.4117,0.5007,-0.0065,0.22,6.00,7,77793.7254,False,True,5.0942,9,21.88,1563508399.2376678

SIM_TRACE_LOG:0,16,5.5115,0.4994,-0.0104,0.00,3.00,4,75708.7385,False,True,5.5510,9,21.88,1563508399.2830899

SIM_TRACE_LOG:0,17,5.6022,0.4977,-0.0164,0.44,6.00,9,89309.6306,False,True,5.9671,10,21.88,1563508399.361125

SIM_TRACE_LOG:0,18,5.7174,0.5006,0.0132,0.00,3.00,4,71259.8754,False,True,6.4950,11,21.88,1563508399.425759

SIM_TRACE_LOG:0,19,5.8013,0.5045,0.0295,0.44,3.00,8,61567.6215,False,True,6.8776,12,21.88,1563508399.4955459

SIM_TRACE_LOG:0,20,5.8789,0.5113,0.0551,0.22,6.00,7,62466.2224,False,True,7.2359,12,21.88,1563508399.5506108

SIM_TRACE_LOG:0,21,5.9680,0.5227,0.0922,-0.44,6.00,1,40004.0617,False,True,7.6412,13,21.88,1563508399.6270547

SIM_TRACE_LOG:0,22,6.0935,0.5352,0.0910,0.44,6.00,9,51297.9247,False,True,8.2225,14,21.88,1563508399.689689

SIM_TRACE_LOG:0,23,6.1945,0.5468,0.1005,0.44,3.00,8,53794.9543,False,True,8.6983,15,21.88,1563508399.7525225

SIM_TRACE_LOG:0,24,6.3125,0.5668,0.1414,0.22,6.00,7,59909.9705,False,True,9.2631,16,21.88,1563508399.821851

SIM_TRACE_LOG:0,25,6.3959,0.5828,0.1686,-0.22,3.00,2,69620.6658,False,True,9.6856,16,21.88,1563508399.876625

SIM_TRACE_LOG:0,26,6.4934,0.6021,0.1831,0.22,6.00,7,104111.5641,False,True,10.1943,17,21.88,1563508399.9597166

SIM_TRACE_LOG:0,27,6.5916,0.6266,0.2158,0.44,3.00,8,54892.8298,False,True,10.7300,18,21.88,1563508400.0426738

SIM_TRACE_LOG:0,28,6.7149,0.6689,0.2911,-0.44,3.00,0,28063.5181,False,True,11.4113,19,21.88,1563508400.1164913

SIM_TRACE_LOG:0,29,6.7732,0.6895,0.3093,0.22,6.00,7,46293.1490,False,True,11.6931,20,21.88,1563508400.17127

SIM_TRACE_LOG:0,30,6.8523,0.7209,0.3424,0.22,3.00,6,1794.7149,False,True,12.1648,21,21.88,1563508400.2437928

SIM_TRACE_LOG:0,31,6.9688,0.7729,0.3923,-0.44,3.00,0,1819.3461,False,True,12.8089,22,21.88,1563508400.3630335

SIM_TRACE_LOG:0,32,7.0275,0.7973,0.3911,0.00,3.00,4,2218.6975,False,True,13.1382,22,21.88,1563508400.4160743

SIM_TRACE_LOG:0,33,7.0724,0.8145,0.3821,-0.22,3.00,2,2383.0646,False,True,13.3361,23,21.88,1563508400.4683928

SIM_TRACE_LOG:0,34,7.1364,0.8379,0.3697,-0.44,6.00,1,7797.1512,False,True,13.6318,23,21.88,1563508400.5494704

SIM_TRACE_LOG:0,35,7.2154,0.8621,0.3332,0.00,3.00,4,2665.7023,False,True,13.9307,24,21.88,1563508400.6037889

SIM_TRACE_LOG:0,36,7.2824,0.8802,0.3048,0.00,6.00,5,8322.9197,False,True,14.1280,24,21.88,1563508400.6736581

SIM_TRACE_LOG:0,37,7.3563,0.8989,0.2808,0.44,6.00,9,7067.6916,False,True,14.3563,24,21.88,1563508400.7448854

SIM_TRACE_LOG:0,38,7.4474,0.9234,0.2779,-0.44,3.00,0,1137.0448,False,True,14.6435,25,21.88,1563508400.8020718

SIM_TRACE_LOG:0,39,7.5363,0.9471,0.2696,0.00,6.00,5,0.0100,False,False,14.7276,25,21.88,1563508400.8817086

SIM_TRACE_LOG:0,40,7.6273,0.9727,0.2718,0.22,6.00,7,0.0100,False,False,14.9644,25,21.88,1563508400.9617043

SIM_TRACE_LOG:0,41,7.7508,1.0111,0.2930,0.22,3.00,6,0.0100,False,False,15.2947,26,21.88,1563508401.0562809

SIM_TRACE_LOG:0,42,7.8397,1.0421,0.3133,0.22,3.00,6,0.0000,True,False,15.2947,26,21.88,1563508401.1350539

reward: 2173061.9472040334
Training> Name=main_level/agent, Worker=0, Episode=1, Total reward=2173061.94, Steps=42, Training iteration=0

I got one more step, but then it is just stuck for a long time with nothing happening. There is also no error.

crr0004 commented 5 years ago

Okay, so it looks like it is working as intended. Can you post your robomaker.env file? It seems like your reward function is triggering an early exit because the reward values are so high.
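A quick way to check the rewards from a log dump is to sum the per-step values. This is only a rough sketch: it assumes the ninth comma-separated field of each SIM_TRACE_LOG entry is the step reward, which matches the lines above (for example 93798.2608 at step 1), and `total_reward` is a made-up helper name.

```python
import re


def total_reward(log_text):
    """Sum the per-step reward over all SIM_TRACE_LOG entries in a log dump."""
    total = 0.0
    # Entries contain no spaces, so \S+ captures one whole record even
    # when several records ended up on the same line.
    for record in re.findall(r"SIM_TRACE_LOG:(\S+)", log_text):
        fields = record.split(",")
        # Assumption: index 8 holds the reward for that step.
        total += float(fields[8])
    return total


sample_log = """
SIM_TRACE_LOG:0,0,4.4374,0.5318,-0.0565,0.00,0.00,0,0.0100,False,True,0.6401,1,21.88,1563508398.1386523
SIM_TRACE_LOG:0,1,4.4382,0.5318,-0.0568,0.44,6.00,9,93798.2608,False,True,0.6437,1,21.88,1563508398.2581823
"""
print(total_reward(sample_log))
```

Running it over the full episode should land close to the `reward:` line printed at the end of the episode, which makes it easy to see which steps contribute the very large values.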

spoecker commented 5 years ago

@crr0004 you are amazing, the car is up and running again. Thanks a lot. When I downloaded the new repo I replaced the file and lost the changes I had made before.

dafrost22 commented 5 years ago

In case it helps someone else, this was also happening to me because the "custom_files" folder in the minio bucket representation on the local filesystem had incorrect ownership.
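A small check along these lines can spot that situation. It is a sketch under assumptions: it is POSIX only, `find_foreign_owned` is a made-up helper, and the directory you point it at (the folder backing the bucket's custom_files) depends on your own minio setup. The demo uses a temporary directory so it runs anywhere.

```python
import os
import tempfile


def find_foreign_owned(root):
    """Return paths under `root` (root included) not owned by the current user."""
    uid = os.getuid()
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    # lstat so a symlink is judged by its own owner, not its target's.
    return [path for path in candidates if os.lstat(path).st_uid != uid]


# Demo on a throwaway directory; replace with the real bucket folder.
with tempfile.TemporaryDirectory() as demo_root:
    open(os.path.join(demo_root, "reward.py"), "w").close()
    demo_result = find_foreign_owned(demo_root)
print(demo_result)
```

Any path it lists is owned by a different user (for example root, if the folder was created by a container) and is a candidate for an ownership fix.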