Sohojoe / MarathonEnvsBaselines

Experimental - using OpenAI baselines with MarathonEnvs (ML-Agents)
Apache License 2.0

Error in initializing the fully-connected layer #7

Open · maystroh opened this issue 5 years ago

maystroh commented 5 years ago

I'm trying to use your code to run the OpenAI baselines algorithms with a Unity3D environment. Here is the command I'm using to launch the training:

python -m baselines.run_unity --alg=ppo2 --env=./envs/env.x86_64 --num_timesteps=1e6 --save_path=./models/test

Here is the error I get:

File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/runpy.py", line 193, in _run_module_as_main "main", mod_spec) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/hassan/Desktop/Unity-Gym/baselines/run_unity.py", line 248, in main() File "/home/hassan/Desktop/Unity-Gym/baselines/run_unity.py", line 222, in main model, env = train(args, extra_args) File "/home/hassan/Desktop/Unity-Gym/baselines/run_unity.py", line 79, in train alg_kwargs File "/home/hassan/Desktop/Unity-Gym/baselines/ppo2/ppo2.py", line 305, in learn model = make_model() File "/home/hassan/Desktop/Unity-Gym/baselines/ppo2/ppo2.py", line 304, in max_grad_norm=max_grad_norm) File "/home/hassan/Desktop/Unity-Gym/baselines/ppo2/ppo2.py", line 39, in init act_model = policy(nbatch_act, 1, sess) File "/home/hassan/Desktop/Unity-Gym/baselines/common/policies.py", line 142, in policy_fn policy_latent = policy_network(encoded_x) File "/home/hassan/Desktop/Unity-Gym/baselines/common/models.py", line 52, in network_fn h = fc(h, 'mlp_fc{}'.format(i), nh=num_hidden, init_scale=np.sqrt(2)) File "/home/hassan/Desktop/Unity-Gym/baselines/a2c/utils.py", line 65, in fc w = tf.get_variable("w", [nin, nh], initializer=ortho_init(init_scale)) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1467, in get_variable aggregation=aggregation) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1217, in get_variable aggregation=aggregation) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 527, in get_variable aggregation=aggregation) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 481, in _true_getter aggregation=aggregation) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 903, in _get_single_variable aggregation=aggregation) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2443, in variable aggregation=aggregation) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2425, in previous_getter = lambda kwargs: default_variable_creator(None, **kwargs) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2406, in default_variable_creator constraint=constraint) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 259, in init constraint=constraint) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 368, in _init_from_args initial_value(), name="initial_value", dtype=dtype) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 885, in shape.as_list(), dtype=dtype, partition_info=partition_info) File "/home/hassan/Desktop/Unity-Gym/baselines/a2c/utils.py", line 35, in _orthoinit u, , v = np.linalg.svd(a, full_matrices=False) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/numpy/linalg/linalg.py", line 1368, in svd _assertNoEmpty2d(a) File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/numpy/linalg/linalg.py", line 226, 
in _assertNoEmpty2d raise LinAlgError("Arrays cannot be empty") numpy.linalg.linalg.LinAlgError: Arrays cannot be empty

The problem comes from the ortho_init function in a2c/utils.py; see below for more details (the comments show my debug output).

import numpy as np

def ortho_init(scale=1.0):
    def _ortho_init(shape, dtype, partition_info=None):
        # lasagne ortho init for tf
        print(shape)  # debug output: (0, 64) -- the input dimension is zero
        shape = tuple(shape)
        print(shape)  # debug output: (0, 64)
        if len(shape) == 2:
            flat_shape = shape
        elif len(shape) == 4:  # assumes NHWC
            flat_shape = (np.prod(shape[:-1]), shape[-1])
        else:
            raise NotImplementedError
        print(flat_shape)  # debug output: (0, 64)
        a = np.random.normal(0.0, 1.0, flat_shape)  # empty array when flat_shape has a zero dimension
        print(a)  # debug output: []
        u, _, v = np.linalg.svd(a, full_matrices=False)  # SVD of the empty array raises LinAlgError
        q = u if u.shape == flat_shape else v  # pick the one with the correct shape
        q = q.reshape(shape)
        return (scale * q[:shape[0], :shape[1]]).astype(np.float32)
    return _ortho_init
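
For reference, the failure is easy to reproduce in isolation with the NumPy version shown in the traceback (recent NumPy releases accept empty arrays in svd). A minimal sketch: the (0, 64) shape matches the debug output above, where the zero is the input feature dimension nin that fc() read from the (empty) observation vector.

import numpy as np

# fc() in a2c/utils.py builds the weight shape as [nin, nh]; in this issue
# nin == 0 because the wrapped environment reported an empty observation vector.
a = np.random.normal(0.0, 1.0, (0, 64))  # array with zero rows
u, _, v = np.linalg.svd(a, full_matrices=False)  # LinAlgError: Arrays cannot be empty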

I just wanted to report it here in case someone can help quickly. I will keep investigating and will post more details once I fix it.

maystroh commented 5 years ago

I've fixed it. I was simply calling the wrong function: the environment should be built with env = make_unity_env(env_id, args.num_env or 1, args.visual_obs) instead of env = make_vec_env(env_id, env_type, args.num_env or 1, seed, reward_scale=args.reward_scale)

def build_env(args):
    ncpu = multiprocessing.cpu_count()
    if sys.platform == 'darwin': ncpu //= 2
    nenv = args.num_env or ncpu
    alg = args.alg
    rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
    seed = args.seed

    env_type, env_id = get_env_type(args.env)

    if env_type == 'atari':
        ..

    elif env_type == 'retro':
        ..

    elif env_type == 'unity':
        get_session(tf.ConfigProto(allow_soft_placement=True,
                                   intra_op_parallelism_threads=1,
                                   inter_op_parallelism_threads=1))
        # env = make_vec_env(env_id, env_type, args.num_env or 1, seed, reward_scale=args.reward_scale)  # wrong: not a Unity-aware wrapper
        env = make_unity_env(env_id, args.num_env or 1, args.visual_obs)
        # env = VecNormalize(env)

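For anyone hitting the same thing: make_unity_env comes from this repo's run_unity entry point rather than stock baselines. Below is a rough sketch of what such a helper can look like, assuming it wraps the ML-Agents gym adapter (gym_unity.envs.UnityEnv) in a baselines vector env; the actual implementation in this repo may differ.

# Hypothetical sketch of a make_unity_env helper, assuming it is built on
# gym_unity.envs.UnityEnv (the ML-Agents gym wrapper of this era) and the
# baselines SubprocVecEnv.
from gym_unity.envs import UnityEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

def make_unity_env(env_path, num_env, use_visual):
    def make_env(rank):
        def _thunk():
            # each Unity instance needs its own worker_id so the
            # spawned processes listen on different ports
            return UnityEnv(env_path, worker_id=rank, use_visual=use_visual)
        return _thunk
    return SubprocVecEnv([make_env(i) for i in range(num_env)])
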
Sohojoe commented 5 years ago

Hi @maystroh - I'm glad you figured it out.

It would be good to get your feedback, as at some point I'd like to fold the baselines capabilities back into ml-agents.

For me, I was exploring baselines to see if I could speed up training over ml-agents (leverage the GPU, etc.) - however, I found that training was actually slower on the GPU than on the CPU in my tests (more on that below).

I've also been trying to get HER working; however, the baselines HER code is quite deeply coupled with MPI and Mujoco, so I'm thinking it may be faster to implement DDQN + HER in ml-agents directly.

maystroh commented 5 years ago

Actually, I have the same goal: exploring the baselines to double-check whether their PPO implementation is really optimized for the GPU. Since I'm working with visual observations, training should be faster on the GPU, no? Just to verify your outcome: was the GPU slower than the CPU for both vector and visual observations?

So far I'm working with only one agent, but I will try to double-check that case once I finish what I have to accomplish. I will update this thread with whatever interesting information I find during my exploration.
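
One quick sanity check before comparing timings is to confirm that TensorFlow is actually placing ops on the GPU at all. This is a generic TF 1.x check (matching the versions in this thread), not something specific to this repo:

import tensorflow as tf

# Reports whether a GPU is visible to this TF build/runtime.
print(tf.test.is_gpu_available())

# Logging device placement shows which ops actually land on the GPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0])
    print(sess.run(a * 2))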

Sohojoe commented 5 years ago

I have not tried visual observations yet, so I'm interested to hear how this works out!

Yes, I can confirm that the CPU was faster than the GPU in my tests. My state/action spaces are small and my buffer size is also small(ish), which may contribute to why that was the case. Note: I'm using an optimized build of TensorFlow - here are some links: