Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Bug Report] "TypeError: 'float' object does not support indexing" in step method after 12M steps of training. #1114

Open jng164 opened 3 months ago

jng164 commented 3 months ago

### Describe the bug

The code suddenly raises a "TypeError" when calling the step method after 12M steps of training.

### Code example

I am using gym.vector.AsyncVectorEnv(). I use the function make_env to create my environments.

def make_env(gym_id, seed, idx, capture_video, run_name, qubits, depth):

    def thunk():
        env = gym.make(gym_id, qubits=qubits, depth=depth, env_id=idx)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        if capture_video and idx == 0:
            env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
        return env

    return thunk

The main part of the code is as follows:

if __name__ == "__main__":
    mp.set_start_method('spawn')
    device = torch.device("cuda" if torch.cuda.is_available() and args.cuda else "cpu")
    envs = gym.vector.AsyncVectorEnv(
        [make_env(args.gym_id, args.seed + i, i, args.capture_video, run_name, qubits, depth) for i in range(args.num_envs)],
    shared_memory=False)
    agent = AgentGNN(envs, device).to(device)#Graph Neural Network
    for update in range(1, num_updates + 1):
        for step in range(args.num_steps):  
            global_step += 1 * args.num_envs
            dones[step] = next_done
            try:
                with torch.no_grad():
                    action, logprob, _, value, logits, action_ids = agent.get_action_and_value(next_obs_graph, device=device)
                    values[step] = value.flatten()
                actions[step] = action
                logprobs[step] = logprob

                next_obs, reward, done, deprecated, info = envs.step(action_ids.cpu().numpy()) 
            except TypeError as e:
                print(f"Error: {e}")
            rewards[step] = torch.tensor(reward).to(device).view(-1)

            next_done = torch.Tensor(done).to(device)

As far as I understand the error, this code spawns as many threads as there are environments. In one particular thread, the agent breaks in env.step(). As you can see, I tried to handle this with a try-except, but it does not work. I think this may be because the thread just stays on hold until it breaks, but I am not sure.

### Traceback

ERROR: Received the following error from Worker-31: TypeError: 'float' object does not support indexing
ERROR: Shutting down Worker-31.
ERROR: Raising the last exception back to the main process.
Traceback (most recent call last):
  File "/home/jan.nogue/RL-ZX/Copt-cquere/rl-zx/ppo_async.py", line 198, in <module>
    next_obs, reward, done, deprecated, info = envs.step(action_ids.cpu().numpy())
  File "/home/jan.nogue/miniconda3/envs/gymnasium/lib/python3.10/site-packages/gymnasium/vector/vector_env.py", line 203, in step
    return self.step_wait()
  File "/home/jan.nogue/miniconda3/envs/gymnasium/lib/python3.10/site-packages/gymnasium/vector/async_vector_env.py", line 333, in step_wait
    self._raise_if_errors(successes)
  File "/home/jan.nogue/miniconda3/envs/gymnasium/lib/python3.10/site-packages/gymnasium/vector/async_vector_env.py", line 544, in _raise_if_errors
    raise exctype(value)
TypeError: 'float' object does not support indexing
/home/jan.nogue/miniconda3/envs/gymnasium/lib/python3.10/site-packages/gymnasium/vector/async_vector_env.py:460: UserWarning: WARN: Calling close while waiting for a pending call to step to complete.



### System info

I use gym 0.26.2, torch 2.0.1 and python 3.10.14. I am using Ubuntu 24.04 LTS. All of the packages were installed using pip. 

### Additional context

_No response_

### Checklist

- [X] I have checked that there is no similar [issue](https://github.com/Farama-Foundation/Gymnasium/issues) in the repo

pseudo-rnd-thoughts commented 3 months ago

Thanks for your issue. However, I'm uncertain how this is directly related to Gymnasium. If you could point out a piece of code within Gymnasium that is causing the issue, that would be helpful. I'm guessing that the problem is not within AsyncVectorEnv but within the environment, since failing only after 12 million steps seems a bit random.

Due to AsyncVectorEnv's asynchronous nature, discovering which part of the environment fails is highly difficult. Could you run using SyncVectorEnv? When the error occurs, it should then print the whole stack trace.
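
To illustrate why the asynchronous path hides the origin of the error, here is a minimal standard-library sketch. It does not use Gymnasium itself: `env_step` and `buggy_reward` are hypothetical stand-ins for the user's environment, and the pickle round trip only approximates what happens when a worker process ships an exception to the parent, which then re-raises it with `raise exctype(value)` as seen in the traceback above.

```python
import pickle
import traceback


def buggy_reward(state):
    # Hypothetical stand-in for the environment bug; raises the reported error.
    raise TypeError("'float' object does not support indexing")


def env_step(action):
    # Hypothetical environment step that calls into the buggy helper.
    return buggy_reward(action)


def frames_of(exc):
    # Names of the functions recorded in an exception's traceback.
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


def step_in_process(action):
    # Synchronous case: the exception keeps its original traceback.
    try:
        env_step(action)
    except TypeError as exc:
        return frames_of(exc)


def step_via_worker(action):
    # Asynchronous case (approximation): the exception is serialized, which
    # drops its traceback, and the parent raises a fresh exception from it.
    try:
        env_step(action)
    except TypeError as exc:
        exctype, value = type(exc), pickle.loads(pickle.dumps(exc))
    try:
        raise exctype(value)
    except TypeError as exc:
        return frames_of(exc)


sync_frames = step_in_process(0)
async_frames = step_via_worker(0)
print("in-process frames:", sync_frames)
print("re-raised frames: ", async_frames)
```

In the in-process case the frames reach down to `buggy_reward`, while the re-raised exception only records the function that re-raised it. This is the reason a synchronous run is likely to show where inside the environment the `TypeError` starts.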

pseudo-rnd-thoughts commented 3 months ago

I've created #1119 to help with this problem, if you can use it.
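
Independently of #1119 (whose contents are not described in this thread), another generic option is to capture the stack trace inside the environment itself, so it is printed in the worker before the error is passed back to the main process. The following is only a sketch under assumptions: `TracebackLoggingEnv` and `FlakyEnv` are hypothetical names, and the proxy is duck-typed rather than a real `gym.Wrapper` subclass so that the example runs without Gymnasium installed.

```python
import traceback


class TracebackLoggingEnv:
    """Hypothetical proxy that logs the full traceback of a failing step()."""

    def __init__(self, env, env_id):
        self.env = env
        self.env_id = env_id
        self.last_traceback = None

    def reset(self, *args, **kwargs):
        return self.env.reset(*args, **kwargs)

    def step(self, action):
        try:
            return self.env.step(action)
        except Exception:
            # Record and print the trace here, where the original frames
            # are still available, then let the error propagate unchanged.
            self.last_traceback = traceback.format_exc()
            print(f"[env {self.env_id}] step({action!r}) failed:")
            print(self.last_traceback)
            raise


class FlakyEnv:
    """Hypothetical stand-in environment whose step() fails on the third call."""

    def __init__(self):
        self.calls = 0

    def reset(self):
        self.calls = 0
        return 0, {}

    def step(self, action):
        self.calls += 1
        if self.calls == 3:
            raise TypeError("'float' object does not support indexing")
        return 0, 0.0, False, False, {}


env = TracebackLoggingEnv(FlakyEnv(), env_id=31)
env.reset()
failed_action = None
for action in range(5):
    try:
        env.step(action)
    except TypeError:
        failed_action = action
        break
```

In a setup like the one in this issue, the wrapping would presumably happen inside `thunk()` in `make_env`, so that each worker process logs its own failure together with the action that triggered it.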