MurrayMa0816 opened this issue 1 year ago
Hi @MurrayMa0816, have you ever tried to pin your ray version?
Hi @XuehaiPan, thank you for your response. I have tried two versions, Ray 1.12.0 and Ray 1.13.0, and encountered the same issue. My current workaround, which doesn't address the root cause, is to check each value before converting it to a tensor: if the value is None, I convert it to False first. This avoids the error and is equivalent to forcing the "fused" and "foreach" parameters to False. The program now runs without errors, but I'm not sure whether this approach introduces other potential issues.
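A minimal sketch of that workaround, assuming the optimizer state is a plain nested dict shaped like the path mentioned later in this thread (the helper name and the exact nesting are assumptions, not RLlib's actual API):

```python
def sanitize_optimizer_state(policy_state):
    """Coerce None-valued 'fused'/'foreach' flags to False so they can be
    converted to tensors. Mutates and returns policy_state."""
    for opt_state in policy_state.get('_optimizer_variables', []):
        for group in opt_state.get('param_groups', []):
            for key in ('fused', 'foreach'):
                if group.get(key) is None:
                    group[key] = False
    return policy_state

# Hypothetical usage on a state dict shaped like the one in the traceback:
state = {
    '_optimizer_variables': [
        {'param_groups': [{'lr': 5e-4, 'fused': None, 'foreach': None}]}
    ]
}
sanitize_optimizer_state(state)
# 'fused' and 'foreach' are now False instead of None.
```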
@MurrayMa0816 Since only the policy weights are needed in the checkpoint, I think you can remove the optimizer-related items in the worker state:
self.worker['state']['shared_policy'].pop('_optimizer_variables', None)
@XuehaiPan , thank you very much for helping me confirm this issue that has been bothering me for a long time.
There is another issue during training that I would like to consult with you: When running the PSRO code, it always indicates that training is happening on the CPU, as follows:
INFO torch_policy.py:183 -- TorchPolicy (worker=local) running on CPU.
However, I would like to train the policies on the GPU.
My hardware configuration: One computer with 16 CPUs and 1 GPU.
The settings for num_workers and num_gpus during training are as follows:
In psro/train.py, they are set as follows:
train_kwargs = {
    'num_workers': num_workers,  # 6 worker processes each for the camera and target teams, based on the number of CPUs available
    'num_gpus': num_gpus,  # 1 GPU, intended to be shared among 12 workers
    'num_envs_per_worker': num_envs_per_worker,  # 8
    'seed': seed,
}
not_ready = [
    camera_trainer.result.remote(skip_train_if_exists=True, **train_kwargs),
    target_trainer.result.remote(skip_train_if_exists=True, **train_kwargs),
]
However, hrl/mappo/camera/train.py and mappo/target/train.py both set num_gpus_per_worker=0. Does this mean the GPU is not being used?
experiment.spec['config'].update(
    num_cpus_for_driver=NUM_CPUS_FOR_TRAINER,
    num_gpus=num_gpus,
    num_gpus_per_worker=0,
    num_workers=num_workers,
    num_envs_per_worker=num_envs_per_worker,
)
The code that logs that training is on the CPU is in ray/rllib/policy/torch_policy.py, specifically:
worker_idx = self.config.get("worker_index", 0)
if not config["_fake_gpus"] and ray.worker._mode() == ray.worker.LOCAL_MODE:
    num_gpus = 0
elif worker_idx == 0:
    num_gpus = config["num_gpus"]
else:
    num_gpus = config["num_gpus_per_worker"]
gpu_ids = list(range(torch.cuda.device_count()))
# Place on one or more CPU(s) when either:
# - Fake GPU mode.
# - num_gpus=0 (either set by the user or we are in local_mode=True).
# - No GPUs available.
if config["_fake_gpus"] or num_gpus == 0 or not gpu_ids:
    logger.info(
        "TorchPolicy (worker={}) running on {}.".format(
            worker_idx if worker_idx > 0 else "local",
            "{} fake-GPUs".format(num_gpus) if config["_fake_gpus"] else "CPU",
        )
    )
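The selection logic above can be mirrored as a small standalone function, which makes it easy to see why only the trainer (worker_idx == 0) ever receives config["num_gpus"] while rollout workers fall back to num_gpus_per_worker. This is a simplified sketch, not RLlib's actual method; the real code also checks how many CUDA devices are visible:

```python
def effective_num_gpus(config, worker_idx, local_mode=False):
    # Simplified mirror of RLlib's TorchPolicy device selection.
    if not config["_fake_gpus"] and local_mode:
        return 0  # local_mode forces CPU regardless of the GPU settings
    if worker_idx == 0:
        return config["num_gpus"]  # trainer (local worker)
    return config["num_gpus_per_worker"]  # rollout worker

cfg = {"_fake_gpus": False, "num_gpus": 1, "num_gpus_per_worker": 0}
effective_num_gpus(cfg, worker_idx=0)  # the trainer gets the GPU
effective_num_gpus(cfg, worker_idx=3)  # rollout workers sample on CPU
```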
Tried approaches (I would like to have all 12 worker processes running on the GPU):
1. Set num_gpus=1 in the decorator when defining PlayerTrainer:

@ray.remote(max_restarts=1, num_gpus=1)
class PlayerTrainer:
    def __init__(self, iteration, player, train_fn, base_experiment, opponent_agent_factory, from_checkpoint, timesteps_total, local_dir, project=None, group=None, **kwargs):

2. Set --num-gpus to 0.5 in the launch command:

python3 -m examples.psro.train \
    --project mate-psro \
    --meta-solver NE \
    --num-workers 32 --num-envs-per-worker 8 --num-gpus 0.5 \
    --timesteps-total 5E6 --num-evaluation-episodes 10 --seed 0
The approaches above have not solved the problem of training on the GPU, and I am confused. Could you please give me some suggestions for the configuration? Thank you very much.
Also, I would like to know how long it takes to complete the PSRO training in your repository. Thank you very much.
@MurrayMa0816
However, hrl/mappo/camera/train.py and mappo/target/train.py both set num_gpus_per_worker=0. Does this mean the GPU is not being used?
There is a trainer process and several worker processes for rollout sampling. The configuration num_gpus=1 means the trainer uses a GPU for training the neural networks (actor & critic), and num_gpus_per_worker=0 means only the CPU is used to run the actor network during rollout sampling on the worker processes. You can set num_gpus_per_worker > 0 if you have more GPUs.
- I found some information mentioning that one GPU cannot be shared among multiple workers.
If you set num_gpus_per_worker=0.1, the one GPU can be shared among 10 workers. You can find this in the Ray documentation. You can increase the value if you have more GPUs.
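As a quick sanity check on fractional GPU requests: Ray treats them as logical reservations, so the number of workers one physical GPU can host is just the ratio. This is illustrative arithmetic only, not a Ray API; note that Ray does not partition GPU memory, so the co-located processes must together fit in the card's memory:

```python
def max_gpu_workers(total_gpus, num_gpus_per_worker):
    """How many workers fit given a per-worker fractional GPU request."""
    if num_gpus_per_worker <= 0:
        raise ValueError("num_gpus_per_worker must be positive")
    # Add a tiny epsilon before flooring to guard against float error.
    return int(total_gpus / num_gpus_per_worker + 1e-9)

max_gpu_workers(1, 0.1)  # 10 workers share one GPU, as described above
```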
- There were also suggestions to add num_gpus=1 in the decorator when defining PlayerTrainer:
You will not need to do this. The resource requirements are set in the experiment configuration. If you decorate PlayerTrainer with num_gpus=1, it means the trainer and its workers can only use one GPU in total.
Additionally, in the provided running script, --num-gpus is set to 0.5.
I set this because the network is relatively small. With num_gpus=0.5, you can run the MARL algorithm for both teams when you only have 1 GPU.
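One way to sanity-check such a split is to total the logical GPU requests across both teams before launching. The helper below is a hypothetical sketch using this thread's numbers (two teams, 0.5 GPU per trainer, CPU-only rollout workers), not part of the repository:

```python
def gpu_budget(teams, gpus_per_trainer, gpus_per_worker, workers_per_team):
    # Total logical GPU reservation Ray will try to satisfy.
    return teams * (gpus_per_trainer + workers_per_team * gpus_per_worker)

# Two teams, each trainer requesting 0.5 GPU, rollout workers on CPU only:
need = gpu_budget(teams=2, gpus_per_trainer=0.5,
                  gpus_per_worker=0, workers_per_team=32)
need <= 1  # the whole experiment fits on a single physical GPU
```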
Also, I would like to know how long it takes to complete the PSRO training in your repository. Thank you very much.
Ideally, the PSRO algorithm requires the underlying MARL problem (e.g., MAPPO for the camera team against fixed target policies) to produce the best response (BR) to its opponent. So you need to wait for each PlayerTrainer process to converge. In the default configuration, each team is trained for 5 million steps of experience. On my side, each iteration takes approximately 2-5 hours, depending on your environment configuration, such as how many agents are in the environment.
@XuehaiPan Thank you very much for helping me solve each confusion. Thank you again.
Hi @XuehaiPan, I apologize for the inconvenience, but I have a few questions I'd like to ask you. I noticed that when you were using the PSRO algorithm, each team was trained for 5 million steps, and each PlayerTrainer process converged within 2-5 hours. I'd like to know what resources you were using, such as how many CPU cores and how much GPU memory. Currently, I'm using the code environment from your repository and running the PSRO algorithm on a computer with a 16-core CPU. I allocate seven CPU cores to each team for generating rollouts and one for training, and I split a 16 GB GPU between the two teams at a 50% ratio. However, after more than 30 hours of training, the two teams combined have only trained for just over 300,000 steps. This is far from the desired 5 million steps per iteration, considering the default setting of 40 iterations in total, so the training time is becoming unmanageable. Therefore, I'd like to ask about the resources you used, to confirm whether the issue lies with my environment and code settings or whether I need to allocate more computational resources. Thank you.
Hi @MurrayMa0816. The computation requirements in the experiment are specified as:
Each team is trained with 32 workers (each uses 1 CPU). The total CPU count is around 80 for the PSRO algorithm. The GPU memory consumption is relatively small because the network is not very deep (if you do not assign any GPU resources to the rollout workers and only the trainer has a GPU). Maybe you need to set up a Ray cluster to use more CPUs from other nodes to speed up your training.
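For reference, bringing up a multi-node Ray cluster uses the standard ray CLI; a sketch under the assumption of one head node plus worker nodes (replace HEAD_NODE_IP with your head node's address, and check your Ray version's documentation for exact flags):

```shell
# On the head node:
ray start --head --port=6379

# On each additional node (hypothetical head-node address):
ray start --address='HEAD_NODE_IP:6379'

# Then connect the training script to the existing cluster, e.g. in Python:
#   ray.init(address='auto')
```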
Hi @XuehaiPan I see. Thanks again for your kind reply.
Hello, the MATE repository you shared has been extremely helpful for my research. I am currently studying your code, but I have encountered a bug that I haven't been able to solve despite spending a lot of time on it. Could you please take a look and assist me?
Here is the process that triggered my issue:
Loading the aforementioned checkpoint-1 retrieves self.worker. When setting the state of self.worker, the values of the "fused" and "foreach" parameters in self.worker['state']['shared_policy']['_optimizer_variables'][0]['param_groups'][0] are both None.
When the values of fused and foreach are None and are converted to tensors as items, the following error occurs:
My approaches have been:
2. Add parameters in the config file: within example/hrl/mappo/camera/config.py, I added the parameters 'foreach': False and 'fused': True under config['model']['custom_model_config']. However, when loading checkpoint-1, the values of these two parameters remained None.
These are the two approaches I have tried, but neither of them has resolved the issue. I would greatly appreciate any insights you can provide.