MurrayMa0816 opened this issue 1 year ago
Hi @MurrayMa0816, have you ever tried to pin your ray version?
Hi @XuehaiPan, thank you for your response. I have tried two versions, Ray 1.12.0 and Ray 1.13.0, and encountered the same issue. My current workaround, which doesn't address the root cause, is to check each value before converting it to a tensor: if the value is None, I convert it to False first. This avoids the error and is equivalent to forcing the "fused" and "foreach" parameters to False. The program now runs without errors, but I'm not sure whether this approach introduces other potential issues.
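A minimal sketch of that workaround, assuming the optimizer state is a plain nested dict shaped like the path mentioned later in this thread (the helper name and the exact nesting are assumptions, not RLlib's actual API):

```python
def sanitize_optimizer_state(policy_state):
    """Coerce None-valued 'fused'/'foreach' flags to False so they can be
    converted to tensors. Mutates and returns policy_state."""
    for opt_state in policy_state.get('_optimizer_variables', []):
        for group in opt_state.get('param_groups', []):
            for key in ('fused', 'foreach'):
                if group.get(key) is None:
                    group[key] = False
    return policy_state

# Hypothetical usage on a state dict shaped like the one in the traceback:
state = {
    '_optimizer_variables': [
        {'param_groups': [{'lr': 5e-4, 'fused': None, 'foreach': None}]}
    ]
}
sanitize_optimizer_state(state)
# 'fused' and 'foreach' are now False instead of None.
```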
@MurrayMa0816 Since only the policy weights are needed in the checkpoint, I think you can remove the optimizer-related items in the worker state:
self.worker['state']['shared_policy'].pop('_optimizer_variables', None)
@XuehaiPan , thank you very much for helping me confirm this issue that has been bothering me for a long time.
There is another issue during training that I would like to consult with you: When running the PSRO code, it always indicates that training is happening on the CPU, as follows:
INFO torch_policy.py:183 -- TorchPolicy (worker=local) running on CPU.
However, I would like to train the policies on the GPU.
My hardware configuration: One computer with 16 CPUs and 1 GPU.
The settings for num_workers and num_gpus during training are as follows:
In psro/train.py, they are set as follows:
train_kwargs = {
    'num_workers': num_workers,  # 6 worker processes each for the camera and target teams, based on the number of CPUs available
    'num_gpus': num_gpus,  # 1 GPU, intended to be shared among 12 workers
    'num_envs_per_worker': num_envs_per_worker,  # 8
    'seed': seed,
}
not_ready = [
    camera_trainer.result.remote(skip_train_if_exists=True, **train_kwargs),
    target_trainer.result.remote(skip_train_if_exists=True, **train_kwargs),
]
However, hrl/mappo/camera/train.py and mappo/target/train.py both set num_gpus_per_worker=0. Does this mean the GPU is not being used?
experiment.spec['config'].update(
    num_cpus_for_driver=NUM_CPUS_FOR_TRAINER,
    num_gpus=num_gpus,
    num_gpus_per_worker=0,
    num_workers=num_workers,
    num_envs_per_worker=num_envs_per_worker,
)
The code that logs that training is on the CPU is in ray/rllib/policy/torch_policy.py, specifically:
worker_idx = self.config.get("worker_index", 0)
if not config["_fake_gpus"] and ray.worker._mode() == ray.worker.LOCAL_MODE:
    num_gpus = 0
elif worker_idx == 0:
    num_gpus = config["num_gpus"]
else:
    num_gpus = config["num_gpus_per_worker"]
gpu_ids = list(range(torch.cuda.device_count()))
# Place on one or more CPU(s) when either:
# - Fake GPU mode.
# - num_gpus=0 (either set by the user or we are in local_mode=True).
# - No GPUs available.
if config["_fake_gpus"] or num_gpus == 0 or not gpu_ids:
    logger.info(
        "TorchPolicy (worker={}) running on {}.".format(
            worker_idx if worker_idx > 0 else "local",
            "{} fake-GPUs".format(num_gpus) if config["_fake_gpus"] else "CPU",
        )
    )
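The selection logic above can be mirrored as a small standalone function, which makes it easy to see why only the trainer (worker_idx == 0) ever receives config["num_gpus"] while rollout workers fall back to num_gpus_per_worker. This is a simplified sketch, not RLlib's actual method; the real code also checks how many CUDA devices are visible:

```python
def effective_num_gpus(config, worker_idx, local_mode=False):
    # Simplified mirror of RLlib's TorchPolicy device selection.
    if not config["_fake_gpus"] and local_mode:
        return 0  # local_mode forces CPU regardless of the GPU settings
    if worker_idx == 0:
        return config["num_gpus"]  # trainer (local worker)
    return config["num_gpus_per_worker"]  # rollout worker

cfg = {"_fake_gpus": False, "num_gpus": 1, "num_gpus_per_worker": 0}
effective_num_gpus(cfg, worker_idx=0)  # the trainer gets the GPU
effective_num_gpus(cfg, worker_idx=3)  # rollout workers sample on CPU
```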
Tried approaches (I would like to have all 12 worker processes running on the GPU):
1. Set num_gpus=1 in the decorator when defining PlayerTrainer:

@ray.remote(max_restarts=1, num_gpus=1)
class PlayerTrainer:
    def __init__(self, iteration, player, train_fn, base_experiment, opponent_agent_factory, from_checkpoint, timesteps_total, local_dir, project=None, group=None, **kwargs):

2. Set --num-gpus to 0.5 in the launch command:

python3 -m examples.psro.train \
    --project mate-psro \
    --meta-solver NE \
    --num-workers 32 --num-envs-per-worker 8 --num-gpus 0.5 \
    --timesteps-total 5E6 --num-evaluation-episodes 10 --seed 0
The approaches above have not solved the problem of training on the GPU, and I am confused. Could you please give me some suggestions for the configuration? Thank you very much.
Also, I would like to know how long it takes to complete the PSRO training in your repository. Thank you very much.
@MurrayMa0816
However, hrl/mappo/camera/train.py and mappo/target/train.py both set num_gpus_per_worker=0. Does this mean the GPU is not being used?
There is a trainer process and several worker processes for rollout sampling. The configuration num_gpus=1 means the trainer uses a GPU for training the neural networks (actor & critic), and num_gpus_per_worker=0 means only the CPU is used to run the actor network during rollout sampling on the worker processes. You can set num_gpus_per_worker > 0 if you have more GPUs.
- I found some information mentioning that one GPU cannot be shared among multiple workers.
If you set num_gpus_per_worker=0.1, the one GPU can be shared among 10 workers. You can find this in the Ray documentation. You can increase the value if you have more GPUs.
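As a quick sanity check on fractional GPU requests: Ray treats them as logical reservations, so the number of workers one physical GPU can host is just the ratio. This is illustrative arithmetic only, not a Ray API; note that Ray does not partition GPU memory, so the co-located processes must together fit in the card's memory:

```python
def max_gpu_workers(total_gpus, num_gpus_per_worker):
    """How many workers fit given a per-worker fractional GPU request."""
    if num_gpus_per_worker <= 0:
        raise ValueError("num_gpus_per_worker must be positive")
    # Add a tiny epsilon before flooring to guard against float error.
    return int(total_gpus / num_gpus_per_worker + 1e-9)

max_gpu_workers(1, 0.1)  # 10 workers share one GPU, as described above
```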
- There were also suggestions to add num_gpus=1 in the decorator when defining PlayerTrainer:
You will not need to do this. The resource requirements are set in the experiment configuration. If you decorate PlayerTrainer with num_gpus=1, it means the trainer and its workers can only use one GPU in total.
Additionally, in the provided running script, --num-gpus is set to 0.5.
I set this because the network is relatively small. With num_gpus=0.5, you can run the MARL algorithm for both teams when you only have 1 GPU.
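One way to sanity-check such a split is to total the logical GPU requests across both teams before launching. The helper below is a hypothetical sketch using this thread's numbers (two teams, 0.5 GPU per trainer, CPU-only rollout workers), not part of the repository:

```python
def gpu_budget(teams, gpus_per_trainer, gpus_per_worker, workers_per_team):
    # Total logical GPU reservation Ray will try to satisfy.
    return teams * (gpus_per_trainer + workers_per_team * gpus_per_worker)

# Two teams, each trainer requesting 0.5 GPU, rollout workers on CPU only:
need = gpu_budget(teams=2, gpus_per_trainer=0.5,
                  gpus_per_worker=0, workers_per_team=32)
need <= 1  # the whole experiment fits on a single physical GPU
```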
Also, I would like to know how long it takes to complete the PSRO training in your repository. Thank you very much.
Ideally, the PSRO algorithm requires the underlying MARL problem (e.g., MAPPO for the camera team against fixed target policies) to produce the best response (BR) to its opponent. So you need to wait for each PlayerTrainer process to converge. In the default configuration, each team is trained for 5 million steps of experience. On my side, each iteration takes approximately 2-5 hours, depending on your environment configuration, such as how many agents are in the environment.
@XuehaiPan Thank you very much for helping me solve each confusion. Thank you again.
Hi @XuehaiPan, I apologize for the inconvenience, but I have a few questions I'd like to ask you. I noticed that when you were using the PSRO algorithm, each team was trained for 5 million steps, and each PlayerTrainer process converged within 2-5 hours. I'd like to know what resources you were using, such as how many CPU cores and how much GPU memory. Currently, I'm using the code environment from your repository and running the PSRO algorithm on a computer with a 16-core CPU. I allocate seven CPU cores to each team for generating rollouts and one for training, and I split a 16 GB GPU between the two teams at a 50% ratio. However, after more than 30 hours of training, the two teams combined have only trained for just over 300,000 steps. This is far from the desired 5 million steps per iteration, considering the default setting of 40 iterations in total, so the training time is becoming unmanageable. Therefore, I'd like to ask about the resources you used, to confirm whether the issue lies with my environment and code settings or whether I need to allocate more computational resources. Thank you.
Hi @MurrayMa0816. The computation requirements in the experiment are specified as:
Each team is trained with 32 workers (each uses 1 CPU). The total CPU count is around 80 for the PSRO algorithm. The GPU memory consumption is relatively small because the network is not very deep (if you do not assign any GPU resources to the rollout workers and only the trainer has a GPU). Maybe you need to set up a Ray cluster to use more CPUs from other nodes to speed up your training.
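For reference, bringing up a multi-node Ray cluster uses the standard ray CLI; a sketch under the assumption of one head node plus worker nodes (replace HEAD_NODE_IP with your head node's address, and check your Ray version's documentation for exact flags):

```shell
# On the head node:
ray start --head --port=6379

# On each additional node (hypothetical head-node address):
ray start --address='HEAD_NODE_IP:6379'

# Then connect the training script to the existing cluster, e.g. in Python:
#   ray.init(address='auto')
```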
Hi @XuehaiPan I see. Thanks again for your kind reply.
Hello, the MATE repository you shared has been extremely helpful for my research. I am currently studying your code, but I have encountered a bug that I haven't been able to solve despite spending a lot of time on it. Could you please take a look and assist me?
Here is the process that triggered my issue:
Loading the aforementioned checkpoint-1 retrieves self.worker. When setting the state of self.worker, the values of the "fused" and "foreach" parameters in self.worker['state']['shared_policy']['_optimizer_variables'][0]['param_groups'][0] are both None.
When the values of fused and foreach are None and are converted to tensors as items, the following error occurs:
My approaches have been:
2. Add parameters in the config file: within example/hrl/mappo/camera/config.py, I added the parameters 'foreach': False and 'fused': True under config['model']['custom_model_config']. However, when loading checkpoint-1, the values of these two parameters remained None.
These are the two approaches I have tried, but neither of them has resolved the issue. I would greatly appreciate any insights you can provide.