PantheonRL is a package for training and testing multi-agent reinforcement learning environments. PantheonRL supports cross-play, fine-tuning, ad-hoc coordination, and more.
Hi, could you please offer an example of how to run bctrainer.py? I input the trajectory file collected from the web user interface but hit an error:
Traceback (most recent call last):
  File "/Users/alexwang/phD/PantheonRL/bctrainer.py", line 106, in <module>
    clone.train(n_epochs=args.total_epochs)
  File "/Users/alexwang/phD/PantheonRL/pantheonrl/algos/bc.py", line 352, in train
    batch["obs"], batch["acts"])
  File "/Users/alexwang/phD/PantheonRL/pantheonrl/algos/bc.py", line 291, in _calculateloss
    _, log_prob, entropy = self.policy.evaluate_actions(obs, acts)
  File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/stable_baselines3/common/policies.py", line 640, in evaluate_actions
    log_prob = distribution.log_prob(actions)
  File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/stable_baselines3/common/distributions.py", line 278, in log_prob
    return self.distribution.log_prob(actions)
  File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/torch/distributions/categorical.py", line 117, in log_prob
    self._validate_sample(value)
  File "/Users/alexwang/miniconda/envs/PantheonRL/lib/python3.7/site-packages/torch/distributions/distribution.py", line 277, in _validate_sample
    raise ValueError('The value argument must be within the support')
ValueError: The value argument must be within the support
My torch version is 1.8.1 and my stable-baselines3 version is 1.2.0a0.
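As far as I can tell, the ValueError is raised by torch's Categorical distribution when an action index in the loaded trajectories falls outside the distribution's support, i.e. outside [0, n_actions - 1]. A minimal sketch that reproduces the same check (the 4-action space is an assumption purely for illustration, not taken from my environment):

```python
import torch
from torch.distributions import Categorical

# A Categorical over 4 actions only accepts log_prob queries
# for integer action indices in [0, 3].
dist = Categorical(logits=torch.zeros(4), validate_args=True)

print(dist.log_prob(torch.tensor(2)))  # within support: returns a log-probability

try:
    dist.log_prob(torch.tensor(5))  # index 5 lies outside [0, 3]
except ValueError as exc:
    print(exc)  # same validation error as in the traceback above
```

So it looks like the actions stored in my trajectory file may not match the action space of the environment bctrainer.py builds, which is why I'm asking for a known-good example invocation.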
Thanks