stable-baselines Search Results

1000+ results for stable-baselines

hill-a/stable-baselines #738

"Getting Mean Reward in CustomCallBack" Unsupported operand …

**Describe the bug** In the CustomCallback, getting the mean reward causes a numpy Error: > TypeError: unsupported operand type(s) for /: 'str' and 'int' The values are: ```python x, y = t…

toksis updated 4 years ago
2
hyrise/rl_index_selection #3

Running error: cant find file

Hi, thank you for your patience. How do I generate this file? Traceback (most recent call last): File "/home/ubuntu/anaconda3/envs/tensorflow1/lib/python3.7/runpy.py", line 193, in _run_module_as_mai…

MaYangrui6 updated 7 months ago
1
smritae01/CS640-Originality-Score-Project #4

tune hyperparameters for RLHF model

Increase the training iterations: Train the PPO model for more iterations, as the model might not have converged yet. Adjust the PPO hyperparameters: Experiment with different hyperparameters such …

GrantorShadow updated 1 year ago
2
Stable-Baselines-Team/stable-baselines3-contrib #158

SIL

[Self Imitation Learning](https://arxiv.org/abs/1806.05635) @emrul has implemented SAIL, see https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/139#issuecomment-1445114579 @em…

qgallouedec updated 1 year ago
8
hill-a/stable-baselines #666

[question] HER does not sample very last state of episode as…

It seems to me that when HER samples an achieved goal from the replay buffer it never samples the very last state of the episode. Is this intended? As a consequence, the sampling strategy "final" …

nicoguertler updated 4 years ago
3
araffin/rl-baselines-zoo #59

TRPO "underflow encountered in multiply"

While running a TRPO train, after some time (random - anywhere from 15sec to 1min) it kicks with the following: `Traceback (most recent call last): File "callback.py", line 196, in model.lea…

jarlva updated 4 years ago
2
hill-a/stable-baselines #545

DDPG log output uses scientific notation too soon for episod…

Here's an example intermittent print out from DDPG: ``` -------------------------------------- | reference_Q_mean | 49.8 | | reference_Q_std | 6.61 | | reference_action_m…

jkterry1 updated 5 years ago
1
hill-a/stable-baselines #1096

GAIL throws error when obs space is MultiDiscrete

**Describe the bug** https://stable-baselines.readthedocs.io/en/master/modules/gail.html states that GAIL supports MultiDiscrete obs space, but https://github.com/hill-a/stable-baselines/blob/maste…

SurferZergy updated 3 years ago
1
evilsocket/pwnagotchi #166

find a way to optimize AI loading times

TensorFlow takes minutes to import on a Raspberry Pi Zero W and that's probably because of the huge .so file with native primitives it has to load, among other things. Given the nature of the project,…

evilsocket updated 5 years ago
6
hill-a/stable-baselines #781

[PPO2] problems resuming training

I'm trying to resume the model training and I'm getting some strange results. Using SubProcVecEnv and VecNormalize on a custom environment: ``` from stable_baselines.common.policies import MlpPoli…

k0rean updated 9 months ago
5
