-
Hello, I get this error and have no clue what I am doing wrong. Here's my code:
```
for _ in range(self.K_epochs):
    # Evaluating old actions and values:
    …
```
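For context, the `K_epochs` loop in common PyTorch PPO implementations looks roughly like the sketch below; every name here (`policy.evaluate`, `old_logprobs`, `eps_clip`, `MseLoss`, `optimizer`, and `torch` being imported) is an assumption about the surrounding class, not the poster's actual code:
```
# Minimal sketch of a typical PPO update loop (method-body fragment;
# the attributes referenced via self are assumed, not confirmed).
for _ in range(self.K_epochs):
    # Evaluate old actions and values with the current policy.
    logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)

    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s).
    ratios = torch.exp(logprobs - old_logprobs.detach())

    # Clipped surrogate objective plus value and entropy terms.
    advantages = rewards - state_values.detach()
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages
    loss = -torch.min(surr1, surr2) + 0.5 * self.MseLoss(state_values, rewards) - 0.01 * dist_entropy

    self.optimizer.zero_grad()
    loss.mean().backward()
    self.optimizer.step()
```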
-
When running step 3 with ZeRO stage 3 enabled for both the actor and critic models,
I get the following error (line numbers may be offset due to debug statements I've added):
```
File "/path/DeepSp…
```
-
Thanks for your outstanding work. I would like to ask: how should we generate offline datasets, such as the medium or medium-expert versions in D4RL? Also, is it possible to render states into images to …
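One common recipe, sketched below under assumptions (a gymnasium env and a fixed `policy` callable; none of these names come from this repo), is to roll out a snapshot of a partially trained policy, which is roughly how D4RL's "medium" datasets are built:
```
import gymnasium as gym
import numpy as np

# Minimal sketch: roll out a fixed policy and store transitions in
# D4RL-style arrays. `policy` is a hypothetical obs -> action callable.
def collect_dataset(env_name, policy, n_steps=100_000):
    env = gym.make(env_name)
    data = {k: [] for k in ("observations", "actions", "rewards",
                            "next_observations", "terminals")}
    obs, _ = env.reset()
    for _ in range(n_steps):
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        data["observations"].append(obs)
        data["actions"].append(action)
        data["rewards"].append(reward)
        data["next_observations"].append(next_obs)
        data["terminals"].append(terminated)
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return {k: np.asarray(v) for k, v in data.items()}
```
For image observations, making the env with `render_mode="rgb_array"` and calling `env.render()` each step yields frames that can be stored alongside the transitions.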
-
If I increase both HEIGHT and WIDTH from 5 to 10 while keeping the obstacles and the final goal at the same positions, the Deep SARSA network doesn't seem to converge. What do you think the problem is? Shoul…
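For reference, a minimal sketch of the Deep SARSA TD target (on-policy: the bootstrap uses the action actually taken in the next state; `q_net` and the tensor arguments are hypothetical, not this repo's code):
```
import torch
import torch.nn.functional as F

def sarsa_td_loss(q_net, s, a, r, s_next, a_next, done, gamma=0.99):
    # Q(s, a) for the actions actually taken.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # On-policy bootstrap Q(s', a'), zeroed at episode ends.
        q_next = q_net(s_next).gather(1, a_next.unsqueeze(1)).squeeze(1)
        target = r + gamma * (1.0 - done.float()) * q_next
    return F.mse_loss(q_sa, target)
```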
-
- [ ] I have marked all applicable categories:
+ [x] exception-raising bug
+ [x] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…
-
Hi,
There are two loss terms in the actor agent: the advantage loss and the entropy loss. Can you tell me why you add the entropy loss? I know the entropy weight decays from 1 to 0.0001, but I do not know why yo…
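For context, the entropy term is usually a bonus that keeps the policy stochastic early in training, encouraging exploration; decaying its weight then shifts the agent toward exploitation. A minimal sketch of how the two terms are typically combined (all names here are placeholders, not this repo's code):
```
import torch

def actor_loss(logits, actions, advantages, entropy_weight):
    dist = torch.distributions.Categorical(logits=logits)
    # Advantage (policy-gradient) term: raise log-probs of good actions.
    adv_loss = -(dist.log_prob(actions) * advantages).mean()
    # Entropy term: negative entropy, so minimizing it keeps the
    # policy from collapsing to a deterministic one too early.
    entropy_loss = -dist.entropy().mean()
    return adv_loss + entropy_weight * entropy_loss
```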
-
Noting these down for the [NeurIPS BBO challenge](http://bbochallenge.com/leaderboard):
- idea 1: generate more suggestions and only send the top `n_suggestions` ranked by value (see the sketch after this list).
- idea 2: gener…
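A minimal sketch of idea 1, assuming a candidate sampler and a surrogate scorer (`sample_candidates` and `surrogate` are hypothetical names, not challenge API):
```
import numpy as np

# Oversample candidates, score them with a surrogate model, and return
# only the best n_suggestions (minimization: lower score is better).
def suggest(sample_candidates, surrogate, n_suggestions, oversample=8):
    candidates = sample_candidates(n_suggestions * oversample)
    scores = surrogate(candidates)  # predicted objective values
    top = np.argsort(scores)[:n_suggestions]
    return [candidates[i] for i in top]
```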
-
```
self.Critic_return, self.advantage = trfl.sequence_advantage_critic_loss(
    self.baseline_, self.reward_, self.discount_, self.bootstrap_,
    lambda_=lambda_,
    …
```
-
I want to ask one more thing about the estimation of the discounted reward. The variable `discounted_reward` always starts at zero. However, if the episode has not ended, should it be the value estimation …
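For reference, a minimal sketch of the usual fix: when the rollout is truncated mid-episode, initialize the running return with the critic's value estimate of the last state instead of zero (function and argument names here are placeholders):
```
def discounted_returns(rewards, dones, last_value, gamma=0.99):
    # Backward pass over a rollout. If the final step is not terminal,
    # bootstrap from the critic's estimate V(s_T); the reset below makes
    # last_value irrelevant when the final step really ends the episode.
    R = last_value
    returns = []
    for r, done in zip(reversed(rewards), reversed(dones)):
        if done:
            R = 0.0  # true episode end: no future reward
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns
```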
-
Hi!
Let's bring the reinforcement learning course to the whole Russian-speaking community 🌏
Would you like to translate it? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/tran…