-
/content/gdrive/MyDrive/Colab/StockFormer-main/code/stable_baselines3/common/save_util.py:166: UserWarning: Could not deserialize object action_space. Consider using custom_objects argument to replace…
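For anyone hitting the same warning: a minimal sketch of the `custom_objects` workaround the message refers to (the algorithm class, checkpoint path, and env below are hypothetical stand-ins; the point is that the replaced spaces are never unpickled):
```python
import gym
from stable_baselines3 import PPO  # use whichever algorithm class saved the checkpoint

env = gym.make("CartPole-v1")      # stand-in for the actual env

# Objects listed in custom_objects are substituted instead of being deserialized,
# which sidesteps the pickle/version mismatch behind the UserWarning.
model = PPO.load(
    "path/to/saved_model",         # hypothetical checkpoint path
    env=env,
    custom_objects={
        "action_space": env.action_space,
        "observation_space": env.observation_space,
    },
)
```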
-
My launch script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed /data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py --data_path /data/bill.bi/RLHFDataset --data_output_path /…
-
The Reactor: A Sample-Efficient Actor-Critic Architecture
https://arxiv.org/abs/1704.04651
-
Getting the following error when trying to run the code with a (very simple) custom env using PyTorch 2.0.1:
`RuntimeError: one of the variables needed for gradient computation has been modified by…
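Not a fix by itself, but a sketch of how to localize the offending in-place op, assuming a standard PyTorch training loop:
```python
import torch

# With anomaly detection on, the error raised in backward() is augmented with a
# traceback pointing at the forward op whose output was later modified in place
# (typical culprits: `x += ...`, `relu_()/clamp_()`, or reusing a tensor across
# two optimizer updates).
torch.autograd.set_detect_anomaly(True)

# ... build the policy and collect the rollout as usual, then:
# loss.backward()   # the raised error now names the in-place operation
```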
-
1. In the SFT step, the model I used is [llama-7b-hf](https://huggingface.co/decapoda-research/llama-7b-hf), downloaded from Hugging Face, and all datasets are the defaults. Here is my launch shell:
```shell
…
```
-
In the PPO_model.py file, forward is empty, so why can everything be done through the evaluate function instead? I really don't get it; in that case, how are the actor and critic inside the HGNNScheduler network trained?
The evaluate function uses both the actor and the critic, so what does the actor network represent? Its output is initialized to only 1 dimension, so how does it produce a distribution over actions? Is it through the …
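Not the repo's own code, but a minimal sketch of one common pattern for this kind of scheduler (all names here, including `candidate_distribution`, are hypothetical): the actor scores each feasible candidate with a single scalar, and a softmax over those scores gives the Categorical distribution that an evaluate-style function can use for log-probs and entropy.
```python
import torch
from torch.distributions import Categorical

def candidate_distribution(actor, candidate_features, infeasible_mask):
    """Hypothetical helper: turn per-candidate scalar scores into a policy.

    actor              : module mapping each candidate's features to one scalar
    candidate_features : (batch, n_candidates, feat_dim)
    infeasible_mask    : (batch, n_candidates) bool, True where the action is invalid
    """
    scores = actor(candidate_features).squeeze(-1)           # (batch, n_candidates)
    scores = scores.masked_fill(infeasible_mask, float("-inf"))
    return Categorical(logits=scores)                        # softmax over candidates

# Toy usage: a 1-dim actor head still yields a full action distribution.
actor = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
dist = candidate_distribution(actor, torch.rand(2, 5, 8), torch.zeros(2, 5, dtype=torch.bool))
action = dist.sample()            # (batch,)
log_prob = dist.log_prob(action)  # what a PPO-style evaluate() would recompute
```
If the repo follows this pattern, an empty forward() is harmless, because training only ever calls evaluate() (and the action-sampling path) explicitly, and the critic is a separate head producing one value per state for the PPO loss.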
-
Thanks for sharing your code, it's great to be able to go through the implementation.
Maybe I'm misunderstanding this, but it seems that if you intend `self.cpc_optimizer` to only optimise W, then
…
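For what it's worth, a minimal sketch of how an optimizer restricted to W behaves (W, the shapes, and the learning rate here are placeholders):
```python
import torch

# Building the optimizer from [W] alone means cpc_optimizer.step() can only
# ever update W, even if the contrastive loss also backpropagates gradients
# into the encoder's parameters.
W = torch.nn.Parameter(torch.randn(50, 50))
cpc_optimizer = torch.optim.Adam([W], lr=1e-3)

# If the encoder must not receive gradients from this loss, that has to be cut
# explicitly as well, e.g. by computing the keys under torch.no_grad() or
# detaching them before they enter the bilinear score.
```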
-
# Learning to play Yahtzee with Advantage Actor-Critic (A2C) | dionhaefner.github.io
My in-laws are really into the dice game Yatzy (the Scandinavian version of Yahtzee). If you’re unfamiliar with th…
-
The actor reward graph should display both the predicted loss generated by the critic network (equivalent to the actor optimization loss) and the actual loss once the training episode is complete.
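One hedged way to get such a graph (the function name and the logging backend are assumptions, not part of the repo):
```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/actor_reward")   # hypothetical log directory

def log_episode(step, predicted_return, actual_return):
    # predicted_return: the critic's value estimate at the episode's start state
    # actual_return:    the return actually collected once the episode completes
    writer.add_scalars(
        "actor_reward",
        {"critic_prediction": predicted_return, "actual": actual_return},
        global_step=step,
    )
```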
-
Hi, I am new to tianshou and RL. I created an env and ran PPO from tianshou on it, but I found that the sampled actions are out of range. So I searched around and found map_action, but it seems not to be used in tr…
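A hedged env-side workaround, assuming a continuous Box action space (the wrapper name is made up): clip whatever the policy samples before the env ever sees it, independently of how map_action is wired into collection.
```python
import gym
import numpy as np

class ClipToBox(gym.ActionWrapper):
    """Hypothetical safety net: force every incoming action into the Box bounds."""

    def action(self, act):
        return np.clip(act, self.action_space.low, self.action_space.high)

# env = ClipToBox(MyCustomEnv())   # then hand this env to the tianshou Collector
```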