-
/content/gdrive/MyDrive/Colab/StockFormer-main/code/stable_baselines3/common/save_util.py:166: UserWarning: Could not deserialize object action_space. Consider using custom_objects argument to replace…
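For anyone hitting the same warning: a minimal sketch of the `custom_objects` workaround the message refers to (the algorithm class, checkpoint path, and env below are hypothetical stand-ins; the point is that the replaced spaces are never unpickled):
```python
import gym
from stable_baselines3 import PPO  # use whichever algorithm class saved the checkpoint

env = gym.make("CartPole-v1")      # stand-in for the actual env

# Objects listed in custom_objects are substituted instead of being deserialized,
# which sidesteps the pickle/version mismatch behind the UserWarning.
model = PPO.load(
    "path/to/saved_model",         # hypothetical checkpoint path
    env=env,
    custom_objects={
        "action_space": env.action_space,
        "observation_space": env.observation_space,
    },
)
```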
-
My launch script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed /data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py --data_path /data/bill.bi/RLHFDataset --data_output_path /…
-
The Reactor: A Sample-Efficient Actor-Critic Architecture
https://arxiv.org/abs/1704.04651
-
Getting the following error when trying to run the code with a (very simple) custom env using PyTorch 2.0.1:
`RuntimeError: one of the variables needed for gradient computation has been modified by…
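Not a fix by itself, but a sketch of how to localize the offending in-place op, assuming a standard PyTorch training loop:
```python
import torch

# With anomaly detection on, the error raised in backward() is augmented with a
# traceback pointing at the forward op whose output was later modified in place
# (typical culprits: `x += ...`, `relu_()/clamp_()`, or reusing a tensor across
# two optimizer updates).
torch.autograd.set_detect_anomaly(True)

# ... build the policy and collect the rollout as usual, then:
# loss.backward()   # the raised error now names the in-place operation
```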
-
1. In the SFT step, the model I used is [llama-7b-hf](https://huggingface.co/decapoda-research/llama-7b-hf), downloaded from Hugging Face, and all datasets are the defaults. Here is my launch shell:
```shell
…
```
-
In the PPO_model.py file, forward is empty, so why can everything be done through the evaluate function instead? I really don't get it; in that case, how are the actor and critic inside the HGNNScheduler network trained?
The evaluate function uses both the actor and the critic, so what does the actor network represent? Its output is initialized to only 1 dimension, so how does it produce a distribution over actions? Is it through the …
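Not the repo's own code, but a minimal sketch of one common pattern for this kind of scheduler (all names here, including `candidate_distribution`, are hypothetical): the actor scores each feasible candidate with a single scalar, and a softmax over those scores gives the Categorical distribution that an evaluate-style function can use for log-probs and entropy.
```python
import torch
from torch.distributions import Categorical

def candidate_distribution(actor, candidate_features, infeasible_mask):
    """Hypothetical helper: turn per-candidate scalar scores into a policy.

    actor              : module mapping each candidate's features to one scalar
    candidate_features : (batch, n_candidates, feat_dim)
    infeasible_mask    : (batch, n_candidates) bool, True where the action is invalid
    """
    scores = actor(candidate_features).squeeze(-1)           # (batch, n_candidates)
    scores = scores.masked_fill(infeasible_mask, float("-inf"))
    return Categorical(logits=scores)                        # softmax over candidates

# Toy usage: a 1-dim actor head still yields a full action distribution.
actor = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
dist = candidate_distribution(actor, torch.rand(2, 5, 8), torch.zeros(2, 5, dtype=torch.bool))
action = dist.sample()            # (batch,)
log_prob = dist.log_prob(action)  # what a PPO-style evaluate() would recompute
```
If the repo follows this pattern, an empty forward() is harmless, because training only ever calls evaluate() (and the action-sampling path) explicitly, and the critic is a separate head producing one value per state for the PPO loss.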
-
Thanks for sharing your code, it's great to be able to go through the implementation.
Maybe I'm misunderstanding this, but it seems that if you intend `self.cpc_optimizer` to only optimise W, then
…
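For what it's worth, a minimal sketch of how an optimizer restricted to W behaves (W, the shapes, and the learning rate here are placeholders):
```python
import torch

# Building the optimizer from [W] alone means cpc_optimizer.step() can only
# ever update W, even if the contrastive loss also backpropagates gradients
# into the encoder's parameters.
W = torch.nn.Parameter(torch.randn(50, 50))
cpc_optimizer = torch.optim.Adam([W], lr=1e-3)

# If the encoder must not receive gradients from this loss, that has to be cut
# explicitly as well, e.g. by computing the keys under torch.no_grad() or
# detaching them before they enter the bilinear score.
```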
-
# Learning to play Yahtzee with Advantage Actor-Critic (A2C) | dionhaefner.github.io
My in-laws are really into the dice game Yatzy (the Scandinavian version of Yahtzee). If you’re unfamiliar with th…
-
The actor reward graph should display both the predicted loss generated by the critic network (equivalent to the actor optimization loss) and the actual loss once the training episode is complete.
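One hedged way to get such a graph (the function name and the logging backend are assumptions, not part of the repo):
```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/actor_reward")   # hypothetical log directory

def log_episode(step, predicted_return, actual_return):
    # predicted_return: the critic's value estimate at the episode's start state
    # actual_return:    the return actually collected once the episode completes
    writer.add_scalars(
        "actor_reward",
        {"critic_prediction": predicted_return, "actual": actual_return},
        global_step=step,
    )
```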
-
Hi, I am new to tianshou and RL. I created an env and ran PPO from tianshou on it, but I found that the sampled actions are out of range. So I searched around and found map_action, but it seems not to be used in tr…
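A hedged env-side workaround, assuming a continuous Box action space (the wrapper name is made up): clip whatever the policy samples before the env ever sees it, independently of how map_action is wired into collection.
```python
import gym
import numpy as np

class ClipToBox(gym.ActionWrapper):
    """Hypothetical safety net: force every incoming action into the Box bounds."""

    def action(self, act):
        return np.clip(act, self.action_space.low, self.action_space.high)

# env = ClipToBox(MyCustomEnv())   # then hand this env to the tianshou Collector
```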