-
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exc…
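For errors like this, a common first step is to force synchronous kernel launches so the Python stack trace points at the op that actually faulted; a minimal sketch (the variable must be set before any CUDA work happens, or exported in the shell before launching the script):

```python
import os

# Make CUDA kernel launches synchronous so the reported stack trace
# identifies the faulting op instead of a later, unrelated call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the env var so it takes effect
```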
-
Hi, I managed to train step 1 and step 2 for a 6.7B actor model and a 350M reward model, but I keep running into an out-of-memory issue for step 3. I was wondering what config was used in your tests with…
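For context on why step 3 is so much heavier than steps 1 and 2: the PPO stage keeps four models resident at once (actor, frozen reference copy, critic, reward), and the trainable actor also carries optimizer state. A rough back-of-the-envelope sketch, assuming full fine-tuning of the actor and ignoring activations, KV caches, and fragmentation:

```python
BYTES_FP16 = 2
GiB = 1024 ** 3

actor_params  = 6.7e9    # actor and its frozen reference copy
reward_params = 0.35e9   # reward model; the critic is typically a similar size

# fp16 weights for the four models kept resident in step 3:
# actor + frozen reference + critic + reward model.
weights = BYTES_FP16 * (2 * actor_params + 2 * reward_params)

# Mixed-precision Adam state for the trainable actor alone (if it is fully
# fine-tuned): fp32 master weights + momentum + variance = 12 bytes/param.
# With LoRA only the adapter weights carry optimizer state, so this shrinks a lot.
actor_adam_state = 12 * actor_params

print(f"fp16 weights                 ~ {weights / GiB:5.1f} GiB")           # ~26 GiB
print(f"actor Adam state             ~ {actor_adam_state / GiB:5.1f} GiB")  # ~75 GiB
print(f"lower bound (no activations) ~ {(weights + actor_adam_state) / GiB:5.1f} GiB")
```

Roughly 100 GiB before activations, which is why ZeRO partitioning, offload, or LoRA is usually needed at this scale.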
-
Hello,
I'd like to ask what role the VAE plays in the actor-critic network, and what would happen if it were removed?
-
Hello, I need to make SacAgent work with discrete actions, so I tried to implement the Gumbel-Softmax reparameterization trick by re-defining the relevant classes. However, the calculation of `agent.train(experie…
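For reference, the core of the Gumbel-Softmax trick is to draw a relaxed one-hot action from the categorical logits so that gradients flow through the sampling step. A minimal, framework-agnostic sketch in PyTorch (the tf-agents wiring around `SacAgent` is not shown, and `logits` / `tau` are placeholder names):

```python
import torch
import torch.nn.functional as F

def sample_discrete_action(logits: torch.Tensor, tau: float = 1.0, hard: bool = True):
    """Draw a differentiable (relaxed) one-hot sample from categorical logits.

    With hard=True the forward pass returns an exact one-hot action while the
    backward pass uses the soft relaxation (straight-through estimator), which
    is what makes the discrete choice trainable by gradient descent.
    """
    return F.gumbel_softmax(logits, tau=tau, hard=hard)

# Toy usage: a batch of 4 states with 6 discrete actions each.
logits = torch.randn(4, 6, requires_grad=True)
action_one_hot = sample_discrete_action(logits, tau=0.5)   # shape (4, 6)
fake_q = torch.randn(4, 6)                                 # stand-in for per-action Q-values
loss = -(action_one_hot * fake_q).sum(dim=-1).mean()       # differentiable w.r.t. logits
loss.backward()
print(logits.grad.shape)                                   # (4, 6): gradients reach the logits
```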
-
Could you tell me the name of the policy gradient method you used?
I'm asking because it doesn't seem to be an actor-critic approach.
-
When running the book's reference DDPG code, I observed that as training progresses the actor loss keeps rising, and the critic loss also fluctuates up and down.
Isn't the actor loss defined as the negative Q-value? If this value keeps getting larger, doesn't that mean the Q-value of the actions keeps getting smaller? Isn't that the opposite of our goal of maximizing the Q-value of the actions?
So is the quality of this algorithm ultimately judged by whether its reward is rising?
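For reference, this is how the two losses are typically computed in a DDPG update; the actor loss is indeed -Q(s, μ(s)), but since the critic itself keeps changing during training, a rising actor loss does not by itself mean the policy is getting worse, and episode return is the more reliable signal. Names such as `actor`, `critic`, `target_actor`, `target_critic` are placeholders, not the book's code:

```python
import torch
import torch.nn.functional as F

def ddpg_losses(actor, critic, target_actor, target_critic,
                state, action, reward, next_state, done, gamma=0.99):
    # Critic loss: TD error against the bootstrapped target-network value.
    with torch.no_grad():
        next_q = target_critic(next_state, target_actor(next_state))
        td_target = reward + gamma * (1.0 - done) * next_q
    critic_loss = F.mse_loss(critic(state, action), td_target)

    # Actor loss: minimize -Q(s, mu(s)), i.e. maximize the critic's estimate
    # of the actor's own action. The scale of this loss tracks the (moving)
    # critic, so its trend is less informative than the reward curve.
    actor_loss = -critic(state, actor(state)).mean()
    return actor_loss, critic_loss
```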
-
Does anyone know what's wrong with using tf-agents here that triggers this ValueError?
ValueError: Inputs to TanhNormalProjectionNetwork must match the sample_spec.dtype.
In call to configurable 'SacAgent'…
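This error usually comes from a dtype mismatch between the environment's specs (for example float64 observations or an integer/float64 action spec, as gym often produces) and the float32 tensors the agent's networks emit. A minimal sketch of a float32 bounded action spec, with placeholder shape and bounds, assuming that is the mismatch here:

```python
import numpy as np
from tf_agents.specs import array_spec

# SAC's TanhNormalProjectionNetwork works with a continuous float32 action
# spec; an int or float64 spec can trigger the dtype mismatch ValueError.
action_spec = array_spec.BoundedArraySpec(
    shape=(1,),            # placeholder: one continuous action dimension
    dtype=np.float32,
    minimum=-1.0,
    maximum=1.0,
    name="action",
)
```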
-
* paper
Addressing Function Approximation Error in Actor-Critic Methods
https://arxiv.org/abs/1802.09477
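For context, the method this paper proposes (TD3) curbs critic overestimation with clipped double-Q learning and target policy smoothing: the critic target uses the minimum of two target critics evaluated at a noise-perturbed target action. A minimal sketch of that target computation, with placeholder network and hyperparameter names:

```python
import torch

def td3_target(target_actor, target_q1, target_q2,
               reward, next_state, done, gamma=0.99,
               policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action.
        next_action = target_actor(next_state)
        noise = (torch.randn_like(next_action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)

        # Clipped double-Q: take the minimum of the two target critics
        # to reduce overestimation bias.
        next_q = torch.min(target_q1(next_state, next_action),
                           target_q2(next_state, next_action))
        return reward + gamma * (1.0 - done) * next_q
```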
-
I am trying to run the DeepSpeed-Chat example on a single GPU, an NVIDIA A6000 48 GB.
I could run all 3 steps fine using the 1.3b example.
But when I run `single_gpu/run_6.7b_lora.sh`, I get a CUDA Out Of Memory…
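One knob that often makes the difference on a single 48 GB card is ZeRO stage 3 with CPU offload of parameters and optimizer state, at the cost of speed. A minimal sketch of the relevant DeepSpeed config section, written as a Python dict; the values are illustrative and the DeepSpeed-Chat scripts expose their own flags for this:

```python
# Illustrative ZeRO-3 + CPU offload portion of a DeepSpeed config; batch size,
# fp16, and scheduler settings around it are omitted. Whether this fits in
# 48 GB still depends on sequence length, batch size, and the LoRA settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # smallest possible per-GPU batch
    "gradient_accumulation_steps": 8,      # keep the effective batch size up
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "gradient_clipping": 1.0,
}
```

Gradient checkpointing on the model and a shorter maximum sequence length also cut activation memory, if the script exposes those options.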
-