-
https://kuboard.cn/learning/k8s-advanced/policy/rq.html
-
https://kuboard.cn/learning/k8s-advanced/policy/sec.html
-
## In one sentence
Proposes a safe and efficient algorithm for return-based off-policy reinforcement learning. "Safe" means performance is robust to how off-policy the behavior is; "efficient" means the learning itself is sample-efficient. The paper provides convergence guarantees and experiments. It was accepted at NIPS 2016, and reading the analysis carefully looks painful.
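For concreteness, here is my transcription of the Retrace(λ) operator from arXiv:1606.02647 (check the paper for the exact statement); truncating the importance ratios at 1 is what buys the robustness to off-policyness:

```latex
% Retrace(\lambda) evaluation operator: behavior policy \mu, target policy
% \pi, truncated importance weights c_s (transcribed from arXiv:1606.02647).
\[
  \mathcal{R}Q(x,a) = Q(x,a)
    + \mathbb{E}_\mu\!\Big[ \sum_{t \ge 0} \gamma^t
        \Big( \prod_{s=1}^{t} c_s \Big)
        \big( r_t + \gamma\, \mathbb{E}_\pi Q(x_{t+1}, \cdot) - Q(x_t, a_t) \big) \Big],
  \qquad
  c_s = \lambda \min\!\Big( 1, \frac{\pi(a_s \mid x_s)}{\mu(a_s \mid x_s)} \Big).
\]
```

When the data is on-policy (π = μ) the ratios equal 1, so c_s = λ and the update collapses to the usual λ-return, which is the efficiency half of the claim.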
### Paper link
https://arxiv.org/abs/1606.02647
### Authors…
-
I would like to ask about the upper and lower bounds of the obs space in `BaseRLAviary.py`. I noticed that the bounds are -infinity and +infinity; doesn't that make the state space very large to explore…
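One clarification worth attaching here: in Gym/Gymnasium the declared `Box` bounds are only metadata; the states the agent actually visits come from the dynamics, and unbounded dimensions are usually clipped or normalized before training. A minimal sketch using standard Gymnasium wrappers and a stand-in environment (not gym-pybullet-drones' actual code):

```python
import numpy as np
import gymnasium as gym
from gymnasium.spaces import Box

# A Box with infinite bounds is legal; the bounds constrain nothing at runtime.
unbounded = Box(low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)
print(unbounded.is_bounded())  # False

# Stand-in environment; running mean/std normalization tames unbounded obs.
env = gym.make("Pendulum-v1")
env = gym.wrappers.NormalizeObservation(env)
obs, info = env.reset(seed=0)
```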
-
The repository contains examples to fine-tune the model using Supervised Fine-Tuning. I wish to add examples of Transformer Reinforcement Learning (TRL), particularly [Direct Preference Optimization (DPO)](…
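For reference, a hedged sketch of what such a DPO example might look like with Hugging Face's trl; the keyword names (processing_class vs. tokenizer, beta on the config vs. the trainer) shift between trl versions, and the model and dataset below are placeholders:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B"  # placeholder base model for the example
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO consumes preference pairs: prompt / chosen / rejected columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(output_dir="dpo-out", beta=0.1)  # beta scales the implicit KL penalty
trainer = DPOTrainer(model=model, args=config,
                     train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```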
-
https://arxiv.org/abs/1702.08165
-
Hello,
I've been trying to distill qwen2 0.5B and tinyclip using the repository, but I'm running into CUDA OOM issues on the dense2dense distillation step. I'm running on 4 80GB A100s, and I was wondering if I …
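The usual first-line mitigations are sketched below with `transformers.TrainingArguments` field names; whether the repo's distillation script exposes equivalents is an assumption, so treat these as illustrative knobs rather than its actual CLI:

```python
from transformers import TrainingArguments

# Hypothetical memory-saving configuration for a Trainer-style distillation
# step; these field names come from transformers, not this repository.
args = TrainingArguments(
    output_dir="distill-out",
    per_device_train_batch_size=1,   # shrink the per-GPU batch first
    gradient_accumulation_steps=16,  # recover the effective batch size
    gradient_checkpointing=True,     # trade recompute for activation memory
    bf16=True,                       # halve activation/optimizer footprint
)
```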
-
CI test **linux://rllib:learning_tests_multi_agent_cartpole_dqn_gpu** is flaky. Recent failures:
- https://buildkite.com/ray-project/postmerge/builds/5494#0190c314-728f-42a8-a960-af20a90ba259
DataC…
-
- [ ] I have marked all applicable categories:
+ [ ] exception-raising bug
+ [ ] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…
-
Cross-posting this question from pseeth's repo because in your example you do use a one-cycle LR schedule.
Has there been any research on how this strategy interacts with a learning rate schedule? …
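For readers unfamiliar with the schedule in question, a minimal PyTorch sketch of one-cycle stepping (dummy model and loss, only meant to show where the scheduler step goes):

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-3, total_steps=1000)

for step in range(1000):
    loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()  # one-cycle is stepped every optimizer step, not per epoch
```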