guided-policy-search Search Results

416 results for guided-policy-search

gravitational/teleport #48004

Teleport 17 Web Test Plan

## Web UI ## Main For main, test with a role that has access to all resources. As you go through testing, click on any links you come across to make sure they work (no 404) and are up to date. ###…

r0mant updated 3 days ago
10
milonmaze/privacy-terms-observatory #4

www.facebook.com

Tracking updates of www.facebook.com

milonmaze updated 2 days ago
155
milonmaze/privacy-terms-observatory-beta #48

help.instagram.com

Tracking updates of help.instagram.com

milonmaze updated 1 month ago
280
Thinking-with-Deep-Learning-Spring-2022/Readings-Responses #14

Week 8 - Possible Readings

Post a link for a "possibility" reading of your own on the topic of Reinforcement Learning [for week 8], accompanied by a 300-400 word reflection that: 1) briefly summarizes the article (e.g., as we d…

lkcao updated 2 years ago
21
leela-zero/leela-zero #696

Tune first play urgency ?

I tried a lot of modifications to first play urgency after learning that AlphaGo Zero probably uses the equivalent of 0.5, reading #238 and the relevant part of the AlphaGo Zero paper. A lot of my modificati…

remdu updated 6 years ago
369
leela-zero/leela-zero #1732

Training w/o full game, is it possible?

Considering the AG paper is mainly themed on "MCTS as a policy improvement operator". In that sense, is it possible to do training w/o full games? AKA, just take any board position, and train the …

merlinpan updated 6 years ago
6
leela-zero/leela-zero #860

MCTS alternatives

Since a lot of people are working on tuning FPU at the moment and some people are exploring tweaks to the search algorithm I wanted to share a few areas of research I was looking over this evening, in…

roy7 updated 5 years ago
136
OpenRLHF/OpenRLHF #499

[RFC] Modularizing Sample Generation with Rating in PPO for …

## TL; DR This RFC proposes separating sample generation and reward model scoring from the original rollout process in PPO, enabling users to more flexibly customize sample generation and create sa…

zhuzilin updated 1 week ago
6
milonmaze/privacy-terms-observatory #19

help.instagram.com

Tracking updates of help.instagram.com

milonmaze updated 1 month ago
122
langgenius/dify #7477

Assistant Response Prefill

### Self Checks - [X] I have searched for existing issues [search for existing issues](https://github.com/langgenius/dify/issues), including closed ones. - [X] I confirm that I am using English to su…

sfyumi updated 1 month ago
1
