proximal-policy-optimization Search Results

161 results for proximal-policy-optimization


lllyasviel/Paints-UNDO #41

Mega Thread for Non-Technical Discussions

Hi Users, We have been informed that GitHub has recently deleted several comments and posts due to violations of their [Hate Speech and Discrimination Policy](https://docs.github.com/en/site-policy…

lllyasviel updated 2 days ago
91
PKU-Alignment/omnisafe #320

about P3O algorithms

### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/omnisafe/issues) and [Discussions](https://github.com/PKU-A…

Eureka725 updated 3 months ago
1
NeuralNetworkVerification/Marabou #802

Issue with `network.solve()` method in Marabou

Dear Marabou Developers, I am currently using the Marabou library for a project and I have encountered an issue with the `network.solve()` method. When I call this method, it raises a `ValueError` …

Vafali updated 2 months ago
13
fly51fly/aicoco #5

Teacher Aikeke's Weekly Paper Picks

fly51fly updated 2 months ago
106
huggingface-cn/deep-rl-class-zh-CN #22

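Glossary - Summary

This issue collects the terminology and notable vocabulary for the Chinese version of each unit of this course. Translators, please gather the important concepts and related vocabulary under this issue. # Note > The summary covers not only the original English terms but also terms that are hard to translate into Chinese, supplementary material on related content, and so on. # Format: ## Unit 1: XXXX @translators ### [Terms]: - Markov property: - This means that the action taken by our agent*…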

innovation64 updated 1 year ago
9
JavaStudentAlex/RL #9

Make a reading assignment

JavaStudentAlex updated 6 months ago
2
ggerganov/whisper.cpp #1853

After running for a period of time, repeatedly output the sa…

I use the large-v3 model. After running for a period of time, it repeatedly outputs the same sentence, like this: 00:00:00->00:00:29：请不吝点赞订阅转发打赏支持明镜与点点栏目 00:00:29->00:00:59：请不吝点赞订阅转发打赏支持明镜与点点栏目 0…

dfengpo updated 5 days ago
19
noahgsolomon/brainrot.js #16

error transcribing audio

This is the terminal log: ![image](https://github.com/yamauz/live-gpt/assets/134033385/e8fcd7bd-1690-4ebb-b0a9-1932329e322e) Error transcribing audio (attempt 2): Error: HTTP error! sta…

godzilla214 updated 2 months ago
4
huggingface/trl #905

Adding P3O trainer

P3O (Pairwise Policy Optimization) is a recent paper from Berkeley: It introduces a new way to align LLMs to human preferences. The loss function is particularly cool as it directly operates on co…

gaetanlop updated 7 months ago
5
huggingface/trl #1112

Reference model alignment with the current policy

Hello, I've been exploring the implementation of Proximal Policy Optimization (PPO) in the [ppo_trainer.py](https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py) file, and I have…

sajastu updated 6 months ago
4
