-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
Hello, I want to profile the LlamaIndex system. My code snippet is below. My GPU is on…
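Independent of the truncated snippet above, one general way to profile a query end to end is Python's built-in `cProfile`. In this sketch, `query_engine` is a hypothetical stand-in for whatever LlamaIndex object is being profiled (e.g. the result of `index.as_query_engine()` in a real run):

```python
import cProfile
import io
import pstats

# Hypothetical stand-in for a LlamaIndex query engine; replace with the
# real object in an actual run. The dummy work exists only so the
# profiler has something to measure.
class FakeQueryEngine:
    def query(self, text):
        return sum(ord(c) for c in text)

query_engine = FakeQueryEngine()

profiler = cProfile.Profile()
profiler.enable()
result = query_engine.query("What does the document say?")
profiler.disable()

# Report the slowest calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The same pattern wraps any call path; for GPU-side timing you would switch to `torch.profiler`, but `cProfile` already shows where Python-level time goes.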
-
I try to fine-tune Llama 2 and when I launch the training with :
```
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text…
```
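Since `peft_config` implies LoRA-style fine-tuning, a back-of-the-envelope sketch in plain Python (all numbers invented for illustration, nothing here comes from the snippet) shows why the adapter trains far fewer parameters than the full weight matrix: a d×d weight gets a rank-r update W + B·A, so trainables drop from d² to 2·d·r:

```python
# Illustrative LoRA parameter count: a rank-r pair B (d x r) and A (r x d)
# replaces training the full d x d matrix. Numbers are hypothetical.
d = 4096   # hidden size (assumed)
r = 8      # LoRA rank (assumed)

full_params = d * d        # training W directly
lora_params = 2 * d * r    # training only B and A

print(full_params, lora_params, full_params // lora_params)
# -> 16777216 65536 256
```

At these (made-up) sizes the adapter is 256x smaller per matrix, which is why the `SFTTrainer` + `peft_config` combination fits on a single GPU where full fine-tuning would not.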
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/omnisafe/issues) and [Discussions](https://github.com/PKU-A…
-
https://huggingface.co/blog/rlhf
### Background
In the section on the third step of the process, it is written:
- What multiple organizations seem to have gotten to work is **fine-tuning some…
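For context, the third step the question refers to is RL fine-tuning against the reward model, where the per-sample objective is commonly written as the learned reward minus a KL penalty that keeps the policy close to the initial model. A minimal numeric sketch (every value here is hypothetical):

```python
# Hypothetical per-token log-probs under the RL policy and the frozen
# initial (SFT) model, plus a scalar score from the reward model.
policy_logprobs = [-1.2, -0.7, -2.1]
ref_logprobs = [-1.5, -0.9, -1.8]
reward_model_score = 0.83   # r(x, y), invented for the sketch
beta = 0.1                  # KL penalty coefficient

# Sequence-level KL(policy || ref) estimated from the sampled tokens.
kl = sum(p - q for p, q in zip(policy_logprobs, ref_logprobs))

# RLHF objective per sample: R = r(x, y) - beta * KL
total_reward = reward_model_score - beta * kl
print(round(total_reward, 4))
```

The KL term is what prevents the policy from drifting into text the reward model scores highly but the base model considers implausible.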
-
Direction changed, txt will be updated soon.
Old stuff:
- 1997: [The Internet: A Future Tragedy of the Commons?](https://link.springer.com/chapter/10.1007/978-1-4757-2644-2_22)
- [Internet Securi…
-
Hello
Does Llama 2 provide a list of the sources used for training the model? If so, where is that made available?
Is the complete code, along with the training sources, available in this GitHub repo?
Thanks
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…
-
PPOTrainer throws the following error when passed the argument `--gradient_accumulation_steps >= 2`.
```
$ python trl/examples/scripts/sentiment_tuning.py --gradient_accumulation_steps 2
[2023-08-15 20:…
```
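For reference, gradient accumulation simply averages per-micro-batch gradients and steps once every N micro-batches. This pure-Python sketch (toy loss, no trl involved) shows that accumulating two half-batches reproduces the full-batch gradient exactly:

```python
# Toy linear model with squared error: L = (w*x - y)^2, so dL/dw = 2*(w*x - y)*x.
# No frameworks, just the accumulation idea.
def grad(w, batch):
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 6.0)]
w = 0.5

# Full-batch gradient in one go.
full = grad(w, data)

# Same gradient accumulated over two micro-batches of size 2.
micro1 = grad(w, data[:2])
micro2 = grad(w, data[2:])
accumulated = (micro1 + micro2) / 2  # average the micro-batch means

print(full, accumulated)
# -> -16.0 -16.0
```

Errors like the one above usually come from how the trainer's inner loop interacts with the accelerator's accumulation context, not from the math itself.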
-
I'm curious to read an explanation of the fundamental technical differences or similarities between Perl 6 macros (and, separately if appropriate, 007 macros) and Lisp FEXPRs. I thought ven especially mi…
raiph updated 3 months ago
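As background for the comparison: a FEXPR is an operative that receives its operands unevaluated, whereas an ordinary function gets already-evaluated arguments. A tiny interpreter sketch in Python (not Perl 6 or 007; purely illustrative) makes the distinction concrete:

```python
# Minimal evaluator distinguishing applicatives (arguments evaluated)
# from fexprs (arguments passed as raw, unevaluated forms).
def evaluate(expr, env):
    if isinstance(expr, str):          # variable reference
        return env[expr]
    if not isinstance(expr, list):     # self-evaluating literal
        return expr
    op = env[expr[0]]
    if getattr(op, "is_fexpr", False):
        return op(env, *expr[1:])      # raw operands, caller's environment
    return op(*[evaluate(a, env) for a in expr[1:]])

def fexpr(fn):
    fn.is_fexpr = True
    return fn

@fexpr
def if_(env, cond, then, else_):
    # Only the taken branch is ever evaluated -- a plain function could
    # not do this, because both branches would already be values.
    return evaluate(then if evaluate(cond, env) else else_, env)

env = {"if": if_, "add": lambda a, b: a + b, "x": 10}
# "boom" is an unbound variable, but it is never evaluated here.
print(evaluate(["if", ["add", "x", -10], "boom", 42], env))
# -> 42
```

Macros differ in that they run at compile time and return code; a FEXPR defers evaluation at run time, which is the axis the question is probing.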
-
## 💥 Proposal
The goal of this project is to develop an autonomous robot navigation system using reinforcement learning. The robot will learn to navigate and explore its environment efficiently wit…
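A proposal like this typically starts from tabular Q-learning before moving to deep RL. The following toy corridor sketch (environment, rewards, and hyperparameters all invented for illustration) shows the core update Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s',·) − Q(s,a)) driving a robot toward a goal state:

```python
import random

# Toy 1-D corridor: states 0..4, goal at state 4. Actions: 0 = left, 1 = right.
# All numbers are illustrative, not from the proposal.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for _ in range(500):                     # training episodes
    s = 0
    for _ in range(200):                 # step cap per episode
        # Epsilon-greedy action selection.
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update toward r + gamma * max_a' Q(s', a').
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

# After training, the greedy policy points right (toward the goal) everywhere.
print(all(Q[s][1] > Q[s][0] for s in range(GOAL)))
```

Scaling this to real navigation means replacing the table with a function approximator and the corridor with sensor observations, but the update rule is unchanged.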