-
# Dataset
1. Refactor the self-cognition dataset to support multilingual QAs.
# Megatron PreTrain
1. Support more Megatron models
2. Support dataset split
# Fine-tuning
1. RAG LLM training …
-
## New items
- [ ] Tutorial on using a Diffgram workflow to train a custom LLM with a third-party (or open-source) training tool
## Past context
Details in the internal Slack discussion - creating ticket a…
-
Hi all,
I see no benefit from the CLVP module; even with some timbre mixture, the best-scoring AR-generated mel codes may not be that good. Should we put the speaker conditioning into the text tokens during tr…
-
I am getting the following error during RLHF training. I decreased max_sequence_length in my actor configuration to 1024 because training errored for me when it was set to 2048. Is my …
-
Hi, thanks for uploading the code for pair_pm! According to the blog, it seems you are using SLiC for the pair_pm models, but in the pair_pm directory I can't find the code for the SLiC method.
-
My launch script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed /data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py --data_path /data/bill.bi/RLHFDataset --data_output_path /…
-
I'm trying to figure out how to retrieve user feedback submitted via the thumbs up/thumbs down interface in the Web UI. Specifically, I need to know how to access this feedback data through the pipeli…
-
```
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/nfs04/chengkz/VL-RLHF/src/vlrlhf/dpo.py", line 146, in
[rank1]: dpo_trainer.train(resume_from_checkpoint=training_args.re…
```
-
Increase the training iterations: Train the PPO model for more iterations, as the model might not have converged yet.
Adjust the PPO hyperparameters: Experiment with different hyperparameters such …
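The advice above (more iterations, different hyperparameters) can be sketched as a minimal, library-agnostic config. The parameter names here are illustrative only, not tied to any specific RLHF framework; map them onto whatever your trainer's config actually exposes.

```python
from dataclasses import dataclass, replace

# Illustrative PPO hyperparameters -- generic names, not a real library API.
@dataclass(frozen=True)
class PPOHyperparams:
    total_iterations: int = 1000   # more iterations -> more chance to converge
    learning_rate: float = 1e-5
    clip_range: float = 0.2        # PPO policy-ratio clipping epsilon
    kl_coef: float = 0.1           # penalty keeping the policy near the reference model

base = PPOHyperparams()

# "Increase the training iterations" and "adjust the PPO hyperparameters":
# double the iteration budget and try a lower LR / weaker KL penalty.
tuned = replace(base, total_iterations=base.total_iterations * 2,
                learning_rate=5e-6, kl_coef=0.05)

print(tuned.total_iterations)  # 2000
```

Sweeping a few such variants (e.g. `clip_range` in {0.1, 0.2, 0.3}) and comparing reward curves is usually cheaper than guessing a single setting.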
-
In which training step do you use the HH-RLHF and SHP datasets?
Thanks for your help.