-
Training DPO with a custom data format raises an error during Map
File "ms-swift/swift/trainers/dpo_trainer.py", line 114, in tokenize_row
if len(answer_tokens['prompt_input_ids']) + longer_response_length > self.max_length:
KeyError: '…
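The traceback above suggests a key the DPO tokenizer expects is missing from the mapped samples (the actual key name is truncated). A minimal sanity check over the raw dataset can catch this before training; the `prompt`/`chosen`/`rejected` field names below are the common TRL-style pairwise convention and an assumption here, not taken from the truncated error:

```python
# Sketch: validate that each pairwise DPO sample has the expected keys.
# The required key names are an assumption (common TRL-style convention).
REQUIRED_KEYS = {"prompt", "chosen", "rejected"}

def validate_dpo_samples(samples):
    """Return (index, missing_keys) pairs for malformed samples."""
    bad = []
    for i, sample in enumerate(samples):
        missing = REQUIRED_KEYS - set(sample)
        if missing:
            bad.append((i, sorted(missing)))
    return bad

samples = [
    {"prompt": "Q: 1+1?", "chosen": "2", "rejected": "3"},
    {"prompt": "Q: capital of France?", "chosen": "Paris"},  # no "rejected"
]
print(validate_dpo_samples(samples))  # [(1, ['rejected'])]
```

Running such a check on the custom dataset before calling the trainer narrows down whether the error comes from the data or from the tokenization code.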
-
Hello,
Will Llama-Factory support any RLAIF methods currently? If so can any one share any example/reference implementation for the same.
-
Hi, I am getting this error when loading the DPO dataset; does anyone know how to resolve it? Thank you!
I get this error even though my pandas version is 2.2.2:
> >>> pd.read_parquet("code/eagle-dev/R…
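Since the error message is truncated above, its exact cause is unclear; one common workaround when `pd.read_parquet` fails is to try both parquet engines pandas supports explicitly (whether this resolves this particular error is an assumption):

```python
# Sketch: try both pandas parquet engines and surface all failures.
import pandas as pd

def read_parquet_with_fallback(path):
    errors = []
    for engine in ("pyarrow", "fastparquet"):
        try:
            return pd.read_parquet(path, engine=engine)
        except Exception as err:  # ImportError if engine absent, or a parse error
            errors.append(f"{engine}: {err}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))
```

If both engines fail, the collected messages usually reveal whether the file itself is malformed or an engine version is incompatible.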
-
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
https://arxiv.org/abs/2405.17220
-
Will TRL support training LLMs using RLAIF methods? If so, can anyone share any reference implementations or examples. Thank you.
-
I did a DPO fine-tuning using the default MP command provided [here](https://github.com/modelscope/ms-swift/blob/main/docs/source_en/Multi-Modal/human-preference-alignment-training-documentation.md#dp…
-
It looks like llama.cpp now [supports openbmb/MiniCPM-Llama3-V-2_5.](https://github.com/ggerganov/llama.cpp/pull/7599)
Here's the [official gguf.](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_…
-
https://github.com/RLHF-V/RLAIF-V/blob/main/muffin/data/data_processors.py#L97
The function is not loaded or defined.
Also, the gather_data_files_by_glob function may not match the parquet format of o…
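For anyone hitting the same missing-function error, a hypothetical sketch of what a `gather_data_files_by_glob` helper typically does is below; the signature and default pattern are assumptions, not the repo's actual code:

```python
# Hypothetical stand-in for the missing gather_data_files_by_glob helper:
# collect data files under a root directory matching a glob pattern.
import glob
import os

def gather_data_files_by_glob(root, pattern="*.parquet"):
    """Return a sorted list of files under `root` matching `pattern`."""
    return sorted(glob.glob(os.path.join(root, pattern)))
```

A stand-in like this only helps if the surrounding loader also expects plain parquet files; as noted above, the actual parquet layout the repo expects may differ.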
-
Thank you very much for open-sourcing this. I have a question:
![image](https://github.com/RLHF-V/RLAIF-V/assets/30074778/e27abcdd-26a0-4938-9647-cf4f3dd53613)
Are fields like ref_win_logp precomputed and stored in the annotations? I don't seem to find them in RLAIF-V-Dataset — is there any ready-to-use data I can refer to? Thanks.
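If those fields need to be computed rather than loaded, here is a hedged sketch (an assumption, not the repo's actual code) of how a reference log-probability such as `ref_win_logp` is typically precomputed: sum the reference model's per-token log-probabilities over the response positions.

```python
# Sketch: sum reference-model log-probs over response tokens.
# logits: (seq, vocab) array of reference-model logits aligned to labels;
# labels: (seq,) token ids; mask: (seq,) 1 for response tokens, 0 otherwise.
import numpy as np

def response_logp(logits, labels, mask):
    # numerically stable log-softmax over the vocab dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    logps = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    tok = logps[np.arange(len(labels)), labels]  # log p of each label token
    return float((tok * mask).sum())
```

In practice this is run once with the frozen reference model over the chosen and rejected responses, and the resulting scalars are cached alongside the dataset so DPO training does not need a second forward pass.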
-
I want to know how you prove that the AI-generated preference annotations are correct, so that they can be used to train the reward model.
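One common sanity check for this (an assumption about standard practice, not the RLAIF-V authors' stated method) is to measure how often the AI preference labels agree with a small human-labeled subset:

```python
# Sketch: agreement rate between AI preference labels and human labels
# on a shared subset (1 = first response preferred, 0 = second).
def agreement_rate(ai_labels, human_labels):
    assert len(ai_labels) == len(human_labels)
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return matches / len(ai_labels)

print(agreement_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

A high agreement rate on the audited subset is evidence, not proof, that the AI annotations are reliable enough for reward-model training.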