-
Training DPO with a custom data format raises an error during Map
File "ms-swift/swift/trainers/dpo_trainer.py", line 114, in tokenize_row
if len(answer_tokens['prompt_input_ids']) + longer_response_length > self.max_length:
KeyError: '…
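The traceback above suggests a key the DPO tokenizer expects is missing from the mapped samples (the actual key name is truncated). A minimal sanity check over the raw dataset can catch this before training; the `prompt`/`chosen`/`rejected` field names below are the common TRL-style pairwise convention and an assumption here, not taken from the truncated error:

```python
# Sketch: validate that each pairwise DPO sample has the expected keys.
# The required key names are an assumption (common TRL-style convention).
REQUIRED_KEYS = {"prompt", "chosen", "rejected"}

def validate_dpo_samples(samples):
    """Return (index, missing_keys) pairs for malformed samples."""
    bad = []
    for i, sample in enumerate(samples):
        missing = REQUIRED_KEYS - set(sample)
        if missing:
            bad.append((i, sorted(missing)))
    return bad

samples = [
    {"prompt": "Q: 1+1?", "chosen": "2", "rejected": "3"},
    {"prompt": "Q: capital of France?", "chosen": "Paris"},  # no "rejected"
]
print(validate_dpo_samples(samples))  # [(1, ['rejected'])]
```

Running such a check on the custom dataset before calling the trainer narrows down whether the error comes from the data or from the tokenization code.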
-
Hello,
Will Llama-Factory support any RLAIF methods currently? If so can any one share any example/reference implementation for the same.
-
Hi, I am getting this error when loading the DPO dataset; does anyone know how to resolve it? Thank you!
I get this error even though my pandas version is 2.2.2:
> >>> pd.read_parquet("code/eagle-dev/R…
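Since the error message is truncated above, its exact cause is unclear; one common workaround when `pd.read_parquet` fails is to try both parquet engines pandas supports explicitly (whether this resolves this particular error is an assumption):

```python
# Sketch: try both pandas parquet engines and surface all failures.
import pandas as pd

def read_parquet_with_fallback(path):
    errors = []
    for engine in ("pyarrow", "fastparquet"):
        try:
            return pd.read_parquet(path, engine=engine)
        except Exception as err:  # ImportError if engine absent, or a parse error
            errors.append(f"{engine}: {err}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))
```

If both engines fail, the collected messages usually reveal whether the file itself is malformed or an engine version is incompatible.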
-
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
https://arxiv.org/abs/2405.17220
-
Will TRL support training LLMs using RLAIF methods? If so, can anyone share any reference implementations or examples. Thank you.
-
I did a DPO fine-tuning using the default MP command provided [here](https://github.com/modelscope/ms-swift/blob/main/docs/source_en/Multi-Modal/human-preference-alignment-training-documentation.md#dp…
-
It looks like llama.cpp now [supports openbmb/MiniCPM-Llama3-V-2_5.](https://github.com/ggerganov/llama.cpp/pull/7599)
Here's the [official gguf.](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_…
-
https://github.com/RLHF-V/RLAIF-V/blob/main/muffin/data/data_processors.py#L97
The function is not loaded or defined.
Also, the gather_data_files_by_glob function may not match the parquet format of o…
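For anyone hitting the same missing-function error, a hypothetical sketch of what a `gather_data_files_by_glob` helper typically does is below; the signature and default pattern are assumptions, not the repo's actual code:

```python
# Hypothetical stand-in for the missing gather_data_files_by_glob helper:
# collect data files under a root directory matching a glob pattern.
import glob
import os

def gather_data_files_by_glob(root, pattern="*.parquet"):
    """Return a sorted list of files under `root` matching `pattern`."""
    return sorted(glob.glob(os.path.join(root, pattern)))
```

A stand-in like this only helps if the surrounding loader also expects plain parquet files; as noted above, the actual parquet layout the repo expects may differ.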
-
Thank you very much for open-sourcing this. I have a question:
![image](https://github.com/RLHF-V/RLAIF-V/assets/30074778/e27abcdd-26a0-4938-9647-cf4f3dd53613)
Are fields like ref_win_logp precomputed and stored in the annotations? I don't seem to find them in RLAIF-V-Dataset — is there any ready-to-use data I can refer to? Thanks.
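If those fields need to be computed rather than loaded, here is a hedged sketch (an assumption, not the repo's actual code) of how a reference log-probability such as `ref_win_logp` is typically precomputed: sum the reference model's per-token log-probabilities over the response positions.

```python
# Sketch: sum reference-model log-probs over response tokens.
# logits: (seq, vocab) array of reference-model logits aligned to labels;
# labels: (seq,) token ids; mask: (seq,) 1 for response tokens, 0 otherwise.
import numpy as np

def response_logp(logits, labels, mask):
    # numerically stable log-softmax over the vocab dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    logps = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    tok = logps[np.arange(len(labels)), labels]  # log p of each label token
    return float((tok * mask).sum())
```

In practice this is run once with the frozen reference model over the chosen and rejected responses, and the resulting scalars are cached alongside the dataset so DPO training does not need a second forward pass.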
-
I want to know how you prove that the AI-generated preference annotations are correct, so that they can be used to train the reward model.
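One common sanity check for this (an assumption about standard practice, not the RLAIF-V authors' stated method) is to measure how often the AI preference labels agree with a small human-labeled subset:

```python
# Sketch: agreement rate between AI preference labels and human labels
# on a shared subset (1 = first response preferred, 0 = second).
def agreement_rate(ai_labels, human_labels):
    assert len(ai_labels) == len(human_labels)
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return matches / len(ai_labels)

print(agreement_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

A high agreement rate on the audited subset is evidence, not proof, that the AI annotations are reliable enough for reward-model training.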