-
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
https://arxiv.org/abs/2405.17220
-
https://github.com/RLHF-V/RLAIF-V/blob/main/muffin/data/data_processors.py#L97
The function is not loaded or defined.
Also, the `gather_data_files_by_glob` function may not match the parquet format of o…
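If the helper is simply meant to collect data files under a directory, a minimal sketch of a glob-based gatherer that also matches parquet files could look like the following (the function name and default pattern here are illustrative; the real helper in `muffin/data/data_processors.py` may expect a different layout):
```
import glob
import os

def gather_data_files_by_glob(data_dir, pattern="*.parquet"):
    # Hypothetical replacement: collect data files under `data_dir`.
    # The parquet default is purely for illustration.
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} under {data_dir}")
    return files
```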
-
**This bug has re-appeared in the latest ms-swift version.**
This bug was initially reported in [this issue](https://github.com/modelscope/ms-swift/issues/1734), and was solved promptly. Now, with th…
-
Thank you very much for open-sourcing this work. I have a question:
![image](https://github.com/RLHF-V/RLAIF-V/assets/30074778/e27abcdd-26a0-4938-9647-cf4f3dd53613)
Are fields like ref_win_logp precomputed values stored in the annotations? I don't seem to see them in RLAIF-V-Dataset. Is there ready-to-use data I can refer to? Thanks!
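For context, fields such as `ref_win_logp` are the reference model's log-probabilities over the chosen (win) response, which DPO needs; when they are not shipped with the dataset they are typically produced in a preprocessing pass. Below is a minimal sketch of that computation, assuming a causal LM interface and the usual `-100` label-masking convention (the actual RLAIF-V pipeline runs its multimodal reference model, so treat this only as an illustration):
```
import torch
import torch.nn.functional as F

@torch.no_grad()
def response_logprob(model, input_ids, labels):
    # Sum of log-probs the frozen reference model assigns to the response
    # tokens; `labels` masks prompt/padding positions with -100, mirroring
    # the usual DPO preprocessing convention.
    logits = model(input_ids=input_ids).logits[:, :-1, :]
    targets = labels[:, 1:]
    mask = targets.ne(-100)
    log_probs = F.log_softmax(logits, dim=-1)
    token_logps = torch.gather(
        log_probs, 2, targets.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    return (token_logps * mask).sum(-1)  # one value per sample, e.g. ref_win_logp
```
An averaged variant (e.g. a `ref_win_avg_logp`-style field) would divide the sum by `mask.sum(-1)` instead.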
-
**Describe the bug**
I get the following error simply by changing the model from `llava1_6-mistral-7b-instruct` to `glm4v-9b-chat` in the first DPO example [here](https://github.com/modelscope/ms-swi…
-
I want to know how you verify that the AI-generated preference annotations are correct, so that they can be used to train the reward model.
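One common sanity check, not necessarily the paper's exact protocol, is to measure how often the AI preference labels agree with a small human-verified subset; a minimal sketch (label values are illustrative):
```
def preference_label_agreement(ai_labels, human_labels):
    # Fraction of pairs where the AI preference (which response wins)
    # matches a human-verified label for the same pair.
    assert len(ai_labels) == len(human_labels) and len(ai_labels) > 0
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return matches / len(ai_labels)

# e.g. preference_label_agreement(["win_a", "win_b"], ["win_a", "win_a"]) -> 0.5
```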
-
Using a custom data format, DPO training fails during the Map step:
File "ms-swift/swift/trainers/dpo_trainer.py", line 114, in tokenize_row
if len(answer_tokens['prompt_input_ids']) + longer_response_length > self.max_length:
KeyError: '…
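One way to surface this earlier is to validate the custom records before the Map stage. A minimal sketch, assuming a JSONL file and hypothetical field names (`query` / `response` / `rejected_response`); check the ms-swift custom-dataset documentation for the exact schema your version expects:
```
import json

# Hypothetical field names; consult the ms-swift docs for the exact schema.
REQUIRED_KEYS = {"query", "response", "rejected_response"}

def check_dpo_records(path):
    # Flag records missing keys the DPO tokenizer needs, so the failure
    # shows up here instead of as a KeyError inside tokenize_row.
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            record = json.loads(line)
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                print(f"line {lineno}: missing {sorted(missing)}")
```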
-
Hi. I am using exactly the same code as yours in run_sft.sh:
```
#!/bin/bash
CUR_DIR=`pwd`
ROOT=${CUR_DIR}
export PYTHONPATH=${ROOT}:${PYTHONPATH}
VISION_MODEL=openai/clip-vit-large-pa…
-
### System Info
transformers version: 4.35.2
Platform: Linux-5.15.0-1050-aws-x86_64-with-glibc2.31
Python version: 3.10.12
Huggingface_hub version: 0.20.2
Safetensors versio…
-
### Description
Figure 1 is badly rendered.
### (Optional:) Please add any files, screenshots, or other information here.
_No response_
### (Required) What is this issue most closely related to? Sele…