-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/align-anything/issues) and [Discussions](https://github.com…
-
When creating a PEFT model and then trying to train it, we get an error:
```
File "/scratch/gpfs/ashwinee/unsloth/unsloth/kernels/fast_lora.py", line 106, in backward
d_do…
```
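For context, a minimal sketch of the setup that triggers this, assuming the standard unsloth LoRA workflow (the checkpoint name and LoRA hyperparameters below are placeholders, not taken from the report):

```python
# Hedged repro sketch of the usual unsloth PEFT setup; checkpoint name
# and LoRA hyperparameters are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# get_peft_model attaches LoRA adapters; the backward pass through the
# fused LoRA kernels (fast_lora.py) is where the traceback above points.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)
```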
-
It looks like llama.cpp now [supports openbmb/MiniCPM-Llama3-V-2_5](https://github.com/ggerganov/llama.cpp/pull/7599).
Here's the [official gguf.](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_…
-
Training DPO with a custom data format raises an error during the Map step:
File "ms-swift/swift/trainers/dpo_trainer.py", line 114, in tokenize_row
if len(answer_tokens['prompt_input_ids']) + longer_response_length > self.max_length:
KeyError: '…
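A KeyError in tokenize_row usually means the mapped rows are missing a column the trainer expects. As a hedged reference, this is the TRL-style preference layout that DPO trainers are typically built around (the field names are the TRL convention, not confirmed against ms-swift's custom-format docs):

```python
# Hedged sketch: the preference-pair row that TRL-style DPO tokenization
# typically expects. A custom format that does not map onto these fields
# can surface later as a missing-key error like the one above.
row = {
    "prompt":   "What is the capital of France?",
    "chosen":   "The capital of France is Paris.",
    "rejected": "France does not have a capital.",
}
```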
-
Hi, I am getting this error when loading the DPO dataset; does anyone know how to resolve it? Thank you!
I get this error even though my pandas version is 2.2.2:
> >>> pd.read_parquet("code/eagle-dev/R…
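One way to narrow this down (a hedged suggestion, and the path below is a placeholder): read the file with pyarrow directly, which usually gives a clearer error than the pandas wrapper when the parquet footer or schema is the problem:

```python
# Hedged diagnostic sketch: bypass pandas and read the parquet file with
# pyarrow directly to see whether the file or the pandas conversion fails.
import pyarrow.parquet as pq

table = pq.read_table("path/to/dataset.parquet")  # placeholder path
print(table.schema)      # inspect column names and types
df = table.to_pandas()   # convert only once the raw read succeeds
```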
-
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
https://arxiv.org/abs/2405.17220
-
https://github.com/RLHF-V/RLAIF-V/blob/main/muffin/data/data_processors.py#L97
The function at this line is never loaded or defined.
Also, the gather_data_files_by_glob function may not match the parquet format of o…
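For reference, a minimal sketch of what such a glob-based gatherer might look like; the function name comes from the linked file, but the body below is an assumption, not the repository's implementation:

```python
# Hedged sketch of a glob-based file gatherer; the name matches the one
# referenced in data_processors.py, the body is an assumption.
import glob
import os

def gather_data_files_by_glob(data_dir: str, pattern: str = "*.parquet"):
    """Return all files under data_dir matching pattern, sorted for
    deterministic ordering."""
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {data_dir}")
    return files
```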
-
I did a DPO fine-tuning using the default MP command provided [here](https://github.com/modelscope/ms-swift/blob/main/docs/source_en/Multi-Modal/human-preference-alignment-training-documentation.md#dp…
-
Thank you very much for open-sourcing this. I have a question:
![image](https://github.com/RLHF-V/RLAIF-V/assets/30074778/e27abcdd-26a0-4938-9647-cf4f3dd53613)
Are fields like ref_win_logp precomputed values stored in the annotations? I don't seem to see them in RLAIF-V-Dataset. Is there any directly usable data to reference? Thanks!
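If they are not shipped with the dataset, a hedged sketch of how per-response reference log-probs like ref_win_logp are typically precomputed: sum the frozen reference model's token log-probs over the response positions only (the function below is an illustration, not RLAIF-V's code):

```python
# Hedged sketch: sequence log-prob under a frozen reference model, the
# usual way fields like ref_win_logp are precomputed. Illustrative only.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sequence_logp(model, input_ids, response_mask):
    """Sum of log p(token | prefix) over response positions only."""
    logits = model(input_ids).logits[:, :-1]   # position t predicts token t+1
    labels = input_ids[:, 1:]
    logps = F.log_softmax(logits, dim=-1)
    token_logps = logps.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return (token_logps * response_mask[:, 1:]).sum(-1)  # mask out the prompt
```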
-
Hi, two quick questions:
1. From Algorithm 1 in the paper, I get the sense that the algorithm can work in an online divide-and-conquer manner with the updated model, and I am just curious when the self-feedback co…