OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0
8.11k stars · 819 forks

[Feature] reward model inferencer and dpov2 aligner #867

Open wheresmyhair opened 4 days ago

wheresmyhair commented 4 days ago

Description

  1. Add reward model inferencer
  2. Add DPOv2 aligner
  3. Change the RM and DPO dataset formats

Tests

This feature

See the latest comment

Other features may be affected

See the latest comment

wheresmyhair commented 2 days ago

src/lmflow/datasets/dataset.py

  • [Document] lines 44-45: documentation is needed for the new dataset types.
  • [Architecture] line 422: better to provide an extra option format=json to specify the file format, with json as the default.
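
A minimal sketch of the suggested `format` option, assuming a simple file-backed `Dataset` class (the class shape and method name here are illustrative, not LMFlow's actual API):

```python
import json


class Dataset:
    """Hypothetical dataset wrapper illustrating a `format` option."""

    def __init__(self, path: str):
        self.path = path

    def load(self, format: str = "json"):
        # json is the default, as the review suggests, but the format is
        # now explicit and extensible to other backends later.
        if format == "json":
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        raise NotImplementedError(f"unsupported dataset format: {format}")
```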

src/lmflow/models/hf_decoder_model.py

[Style] line 410: a typo inherited from history: use_accelerator -> use_accelerate.

src/lmflow/models/hf_model_mixin.py

  • [Style] line 306: adding "please" reads more politely to users: "To prepare the model for training, please set do_train=True.".
  • [Style] line 308: this is not really a warning; logger.info would be better.
  • [Style] lines 325-328: it would be nice to add comments here explaining what these statements do.
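
As a sketch of the suggested wording and log level (the helper name and signature are hypothetical, not the actual mixin code):

```python
import logging

logger = logging.getLogger(__name__)


def ensure_train_mode(do_train: bool) -> bool:
    """Illustrates the suggested message and logger.info level."""
    if not do_train:
        # info rather than warning: skipping training preparation is an
        # expected code path, not an anomaly.
        logger.info(
            "To prepare the model for training, please set do_train=True."
        )
    return do_train
```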

  • [BUG] old line 375: load_in_8bit = model_args.use_int8 is missing.
  • [Question] line 405: PEFT models have a known incompatibility when used together with RAM-optimized loading. Has this issue been verified to no longer exist in this transformers version? Test results may be required.
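
A minimal sketch of restoring the missing flag, with an illustrative `ModelArgs` dataclass standing in for LMFlow's actual model arguments:

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    """Stand-in for LMFlow's model arguments; only the relevant field."""
    use_int8: bool = False


def build_model_kwargs(model_args: ModelArgs) -> dict:
    # Restore the flag the review says went missing (old line 375):
    # load_in_8bit should follow model_args.use_int8.
    return {"load_in_8bit": model_args.use_int8}
```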

src/lmflow/models/hf_text_regression_model.py

  • [Question] line 149: why does model_args.use_lora lead to truncation of the sequence? This may be a leftover from history and needs to be recorded.
  • [Style] lines 169-173, 177-180: better to use tokenize_fn_kwargs = { ... } styled initialization.
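
The dict-literal style the comment suggests might look like this (the key names are illustrative assumptions, not the actual kwargs in the file):

```python
def build_tokenize_fn_kwargs(truncation: bool, max_length: int) -> dict:
    # One dict literal instead of assigning keys one by one across
    # several statements, as the review suggests.
    tokenize_fn_kwargs = {
        "truncation": truncation,
        "max_length": max_length,
        "padding": "max_length",
    }
    return tokenize_fn_kwargs
```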

src/lmflow/pipeline/dpov2_aligner.py

  • [Style] lines 102-119: aligner_args can be renamed to args right after line 100 to avoid long statements.
  • [Style] line 197: add an error message.
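
A sketch of both suggestions together, aliasing `aligner_args` early and raising with an explicit message (the attribute names `beta` and `margin_scale` are assumptions, not the real aligner arguments):

```python
from types import SimpleNamespace


def configure(aligner_args: SimpleNamespace) -> dict:
    args = aligner_args  # alias right away so later lines stay short
    if args.beta <= 0:
        # review line 197: fail with an explicit error message
        raise ValueError(f"beta must be positive, got {args.beta}")
    return {"beta": args.beta, "margin": args.margin_scale}
```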

src/lmflow/pipeline/rm_inferencer.py

  • [Feature] lines 77-78: add command-line arguments for those two parameters. Port conflicts often occur and cause failures; exposing the arguments may help users avoid that.
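
One way to expose those parameters, sketched with argparse (the flag names and defaults are assumptions, not the actual rm_inferencer interface):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Surface the two hard-coded networking parameters as CLI arguments so
    # users can pick a free port instead of hitting a conflict.
    parser = argparse.ArgumentParser(description="rm_inferencer sketch")
    parser.add_argument("--master-addr", default="127.0.0.1")
    parser.add_argument("--master-port", type=int, default=29500)
    return parser
```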

src/lmflow/pipeline/utils/dpov2_trainer.py

  • [Architecture] line 25: the class name PreferenceTrainer should match the file name (dpov2_trainer.py).

src/lmflow/utils/constants.py

  • [Architecture] lines 23-37: the dataset format is not very intuitive; it should be
{
    "type": "text_with_score",
    "instances": [
        { "text": "TEXT_1", "score": 1.0 },
        { "text": "TEXT_2", "score": 0.5 },
        ...
    ]
}
  • [Architecture] line 214: the name could be "text2textlist". A grouped text2text would look more like [ {input1, output1}, {input2, output2}, ... ].
  • [Question] line 326: why is paired_text2text necessary? Can it be merged into paired_conversation?
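
A minimal shape check for the proposed text_with_score format might look like this (a hypothetical helper, validating only structure, not score semantics):

```python
def is_text_with_score(sample: dict) -> bool:
    """Return True if `sample` matches the proposed text_with_score shape."""
    if sample.get("type") != "text_with_score":
        return False
    instances = sample.get("instances")
    if not isinstance(instances, list):
        return False
    # Every instance must carry a string "text" and some "score" value.
    return all(
        isinstance(inst, dict)
        and isinstance(inst.get("text"), str)
        and "score" in inst
        for inst in instances
    )
```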

wheresmyhair commented 2 days ago

Tests

This PR

rm inference

(screenshots)

dpo v2

(screenshot)

Other features may be affected

vllm inference and memory safe vllm inference

(screenshots)

finetune

(screenshot)

finetune lora

(screenshot)