OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0
8.11k stars · 819 forks

[Feature] reward model inferencer and dpov2 aligner #867

Open wheresmyhair opened 4 days ago

wheresmyhair commented 4 days ago

Description

  1. Add reward model inferencer
  2. Add DPOv2 aligner
  3. Change the RM and DPO dataset formats

Tests

This feature

See the latest comment

Other features may be affected

See the latest comment

wheresmyhair commented 2 days ago

src/lmflow/datasets/dataset.py

  • [Document] lines 44-45: documentation is needed for the new dataset types.
  • [Architecture] line 422: better to provide an extra option format=json to specify the file format, with json as the default.
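
A minimal sketch of the suggested `format` option, assuming a simple file-backed `Dataset` class (the class shape and method name here are illustrative, not LMFlow's actual API):

```python
import json


class Dataset:
    """Hypothetical dataset wrapper illustrating a `format` option."""

    def __init__(self, path: str):
        self.path = path

    def load(self, format: str = "json"):
        # json is the default, as the review suggests, but the format is
        # now explicit and extensible to other backends later.
        if format == "json":
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        raise NotImplementedError(f"unsupported dataset format: {format}")
```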

src/lmflow/models/hf_decoder_model.py

[Style] line 410: a typo inherited from history: use_accelerator -> use_accelerate.

src/lmflow/models/hf_model_mixin.py

  • [Style] line 306: adding "please" reads more politely to users: "To prepare the model for training, please set do_train=True.".
  • [Style] line 308: this is not really a warning; logger.info would be better.
  • [Style] lines 325-328: it would be nice to add comments here explaining what these statements do.
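
As a sketch of the suggested wording and log level (the helper name and signature are hypothetical, not the actual mixin code):

```python
import logging

logger = logging.getLogger(__name__)


def ensure_train_mode(do_train: bool) -> bool:
    """Illustrates the suggested message and logger.info level."""
    if not do_train:
        # info rather than warning: skipping training preparation is an
        # expected code path, not an anomaly.
        logger.info(
            "To prepare the model for training, please set do_train=True."
        )
    return do_train
```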

  • [BUG] old line 375: load_in_8bit = model_args.use_int8 is missing.
  • [Question] line 405: PEFT models have a known incompatibility when used together with RAM-optimized loading. Has this issue been verified to no longer exist in this transformers version? Test results may be required.
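
A minimal sketch of restoring the missing flag, with an illustrative `ModelArgs` dataclass standing in for LMFlow's actual model arguments:

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    """Stand-in for LMFlow's model arguments; only the relevant field."""
    use_int8: bool = False


def build_model_kwargs(model_args: ModelArgs) -> dict:
    # Restore the flag the review says went missing (old line 375):
    # load_in_8bit should follow model_args.use_int8.
    return {"load_in_8bit": model_args.use_int8}
```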

src/lmflow/models/hf_text_regression_model.py

  • [Question] line 149: why does model_args.use_lora lead to truncation of the sequence? This may be a leftover from history and needs to be recorded.
  • [Style] lines 169-173, 177-180: better to use tokenize_fn_kwargs = { ... } styled initialization.
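
The dict-literal style the comment suggests might look like this (the key names are illustrative assumptions, not the actual kwargs in the file):

```python
def build_tokenize_fn_kwargs(truncation: bool, max_length: int) -> dict:
    # One dict literal instead of assigning keys one by one across
    # several statements, as the review suggests.
    tokenize_fn_kwargs = {
        "truncation": truncation,
        "max_length": max_length,
        "padding": "max_length",
    }
    return tokenize_fn_kwargs
```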

src/lmflow/pipeline/dpov2_aligner.py

  • [Style] lines 102-119: aligner_args can be renamed to args right after line 100 to avoid long statements.
  • [Style] line 197: add an error message.
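
A sketch of both suggestions together, aliasing `aligner_args` early and raising with an explicit message (the attribute names `beta` and `margin_scale` are assumptions, not the real aligner arguments):

```python
from types import SimpleNamespace


def configure(aligner_args: SimpleNamespace) -> dict:
    args = aligner_args  # alias right away so later lines stay short
    if args.beta <= 0:
        # review line 197: fail with an explicit error message
        raise ValueError(f"beta must be positive, got {args.beta}")
    return {"beta": args.beta, "margin": args.margin_scale}
```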

src/lmflow/pipeline/rm_inferencer.py

  • [Feature] lines 77-78: add command-line arguments for those two parameters. Port conflicts often occur and cause failures; exposing the arguments may help users avoid that.
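
One way to expose those parameters, sketched with argparse (the flag names and defaults are assumptions, not the actual rm_inferencer interface):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Surface the two hard-coded networking parameters as CLI arguments so
    # users can pick a free port instead of hitting a conflict.
    parser = argparse.ArgumentParser(description="rm_inferencer sketch")
    parser.add_argument("--master-addr", default="127.0.0.1")
    parser.add_argument("--master-port", type=int, default=29500)
    return parser
```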

src/lmflow/pipeline/utils/dpov2_trainer.py

  • [Architecture] line 25: the class name PreferenceTrainer should match the file name (dpov2_trainer.py).

src/lmflow/utils/constants.py

  • [Architecture] lines 23-37: the dataset format is not very intuitive; it should be
{
    "type": "text_with_score",
    "instances": [
        { "text": "TEXT_1", "score": 1.0 },
        { "text": "TEXT_2", "score": 0.5 },
        ...
    ]
}
  • [Architecture] line 214: the name could be "text2textlist". A grouped text2text would look more like [ {input1, output1}, {input2, output2}, ... ].
  • [Question] line 326: why is paired_text2text necessary? Can it be merged into paired_conversation?
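
A minimal shape check for the proposed text_with_score format might look like this (a hypothetical helper, validating only structure, not score semantics):

```python
def is_text_with_score(sample: dict) -> bool:
    """Return True if `sample` matches the proposed text_with_score shape."""
    if sample.get("type") != "text_with_score":
        return False
    instances = sample.get("instances")
    if not isinstance(instances, list):
        return False
    # Every instance must carry a string "text" and some "score" value.
    return all(
        isinstance(inst, dict)
        and isinstance(inst.get("text"), str)
        and "score" in inst
        for inst in instances
    )
```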

wheresmyhair commented 2 days ago

Tests

This PR

rm inference

(screenshots)

dpo v2

(screenshot)

Other features may be affected

vllm inference and memory safe vllm inference

(screenshots)

finetune

(screenshot)

finetune lora

(screenshot)