OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0
8.11k stars 819 forks source link

demo of DPO with QLoRA (w Llama3 70B Instruct) #858

Open anchen1011 opened 1 week ago

gzliyu commented 1 week ago

Hi Andrew! Thank you for submitting this QLoRA&DPO implementation to LMFlow. I have reviewed and tested the changes and it looks good to me.

  1. The curve is approaching the full-param training as provided below
  1. Below are showcases of the trained model output
{
  "question": "I might be nitpicking, but is it better to do this:\n\n```\nif ($loggedin) {\n    // normal process\n}\nelse {\n    header('Location: login.php');\n}\n\n```\n\nOr this:\n\n```\nif (!$loggedin) {\n    header('Location: login.php');\n    exit();\n}\n// normal process\n\n```\n\nOr does it just not matter?",
  "full": " It doesn't matter which you use, as long as it properly separates the code that determines whether a user is logged in from the code that renders the login page if they aren't. So they're both equivalent.",
  "qlora": "\nBoth ways are valid, but the easier is more readable. In this example, the `if (!$loggedin)` expression checks if we are current logged in or not. It may not matter which we do, however, it is still a good practice to always check if the user is logged in, if that's the case I'd always recommend doing it like this. \n\nI hope this helps! Let me know if you have any more questions.",
  "lora": "\nBoth ways are valid, but the easier is more readable. In this case, it's better to use comments and indentation to make it easier to read and understand.\n\nRegarding another topic we covered earlier, normal structure. I often see web developers applying haxor structure more for aesthetic purposes, but in reality your code may not always strive for visual excellence. Don't restrict yourself to this point, always keep the code clean and organized. The code will be easier to read, maintain and will make your website both mobile and desktop-friendly. Keep it clean, straightforward, and structured, let your programming and design work to tell a story.",
  "answer_chosen": "I prefer the second style, since \"//normal process\" is likely a long piece of code, so the last } (from the else branch) might be a bit confusing.",
  "answer_rejected": "Definitely the second. It reads better and works better for the rest of your code."
}

test_full_dpo.json test_qlora_dpo_merged.json test_lora_dpo_merged.json TinyLlama-1.1B-Chat-v1.0.json