larryyin commented 11 months ago

Added draft example code for training with MLflow. It ran into some GPU issues. We can discuss them when we meet, maybe on Friday.

This is not the final version, just a point for review.

larryyin commented 11 months ago

All corrected, but please hold off on approving and merging until we meet. There are still some issues that we can discuss in person.

larryyin commented 11 months ago

This is the final version. All the added scripts have been tested.

larryyin commented 11 months ago

They are kept for the demo.

On Oct 7, 2023 4:50 PM, goldmermaid wrote:

In .gitignore:

@@ -172,13 +172,21 @@ cython_debug/
 data
 *.csv
+!example/rlhf/mlflow/input_rw/ranking.csv
+!example/rlhf/ranking.csv

Maybe these two CSV files can be kept for future users?
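For reference, here is a rough sketch of how the `*.csv` rule and the two `!` exceptions in this hunk interact. It is a simplified model only (the last matching pattern wins, and a leading `!` re-includes a path); real gitignore matching has more cases, such as directory-only patterns and excluded parent directories. The helper name `is_ignored` is made up for illustration.

```python
from fnmatch import fnmatch

# Patterns taken from the .gitignore hunk above.
PATTERNS = [
    "*.csv",
    "!example/rlhf/mlflow/input_rw/ranking.csv",
    "!example/rlhf/ranking.csv",
]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Simplified gitignore check: the last pattern that matches decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        # A pattern without a slash is compared to the file name only;
        # a pattern with a slash is compared to the whole relative path.
        target = path if "/" in body else path.rsplit("/", 1)[-1]
        if fnmatch(target, body):
            ignored = not negated
    return ignored
```

Under this model, `example/rlhf/ranking.csv` stays tracked while any other `.csv` file is still ignored.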


larryyin commented 11 months ago

Or we can remove this step 3 RL notebook example? For step 3, we would only use the .py file as the example. In the mlflow folder, I have already removed this notebook.

On Oct 7, 2023 4:54 PM, goldmermaid wrote:

In example/rlhf/demo_rl.ipynb:

  "output_type": "error",
"traceback": [

The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)

/home/ubuntu/pykoi/example/rlhf/demo_rl.ipynb Cell 7 line 1
      1 from accelerate import notebook_launcher
      3 config = RLHFConfig(base_model_path="elinas/llama-7b-hf-transformers-4.29", # "elinas/llama-7b-hf-transformers-4.29",
      4 dataset_type="local_db",
      5 reward_model_path="goldmermaid/rlhf_reward_model",
      (...)
      9
     10 )
---> 11 rlhf_step3_rl = RL(config)
     12 rlhf_step3_rl.train("./models/rlhf_step3_rl", num_processes=1)

/home/ubuntu/pykoi/example/rlhf/demo_rl.ipynb Cell 7 line 9
      5 self.num_proc = self._rlhf_config.num_workers if not self._rlhf_config.streaming else None
      6 set_seed(rlhf_config.seed) ## TODO: how to set seed properly in init?
      8 self.ppo_config=PPOConfig(
----> 9 steps=self._rlhf_config.total_ppo_epochs,
     10 model_name=self._rlhf_config.base_model_path,
     11 learning_rate=self._rlhf_config.learning_rate,
     12 batch_size=self._rlhf_config.ppo_batch_size,
     13 mini_batch_size=self._rlhf_config.mini_batch_size,
     14 gradient_accumulation_steps=self._rlhf_config.gradient_accumulation_steps,
     15 optimize_cuda_cache=True,
     16 early_stopping=self._rlhf_config.early_stopping,
     17 target_kl=self._rlhf_config.target_kl,
     18 ppo_epochs=self._rlhf_config.ppo_epochs,
     19 seed=self._rlhf_config.seed,
     20 init_kl_coef=self._rlhf_config.init_kl_coef,
     21 adap_kl_ctrl=self._rlhf_config.adap_kl_ctrl,
     22 # accelerator_kwargs=self._rlhf_config.accelerator_kwargs,
     23 )
     25 ## Load the base model and tokenizer and define the PPO Trainer for RL
     26 self.base_tokenizer = self.create_tokenizer(rlhf_config.base_model_path)

AttributeError: 'RLHFConfig' object has no attribute 'total_ppo_epochs'

It seems that there is still an error here. Can you clean up the output and add a TODO note?
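One way to surface this kind of mismatch earlier is to check the config for the attributes the trainer reads before building `PPOConfig`. The sketch below is only an illustration: `DemoConfig` is a hypothetical stand-in (not pykoi's `RLHFConfig`), `missing_fields` is a made-up helper, and the field list is copied from the attribute names that appear in the traceback.

```python
from dataclasses import dataclass

# Attribute names that the traceback shows being read from the config.
REQUIRED_FIELDS = [
    "total_ppo_epochs",
    "base_model_path",
    "learning_rate",
    "ppo_batch_size",
    "mini_batch_size",
    "gradient_accumulation_steps",
    "early_stopping",
    "target_kl",
    "ppo_epochs",
    "seed",
    "init_kl_coef",
    "adap_kl_ctrl",
]


@dataclass
class DemoConfig:
    """Hypothetical stand-in for RLHFConfig; values are placeholders."""
    base_model_path: str = "elinas/llama-7b-hf-transformers-4.29"
    learning_rate: float = 1e-5
    seed: int = 0


def missing_fields(config, required=REQUIRED_FIELDS):
    """Return the required attribute names that the config object lacks."""
    return [name for name in required if not hasattr(config, name)]
```

Calling `missing_fields(config)` before constructing the trainer would list `total_ppo_epochs` (and any other absent names) in one place instead of failing on the first one.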

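For the request to clean up the notebook output, a minimal sketch using only the standard library is shown below. It assumes the notebook is in the usual nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the function names are made up for illustration.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Empty the outputs of every code cell in place and return the same dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path: str) -> None:
    """Load a notebook file, clear its code-cell outputs, and write it back."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
        f.write("\n")
```

Running `clear_outputs_in_file("example/rlhf/demo_rl.ipynb")` before committing would drop the stored traceback from the diff; the TODO note itself still has to be written by hand.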

CambioML / pykoi-rlhf-finetuned-transformers

Added mlflow #74
