Try ChatGPT fine tuning functionality

vrodriguezf commented 10 months ago

One way to fine-tune would be to convert the logs of human gameplays to the JSON format required by the fine-tuning API:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}

The idea would be to turn human actions into "role": "assistant" messages as if they were the result of chatting with the model. About the number of gameplays, the docs say:

To fine-tune a model, you are required to provide at least 10 examples. We typically see clear improvements from fine-tuning on 50 to 100 training examples with gpt-3.5-turbo but the right number varies greatly based on the exact use case. We recommend starting with 50 well-crafted demonstrations and seeing if the model shows signs of improvement after fine-tuning.
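Before uploading, the file can be checked against the format and the minimum example count quoted above. The following is a minimal sketch, not the official validator: `validate_jsonl`, `ALLOWED_ROLES` and `MIN_EXAMPLES` are hypothetical names, and the role set is assumed from the example record shown earlier.

```python
import json

# Assumption: only the three roles that appear in the example record.
ALLOWED_ROLES = {"system", "user", "assistant"}
MIN_EXAMPLES = 10  # minimum stated in the quoted docs


def validate_jsonl(lines):
    """Return a list of human-readable problems for an iterable of JSONL lines."""
    problems = []
    n_examples = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        n_examples += 1  # every non-blank line counts as one example
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {lineno}: not valid JSON")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing 'messages' list")
            continue
        roles = []
        for msg in messages:
            ok = (
                isinstance(msg, dict)
                and msg.get("role") in ALLOWED_ROLES
                and isinstance(msg.get("content"), str)
            )
            if not ok:
                problems.append(f"line {lineno}: malformed message")
                break
            roles.append(msg["role"])
        else:
            # Only reached when every message in the record was well formed.
            if "assistant" not in roles:
                problems.append(f"line {lineno}: no assistant message to learn from")
    if n_examples < MIN_EXAMPLES:
        problems.append(
            f"only {n_examples} examples; at least {MIN_EXAMPLES} are required"
        )
    return problems
```

An empty return value means the file passed these basic checks; it does not guarantee that the upload will be accepted.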

To test for generalisation, since there are 3 scenarios with different initial conditions, we should hold 1 out for testing (the fine-tuning GUI allows you to upload separate JSON files for training and test). Actually, a better strategy would be to create more scenarios of our own by replicating the mission files that we already have for the provided missions.
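The leave-one-scenario-out split described above could look roughly like this. It is a sketch under assumptions: each example is assumed to carry a `scenario` label next to its fine-tuning `record`, and the labels `i1`, `i2`, `i3` are placeholders rather than the project's real scenario names.

```python
import json


def split_by_scenario(examples, test_scenario):
    """Split examples into (train, test) records, holding out one scenario."""
    train = [ex["record"] for ex in examples if ex["scenario"] != test_scenario]
    test = [ex["record"] for ex in examples if ex["scenario"] == test_scenario]
    return train, test


def to_jsonl(records):
    """Serialise records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r) for r in records)


# Hypothetical data: one tiny record per scenario label.
examples = [
    {"scenario": s, "record": {"messages": [
        {"role": "user", "content": f"state from {s}"},
        {"role": "assistant", "content": "action"},
    ]}}
    for s in ("i1", "i2", "i3")
]
train, test = split_by_scenario(examples, test_scenario="i3")
train_file_text = to_jsonl(train)  # content for the training upload
test_file_text = to_jsonl(test)    # content for the separate test upload
```

Splitting by scenario rather than by random row keeps every step of the held-out scenario out of training, which is what makes the test a check of generalisation.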

OhhTuRnz commented 10 months ago

About fine-tuning, I think we can divide this activity into two parts:

One would be mocking prompts with human agent data
The other would be faking the history of the prompting with human agent data as seen in the FastAPI tutorial you shared in the resources
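To make the two parts concrete, here is a rough sketch of both shapes of training data. `SYSTEM_PROMPT`, the function names and the `(observation, action)` pairs are hypothetical; the real prompts and action encoding are not specified in this thread.

```python
# Hypothetical system prompt, used only for illustration.
SYSTEM_PROMPT = "You are an agent controlling a spacecraft."


def single_turn_examples(steps):
    """Part one: each human step becomes its own mocked prompt/response pair."""
    return [
        {"messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": observation},
            {"role": "assistant", "content": action},
        ]}
        for observation, action in steps
    ]


def multi_turn_example(steps):
    """Part two: the whole gameplay becomes one conversation with faked history."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for observation, action in steps:
        messages.append({"role": "user", "content": observation})
        messages.append({"role": "assistant", "content": action})
    return {"messages": messages}


steps = [("obs at t=0", "thrust forward"), ("obs at t=1", "thrust left")]
singles = single_turn_examples(steps)  # two independent examples
history = multi_turn_example(steps)    # one example with five messages
```

The single-turn form yields more, shorter examples, while the multi-turn form lets the model condition on earlier actions at the cost of longer records.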

vrodriguezf commented 10 months ago

Exactly, the idea would be to turn human gameplays into fake ChatGPT conversations in which the human responses are mapped to assistant responses. Then, use that to fine-tune the model.

OhhTuRnz commented 10 months ago

Regarding this, I coded a script that converts CSV into a "GPT-readable" JSON file. I'm attaching one example here.

pe1_i3_keyboard_agent_actions_20231016-151738.json
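The script itself is not shown in this thread, so the following is only an illustrative sketch of such a conversion, not the author's code. The column names `observation` and `action` and the function name `csv_to_jsonl` are assumptions; the real log columns may differ.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, system_prompt,
                 obs_col="observation", act_col="action"):
    """Convert a CSV gameplay log into fine-tuning JSONL, one example per row.

    csv_file is any text file object; obs_col and act_col are assumed
    column names for the logged state and the human action.
    """
    lines = []
    for row in csv.DictReader(csv_file):
        record = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": row[obs_col]},
            # The human action plays the part of the model's reply.
            {"role": "assistant", "content": row[act_col]},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Usage with an in-memory CSV standing in for a real log file.
sample = io.StringIO("observation,action\nobs1,act1\nobs2,act2\n")
jsonl_text = csv_to_jsonl(sample, "You control the spacecraft.")
```

For a real log, pass `open(path, newline="")` instead of the `StringIO` object, as the `csv` module documentation recommends.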

OhhTuRnz commented 10 months ago

I don't know what the prompt is intended to look like, so when the script runs it asks which environment you want, to make the output easier to understand.

OhhTuRnz commented 9 months ago

FYI, my fine-tuned model is this: ft:gpt-3.5-turbo-1106:personal::8MDSBDRp
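A fine-tuned model is used like any other chat model, by passing its ID as the model name. The sketch below only builds the request parameters. `build_request` is a hypothetical helper, the prompts are placeholders, and the ID is written in lowercase because OpenAI model IDs normally are.

```python
# Model ID from the comment above, written in lowercase.
FINE_TUNED_MODEL = "ft:gpt-3.5-turbo-1106:personal::8MDSBDRp"


def build_request(observation, system_prompt):
    """Build chat-completion parameters targeting the fine-tuned model."""
    return {
        "model": FINE_TUNED_MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": observation},
        ],
    }


params = build_request("obs at t=0", "You control the spacecraft.")
# With the openai Python package (v1 or later) and an API key configured,
# the request could be sent like this (left commented out: it needs network access):
#   from openai import OpenAI
#   reply = OpenAI().chat.completions.create(**params)
```

The system prompt at inference time should match the one used in the training records, otherwise the model sees a context it was not tuned on.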

vrodriguezf commented 9 months ago

Closing this; we already tried it and it worked.

ARCLab-MIT / kspdg

Try ChatGPT fine tuning functionality #15