VinAIResearch / PhoGPT

PhoGPT: Generative Pre-training for Vietnamese (2023)
Apache License 2.0

What is the data format for fine-tuning the PhoGPT model? #10

Closed dangyuuki123 closed 9 months ago

dangyuuki123 commented 9 months ago

I want to fine-tune the PhoGPT model so that it answers questions using information from a source text. What format should the training data have?

datquocnguyen commented 9 months ago

See: https://github.com/mosaicml/llm-foundry/blob/main/scripts/train/README.md#llmfinetuning

Here is the formatted example I used for fine-tuning the base PhoGPT with context-based QA.

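# English gloss of the Vietnamese prompt: "### Question:\nBased on the following text:\n{text}\nPlease answer the question: {question}\n\n### Answer:"
formatted_example = {'prompt': "### Câu hỏi:\nDựa vào văn bản sau đây:\n{text}\nHãy trả lời câu hỏi: {question}\n\n### Trả lời:", 'response': "{response_text}"}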
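To make the template above concrete, here is a minimal sketch of how it could be filled in and serialized, one JSON object per line. The helper names `format_example` and `to_jsonl_lines`, the sample sentence, and the choice of JSON Lines as the on-disk format are assumptions for illustration only; check the linked llm-foundry README for the dataset formats it actually accepts.

```python
import json

# Prompt template copied from the example above; {text} and {question}
# are filled in per training example.
PROMPT_TEMPLATE = (
    "### Câu hỏi:\nDựa vào văn bản sau đây:\n{text}\n"
    "Hãy trả lời câu hỏi: {question}\n\n### Trả lời:"
)


def format_example(text, question, response_text):
    """Build one prompt/response record (hypothetical helper)."""
    return {
        "prompt": PROMPT_TEMPLATE.format(text=text, question=question),
        "response": response_text,
    }


def to_jsonl_lines(records):
    """Serialize (text, question, response_text) tuples to JSON Lines strings.

    ensure_ascii=False keeps Vietnamese characters readable in the output.
    """
    return [
        json.dumps(format_example(*record), ensure_ascii=False)
        for record in records
    ]


# Illustrative sample data, not taken from any real dataset.
sample = [
    ("Hà Nội là thủ đô của Việt Nam.", "Thủ đô của Việt Nam là gì?", "Hà Nội"),
]
lines = to_jsonl_lines(sample)
```

Writing `"\n".join(lines)` to a UTF-8 file such as `train.jsonl` (a hypothetical file name) would give one training example per line.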