Open · Eichhof opened this issue 11 months ago
Please see this article for an explanation of why you should not expect to see knowledge gain after fine-tuning: https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts
In particular, see the "Juliet was in love with someone" example in the article.
Thank you for the link. I had a read and it makes sense. Before switching to Llama-2 I was fine-tuning GPT-J, and GPT-J did learn the knowledge from the training data (the same training data I used for Llama-2). Why is there such a huge difference between Llama-2 and GPT-J in terms of learning new knowledge?
Can you share what benchmark you are using to measure whether the knowledge is learned? How did the results compare between the two models?
Maybe there is some difference between the two models' architectures or implementations?
I'm not using a benchmark. I just have knowledge about certain persons and companies in my training dataset (knowledge that is not available in the vanilla Llama-2 and GPT-J models). I then tested this knowledge by generating responses from the models in a conversation. GPT-J can answer questions about the persons and companies (its answers are similar to the training data), but Llama-2 does not know them (it responds that it does not know the person or company in question).
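Roughly, the check is just a manual comparison: ask the base model and the fine-tuned model the same factual question and see whether the answer matches the training data. A minimal sketch of such a probe is below (model paths, the prompt template with ### separators, and the question are placeholders, not my exact setup):

```python
# Minimal sketch of the manual check: ask a model a factual question that only
# appears in the fine-tuning data and inspect the answer.
# Model paths, the ### prompt format, and the question are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

def ask(model_dir: str, question: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
    # Same turn format as in the training conversations (### as separator).
    prompt = f"### Human: {question}\n### Assistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

question = "How old is Leon Klein?"
print("base model:", ask("meta-llama/Llama-2-13b-hf", question))
print("fine-tuned:", ask("./outputs/llama2-13b-finetuned", question))
```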
Please check that this issue hasn't been reported before.
Expected Behavior
The model should learn the information/knowledge contained in the training data. From the example training data below, it should learn information about Leon Klein (age, etc.).
Current behaviour
When conversing with the fine-tuned chatbot, it does not know anything about Leon Klein.
Steps to reproduce
I'm fine-tuning Llama-2 13b with Axolotl. My dataset for fine-tuning looks as follows:
This is just an example of two conversations. My full training data consists of around 4,200 conversations, each with 20-40 turns. The conversations contain facts about people, and the facts repeat across multiple conversations (with different wordings, of course). The fine-tuned model picks up the data format (i.e., it also outputs turns separated by ###), but it does not remember the specific information.
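To make the format concrete, a hypothetical record in the same style might look like the sketch below, written as Python for brevity (the names, facts, and wording are made up for illustration; this is not my actual data):

```python
# Hypothetical example of one training record in the described format: a single
# "text" field holding a whole conversation, with turns separated by ###.
# Names, facts, and wording are invented for illustration only.
import json

conversation = {
    "text": (
        "### Human: Who is Leon Klein?\n"
        "### Assistant: Leon Klein is a software engineer from Berlin.\n"
        "### Human: How old is he?\n"
        "### Assistant: Leon Klein is 34 years old.\n"
        # ...20-40 turns per conversation in the real data
    )
}

# One JSON object per line in the JSONL training file.
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(conversation, ensure_ascii=False) + "\n")
```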
Config yaml
Possible solution
I have also tried a learning rate of 0.00018 for 5 epochs with a constant learning-rate scheduler, as well as a learning rate of 0.001. Neither solved the issue.
Could the dataset format be the problem? Would switching to the alpaca format solve it?
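For comparison, a single record in alpaca format would look roughly like this (again a hypothetical sketch with made-up facts, written as Python for brevity):

```python
# Hypothetical record in alpaca format (instruction / input / output fields),
# as opposed to the multi-turn conversation format above. Facts are made up.
alpaca_record = {
    "instruction": "How old is Leon Klein?",
    "input": "",
    "output": "Leon Klein is 34 years old.",
}
```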
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main/575a082
Acknowledgements