I've fine-tuned the llm4decompile-6.7b model on my dataset, and the results are impressive.
My dataset records look like this:
{'instruction': 'MY_CUSTOMIZE_QUESTION', 'input': '', 'output': 'MY_CUSTOMIZE_ANSWER'}
and each record is formatted into the training prompt like this:
{{ bos }}
user: data[idx]['instruction']
{{ eos }}
assistant:
classification: data[idx]['output']
{{ eos }}
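For concreteness, here is a minimal sketch of that formatting step, assuming a Hugging Face tokenizer; build_prompt and the checkpoint path are my placeholders, not the actual training script:

```python
from transformers import AutoTokenizer

# Placeholder path; substitute the actual llm4decompile checkpoint.
tokenizer = AutoTokenizer.from_pretrained("llm4decompile-6.7b")

def build_prompt(record):
    # Mirror the template above: bos, user turn, eos, assistant turn, eos.
    return (
        f"{tokenizer.bos_token}\n"
        f"user: {record['instruction']}\n"
        f"{tokenizer.eos_token}\n"
        f"assistant:\n"
        f"classification: {record['output']}\n"
        f"{tokenizer.eos_token}"
    )
```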
Everything works fine and the evaluation results are satisfactory.
However, everything goes wrong when I try to fine-tune the 9B model.
I change the part of my code that loads the model from 'llm4decompile-6.7b' to 'llm4decompile-9b' while keeping everything else the same.
The model predictions become empty after a few update steps, and the loss becomes NaN because of the empty outputs.
Model predictions at the first step:
Decoded Predictions: ['" on the provided the followingE"s" section... "]
Model predictions after a few steps:
Decoded Predictions: ['', '', '', '']
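In case it helps pinpoint the first bad update, this is a minimal sketch of the per-step check I use to catch the divergence, assuming a plain PyTorch training loop (assert_finite is my own helper name):

```python
import torch

def assert_finite(loss, logits, step):
    # Stop at the first update step that produces NaN/Inf, before the
    # model collapses into empty outputs.
    if not torch.isfinite(logits).all():
        raise RuntimeError(f"non-finite logits at step {step}")
    if not torch.isfinite(loss):
        raise RuntimeError(f"non-finite loss at step {step}")
```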
This issue is really bothering me; any advice would be greatly appreciated.
The 9B model is based on Yi-Coder, while the training script is from DeepSeek-Coder. We did not test the 9B model with that script; we recommend using LLaMA-Factory to fine-tune the 9B model.
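As a quick sanity check before a full retune, it can also help to confirm that the 9B checkpoint loads and generates cleanly in bfloat16, since running a bf16-trained base in fp16 is a common cause of NaN losses; a minimal sketch, assuming transformers and a CUDA device (the repo id is the placeholder name from this thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id from this thread; substitute the real Hugging Face path.
model_id = "llm4decompile-9b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in bfloat16; fp16 can overflow on some bases and turn the loss NaN.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("user: test question\nassistant:\n", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note also that records in the {'instruction', 'input', 'output'} form match the alpaca format that LLaMA-Factory reads by default, so the dataset should plug in with minimal changes.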