albertan017 / LLM4Decompile

Reverse Engineering: Decompiling Binary Code with Large Language Models
https://arxiv.org/abs/2403.05286
MIT License
3.2k stars 233 forks

Prediction becomes empty, therefore the loss become nan. #34

Open zero90169 opened 3 days ago

zero90169 commented 3 days ago

I've tried to fine-tune the llm4decompile-6.7b model on my own dataset and the results are impressive. My dataset has the following format: {'instruction': 'MY_CUSTOMIZE_QUESTION', 'input': '', 'output': 'MY_CUSTOMIZE_ANSWER'}

which is then formatted like this:

{{ bos }}
user: data[idx]['instruction']
{{ eos }}
assistant:
classification: data[idx]['output']
{{ eos }}
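The template above can be sketched as a small formatting function. This is a minimal illustration, not the actual training script; the `<bos>`/`<eos>` placeholders stand in for whatever special tokens the real tokenizer uses.

```python
def build_prompt(example, bos="<bos>", eos="<eos>"):
    """Format one dataset example into the chat template from the issue.

    `bos`/`eos` are placeholders here; in practice they should come from
    the tokenizer (e.g. tokenizer.bos_token / tokenizer.eos_token).
    """
    return (
        f"{bos}\n"
        f"user: {example['instruction']}\n"
        f"{eos}\n"
        "assistant:\n"
        f"classification: {example['output']}\n"
        f"{eos}"
    )
```

A mismatch between these placeholder tokens and the tokens the 9B model was actually trained with is one common cause of degenerate outputs after fine-tuning.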

Everything works fine and the evaluation results are satisfactory.

However, everything goes wrong when I try to fine-tune the 9B model. I changed the part of my code that loads the model from 'llm4decompile-6.7b' to 'llm4decompile-9b' while keeping everything else the same.

After a few update steps the model's predictions become empty, and the loss becomes NaN because of the empty output.

Model predictions at the first step:
Decoded Predictions: ['" on the  provided the followingE"s" section... "]

Model predictions after a few steps:
Decoded Predictions: ['', '', '', '']
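One way to narrow this down is to log exactly when the collapse happens. The sketch below is a hypothetical diagnostic (not from the repo): given the per-step losses and decoded predictions, it flags steps where the loss is NaN or every prediction decoded to an empty string.

```python
import math

def find_bad_steps(losses, decoded_preds):
    """Return indices of training steps with a NaN loss or all-empty predictions.

    losses        -- list of float losses, one per step
    decoded_preds -- list of lists of decoded prediction strings, one list per step
    """
    bad = []
    for step, (loss, preds) in enumerate(zip(losses, decoded_preds)):
        if math.isnan(loss) or all(p.strip() == "" for p in preds):
            bad.append(step)
    return bad
```

Running this over the training log shows whether the empty predictions appear before or at the same step as the first NaN loss, which helps separate a data/template problem from a numerical one (e.g. learning rate or precision).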

This has been bothering me for a while; any advice would be greatly appreciated.

Package versions:
accelerate==1.0.1
bitsandbytes==0.42.0
deepspeed==0.15.2
datasets==2.17.0
evaluate==0.4.1
gpustat==1.1.1
huggingface-hub==0.23.2
hydra-core==1.3.2
icecream==2.1.3
Jinja2==3.1.2
jsonlines==4.0.0
langchain==0.1.0
langchain-core==0.1.8
loguru==0.7.2
mlflow==2.9.0
openai==1.40.0
packaging>=23.2
pandas==2.0.3
peft==0.11.0
pendulum==2.1.2
pyarrow==14.0.0
pysnooper==1.2.0
PyYAML==6.0
retrying==1.3.4
scikit-learn==1.3.2
seaborn==0.13.0
tokenizers==0.20.3
torch==2.4.0
torchvision==0.19.0
vllm==0.6.3.post1
transformers==4.45.2
trl==0.11.0
wandb==0.16.0
flash-attn==2.6.2
albertan017 commented 9 hours ago

The 9B model is based on Yi-Coder, while the training script is from DeepSeek-Coder. We did not test the 9B model with that script; we recommend using LLaMA-Factory to tune the 9B model.
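For reference, a LLaMA-Factory LoRA SFT run is driven by a YAML config along these lines. This is a rough sketch, not a tested recipe: the model path, dataset name, template choice (`yi` for a Yi-Coder-based model), and hyperparameters below are all assumptions that need to be adapted to your setup.

```yaml
# Hypothetical LLaMA-Factory SFT config -- adjust paths, dataset, and template.
model_name_or_path: LLM4Binary/llm4decompile-9b-v2   # assumed HF model id
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: my_custom_dataset        # registered in data/dataset_info.json
template: yi                      # 9B model is Yi-Coder-based
cutoff_len: 2048
output_dir: saves/llm4decompile-9b-sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
```

Using the `yi` template (rather than the DeepSeek-style one baked into the original script) ensures the special tokens match what the 9B base model expects, which is the likely fix for the empty-prediction collapse.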