albertan017 / LLM4Decompile

Reverse Engineering: Decompiling Binary Code with Large Language Models
https://arxiv.org/abs/2403.05286
MIT License
3.19k stars · 233 forks

Question about max token size for training. #16

Closed: lancasterJie closed this issue 6 months ago

lancasterJie commented 6 months ago

Hi, I have a question about the maximum token size. Currently, the maximum token size for model inference is set to 500. Would it be possible to share the training parameters, including the maximum token size used during training?

Thanks a lot.

albertan017 commented 6 months ago

2024.5.10 Update: The v1.5 models support 4K input length, enjoy!

During training, we set the maximum token size to 1,024 for this version. We are also preparing an updated version, trained on a larger dataset with a maximum token size of 4,000, which greatly improves performance over the current model.
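Since the prompt and the generated output share one context window, the practical effect of these limits can be sketched with a small budget helper. This is an illustrative sketch, not code from the repo; the function name and the `reserve` parameter are assumptions, and the window sizes come from the figures quoted above (1,024 for this version, 4K for v1.5).

```python
def generation_budget(context_window: int, prompt_tokens: int,
                      reserve: int = 0) -> int:
    """Tokens left for generation after the prompt fills part of the window.

    context_window: model's maximum sequence length, e.g. 1024 for this
        version or 4096 for a model with a 4K input window (illustrative).
    prompt_tokens: number of tokens in the (already tokenized) input.
    reserve: tokens to hold back, e.g. for special tokens (assumption).
    """
    if prompt_tokens > context_window:
        raise ValueError("prompt longer than the context window; truncate it first")
    return max(context_window - prompt_tokens - reserve, 0)

# With a 1,024-token window, a 500-token decompilation prompt leaves
# 524 tokens for the generated source:
print(generation_budget(1024, 500))  # 524
```

In a typical Hugging Face-style generation loop, a value like this would cap `max_new_tokens`, which is why raising the training window from 1,024 to 4,000 leaves far more room for long functions.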

lancasterJie commented 6 months ago

Thanks for the reply