SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Would you please kindly offer the data, codes, or settings for training the predictor? #124

Closed Raincleared-Song closed 10 months ago

Raincleared-Song commented 10 months ago

Question Details

I'm trying to train the sparsity predictor following DejaVu, but I ran into a strange result. I generated the predictor training data on C4 myself. For ReLULLaMA-7B, the predictor I trained achieves higher recall on C4 than the one you provide in ReLULLaMA-7B-Predictor (e.g., 0.94 vs. 0.90 in layer 0).
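
For clarity, recall here means the fraction of truly activated neurons that the predictor also flags as active. A minimal sketch of how I compute it (the sigmoid threshold of 0.5 is my own choice, not necessarily what the released predictors use):

```python
import torch

def predictor_recall(pred_logits: torch.Tensor, true_active: torch.Tensor,
                     threshold: float = 0.5) -> float:
    """Recall = fraction of truly activated neurons that the predictor also marks active.

    pred_logits: (tokens, n_neurons) raw predictor outputs.
    true_active: (tokens, n_neurons) boolean ground-truth activation labels.
    """
    pred_active = torch.sigmoid(pred_logits) > threshold
    hits = (pred_active & true_active).sum().item()
    total = true_active.sum().item()
    return hits / max(total, 1)
```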

However, when I plug this predictor into PowerInfer, the decoding efficiency is considerably lower than with your ReLULLaMA-7B-Predictor. What might be going wrong? (The upper screenshot was obtained with my own predictor; the lower one with ReLULLaMA-7B-Predictor.)


The discrepancy may also be caused by mistakes on my side, so I am attaching the code for generating the training data (get_llama_data.py and hf_llama_module.py) and for training the predictor (main_mlp.py, run_c4_mlp.sh, trainer_mlp.py).
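
For context, my data generation follows the general idea below: record the hidden state entering each FFN as the predictor input, and mark the neurons with a positive post-ReLU value as activated. This is only a minimal sketch (the Hugging Face model name and LLaMA-style module paths are assumptions; my actual scripts differ in detail):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name and module paths assumed for illustration.
model_name = "SparseLLM/ReluLLaMA-7B"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

layer_idx = 0
mlp = model.model.layers[layer_idx].mlp  # LlamaMLP: gate_proj / up_proj / down_proj

records = {"inputs": [], "labels": []}

def collect(module, inputs, output):
    # inputs[0] is the hidden state entering the MLP -> predictor features.
    x = inputs[0].detach()
    # Re-run the gate projection + ReLU to get per-neuron activations;
    # a neuron with a positive post-ReLU value counts as "activated".
    act = torch.relu(module.gate_proj(x)).detach()
    records["inputs"].append(x.flatten(0, -2).float().cpu())
    records["labels"].append((act > 0).flatten(0, -2).cpu())

handle = mlp.register_forward_hook(collect)

with torch.no_grad():
    ids = tokenizer("A short C4 sample would go here.", return_tensors="pt").input_ids
    model(ids.to(model.device))

handle.remove()
```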

Looking forward to your response! Of course, the best solution would be to open-source the data, code, or even just the parameter settings used to train ReLULLaMA-7B-Predictor.

YixinSong-e commented 10 months ago

Thank you for your interest in our work. Open-sourcing the predictor training code and settings is indeed on our roadmap. At the moment we are prioritizing other work, such as optimizing the CUDA operators, refactoring the codebase, and supporting the Mistral and Mixtral models. The code you provided is a great starting point for open-source predictor training. Let me share some details about how we train the predictor:

1. During training, the predictor's hidden layer dimension is adaptive rather than fixed at 1000.
2. The recall of PowerInfer's predictors is very high on the OPT and Falcon models. Because ReLULLaMA itself has relatively low sparsity, when collecting activated neurons we label the top k% (e.g., 15%) of neurons, ranked by the L2-norm of their outputs, as activated. This means the predictor does not need to be very large.
3. We have also been running some interesting experiments recently to push SwiGLU-based LLMs toward higher sparsity when converting them to ReGLU models.

In any case, we are organizing the predictor-related code and hope to provide easy-to-use tools for the open-source community in the future.
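
To make point 2 concrete, here is a minimal sketch of a top-k% labeling rule and a predictor MLP whose hidden width is chosen per layer. The per-token absolute value used as the ranking score and the two-layer architecture are illustrative assumptions, not the exact implementation:

```python
import torch
import torch.nn as nn

def topk_activation_labels(neuron_out: torch.Tensor, k_frac: float = 0.15) -> torch.Tensor:
    """Label the top k% of neurons per token as 'activated'.

    neuron_out: (tokens, n_neurons) intermediate MLP activations.
    The per-token absolute value is an illustrative stand-in for the
    L2-norm criterion described above.
    """
    n_neurons = neuron_out.shape[-1]
    k = max(1, int(k_frac * n_neurons))
    topk_idx = neuron_out.abs().topk(k, dim=-1).indices
    labels = torch.zeros_like(neuron_out)
    labels.scatter_(-1, topk_idx, 1.0)
    return labels.bool()

class SparsityPredictor(nn.Module):
    """Small MLP predictor; the hidden width is picked per layer, not fixed at 1000."""

    def __init__(self, d_model: int, n_neurons: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_neurons),  # one activation logit per FFN neuron
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```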