James-QiuHaoran / LLM-serving-with-proxy-models

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny model can tell you the verbosity of an LLM (with low latency!)
Apache License 2.0

experiment replication #11

Closed hunzhizi closed 3 weeks ago

hunzhizi commented 4 weeks ago

Hello, while replicating the experiment, the model's accuracy is not very satisfactory. Could you please tell me the number of epochs used in the first stage (full-parameter training) and the second stage (fixing the weights of BERT-base)?

James-QiuHaoran commented 3 weeks ago

The number of epochs is set to 6 during training. In the first 3 epochs, the BERT-base weights are updated; after that they are frozen, and only the classification layer is trained.
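A minimal PyTorch sketch of that schedule (hypothetical names; a tiny stand-in module replaces BERT-base, and the repo's actual model and trainer may differ): 6 epochs total, with the backbone trainable for the first 3 and frozen afterwards so only the classification head keeps learning.

```python
import torch
import torch.nn as nn

class ProxyPredictor(nn.Module):
    # Stand-in for BERT-base + a sequence-length prediction head.
    def __init__(self, hidden=8):
        super().__init__()
        self.backbone = nn.Linear(16, hidden)   # placeholder for BERT-base
        self.classifier = nn.Linear(hidden, 1)  # classification/regression head

    def forward(self, x):
        return self.classifier(torch.relu(self.backbone(x)))

model = ProxyPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 16), torch.randn(32, 1)  # dummy batch

NUM_EPOCHS, FREEZE_AT = 6, 3
for epoch in range(NUM_EPOCHS):
    if epoch == FREEZE_AT:
        # Stage 2: freeze the backbone; only the head is updated from here on.
        for p in model.backbone.parameters():
            p.requires_grad = False
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
```

With `zero_grad()` clearing gradients each epoch, frozen parameters receive no gradient after epoch 3, so the optimizer stops updating them even though they remain registered.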

The data_size parameter also matters (see the issue linked here). The example training command uses a small data size of 10 for demonstration purposes only. For a full evaluation, please consider setting the data size to the size of the training dataset.