The official realization of InstructERC

The performance of the model I reproduced does not meet the standards outlined in the paper. #14

Open stddddd opened 1 month ago

stddddd commented 1 month ago

I ran the Main Result Reproduction for LoRA + InstructERC based on Llama2, and the performance I got did not match the paper. The table below shows the comparison:

            IEMOCAP   MELD    EmoryNLP
reproduce   65.47     66.96   39.16
paper       71.39     69.15   41.37

Compared to the original code, I only made the following modifications:

  1. data_percent: 1/64 -> 1

  2. Set LLaMA2 MODELPATH to my model path. The Llama2 version I use is Llama-2-7b-chat-hf.

  3. While running the code, I hit an error: RuntimeError: probability tensor contains either inf, nan or element < 0.

    To work around it, I added the following code to the Llama2 model file:

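# Convert the next-token scores into sampling probabilities.
probs = nn.functional.softmax(next_token_scores, dim=-1)

# Find the batch rows whose probabilities contain NaN.
nans = torch.isnan(probs)
if nans.any():
    idx = torch.argwhere(torch.sum(nans, 1))
    # Replace each NaN row with a one-hot distribution on token id 2.
    z = torch.zeros_like(probs[idx][0])
    z[0][2] = 1.
    probs[idx] = z

next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)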
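One thing that might be worth checking with this patch is how often the fallback actually fires. Every NaN row is forced to token id 2, which is the end-of-sequence id in the Llama 2 tokenizer, so the affected generation ends early and that could influence the scores. Below is a rough, framework-free sketch of the same guard with a counter added. `sanitize_probs` and `fallback_token_id` are hypothetical names, and plain Python lists stand in for the `probs` tensor; it is not code from the repository.

```python
import math

def sanitize_probs(probs, fallback_token_id=2):
    """Replace invalid probability rows with a one-hot fallback row.

    probs is a list of rows, one per sequence in the batch. A row counts
    as invalid if it contains NaN, inf, or a negative value, the three
    conditions named in the RuntimeError above.
    Returns (clean_rows, num_replaced).
    """
    clean_rows = []
    num_replaced = 0
    for row in probs:
        invalid = any((not math.isfinite(p)) or p < 0 for p in row)
        if invalid:
            # Assumption: force the fallback token, as the patch above does.
            one_hot = [0.0] * len(row)
            one_hot[fallback_token_id] = 1.0
            clean_rows.append(one_hot)
            num_replaced += 1
        else:
            clean_rows.append(list(row))
    return clean_rows, num_replaced

# Toy batch: one healthy row and one row that is entirely NaN.
batch = [[0.1, 0.2, 0.3, 0.4], [float("nan")] * 4]
clean, replaced = sanitize_probs(batch)
print(clean[1], replaced)
```

If the counter stays at zero for most batches, the patch is probably not what causes the gap; if it fires often, the numerical issue itself (for example the dtype used at inference) may deserve a closer look.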

What else should I modify to reach the performance mentioned in the paper?

LIN-SHANG commented 1 month ago

The large performance gap is indeed confusing. Below are some things that may help:

LLaMA version: or I haven't tried any version of LLaMA Chat.

Besides, I haven't encountered the RuntimeError you describe before. Here are the GPU, Nvidia driver, and CUDA versions I use:

A100, 470, 11.7

stddddd commented 1 month ago

I reproduced again using the environment you mentioned: A100, Nvidia Driver 470, and CUDA version 11.7. I also downloaded Llama-2-7b-hf from the link you provided.

However, the performance I got still did not meet the paper. The table below is the comparison:

            IEMOCAP   MELD    EmoryNLP
reproduce   67.53     67.46   39.20
paper       71.39     69.15   41.37
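To make the two runs easier to compare, here is a small sketch that computes the per-column gap to the paper for both reproductions. The numbers are copied from the tables in this thread; the dataset order (IEMOCAP, MELD, EmoryNLP) is an assumption taken from the hyper-parameter table header further down, and `gaps` is just a helper name for illustration.

```python
# Scores copied from the two comparison tables in this thread.
# Assumption: columns are ordered IEMOCAP, MELD, EmoryNLP.
datasets = ["IEMOCAP", "MELD", "EmoryNLP"]
paper = [71.39, 69.15, 41.37]
runs = {
    "Llama-2-7b-chat-hf": [65.47, 66.96, 39.16],
    "Llama-2-7b-hf": [67.53, 67.46, 39.20],
}

def gaps(reproduced, reference):
    """Return reference minus reproduced for each column, rounded to 2 places."""
    return [round(ref - rep, 2) for rep, ref in zip(reproduced, reference)]

for name, scores in runs.items():
    print(name, dict(zip(datasets, gaps(scores, paper))))
```

Switching from the chat model to the base model shrinks the first column's gap from 5.92 to 3.86 points, while the third column's gap barely moves (2.21 to 2.17).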

Do you have any idea about it?

LIN-SHANG commented 1 month ago

It seems the gap has narrowed a bit. You can try adjusting the historical window (from 5 to 12); this parameter has an impact on the best performance.
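For readers unfamiliar with the parameter, the sketch below shows one plausible reading of a historical window: the prompt for a target utterance includes at most `window` preceding utterances from the same dialogue. `build_context` is a hypothetical helper, not the repository's function, and whether the repository counts the target utterance inside the window is an assumption here.

```python
def build_context(utterances, target_index, window):
    """Return the target utterance plus up to `window` preceding ones."""
    start = max(0, target_index - window)
    return utterances[start:target_index + 1]

# Toy dialogue with two alternating speakers.
dialogue = [f"Speaker_{i % 2}: utterance {i}" for i in range(20)]

short = build_context(dialogue, target_index=15, window=5)
long = build_context(dialogue, target_index=15, window=12)
print(len(short), len(long))
```

A larger window gives the model more dialogue history per example, at the cost of longer inputs.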

stddddd commented 1 month ago

The historical window was already set to 12 in the previous two reproductions.

stddddd commented 1 month ago

These are the hyper-parameter settings I used while reproducing the model. What should I modify to improve the performance?

hyper-parameter       IEMOCAP/MELD/EmoryNLP
GPU                   A100
Nvidia Driver         470
CUDA version          11.7
llm-model             llama-2-7b-hf
experiment setting    lora
historical window     12
accumulations         8
graphics card         4
speaker task          None
domain base           False
emotion prediction    False
data percent          1.0
LR                    2e-4
eval batch size       8
num train epochs      6
save steps            100000
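One value not listed above is the per-device training batch size. Together with the accumulation steps and the number of cards, it sets the effective batch size that the learning rate of 2e-4 applies to, so a mismatch there could be one source of the gap. A minimal sketch of that arithmetic, assuming "accumulations" means gradient accumulation steps and "graphics card" means the number of GPUs; the per-device values tried below are hypothetical, since the thread does not state the real one.

```python
def effective_batch_size(per_device_batch_size, accumulation_steps, num_gpus):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch_size * accumulation_steps * num_gpus

# Values from the table above.
ACCUMULATIONS = 8
NUM_GPUS = 4

# Hypothetical per-device batch sizes, for illustration only.
for per_device in (1, 2, 8):
    total = effective_batch_size(per_device, ACCUMULATIONS, NUM_GPUS)
    print(f"per-device batch {per_device} -> effective batch {total}")
```

With 8 accumulation steps on 4 cards, each unit of per-device batch size adds 32 examples per optimizer step, so running on a different number of GPUs than the authors did would change the effective batch size unless the accumulation steps are adjusted to compensate.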