YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Modifications to the llama model #36

Open peggyxpxu opened 6 months ago

peggyxpxu commented 6 months ago

Hi, sir: You wrote "only output text related states" on line 734 of modeling_llama.py, and only the text states are used in the subsequent processing. On line 733 you did the same thing: only the text states are kept before the results are returned. What is the reason for doing this?

YuanGongND commented 4 months ago

Hi there,

I think the reason is that we do not need to apply a loss to the model outputs at the positions of the input audio tokens. We only care about the text tokens, and want to apply the cross-entropy loss to those.

-Yuan
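
To illustrate the reply above, here is a minimal sketch of computing the loss on text positions only. It is not the code from modeling_llama.py, which is not shown in this thread. It assumes the audio tokens come first in the sequence as a prefix, and it uses plain Python rather than PyTorch so that it runs without dependencies. The names `text_only_loss` and `num_audio_tokens` are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def text_only_loss(logits_per_position, labels, num_audio_tokens):
    """Average next-token cross-entropy over the text positions only.

    logits_per_position: one list of vocabulary logits per input position,
        audio positions first, then text positions (an assumption).
    labels: token ids for the text positions only.
    num_audio_tokens: length of the audio prefix (hypothetical name).
    """
    # Drop the outputs at the audio positions; no loss is applied there.
    text_logits = logits_per_position[num_audio_tokens:]
    assert len(text_logits) == len(labels)

    # Usual causal shift: the output at text position t predicts token t + 1.
    pairs = list(zip(text_logits[:-1], labels[1:]))
    return sum(cross_entropy(l, y) for l, y in pairs) / len(pairs)


# Toy input: 2 audio positions followed by 3 text positions, vocabulary of 4.
toy_logits = [
    [9.0, 0.0, 0.0, 0.0],  # audio position, ignored by the loss
    [0.0, 9.0, 0.0, 0.0],  # audio position, ignored by the loss
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_labels = [1, 2, 3]
toy_loss = text_only_loss(toy_logits, toy_labels, num_audio_tokens=2)
```

Because the audio positions are sliced off before the loss is computed, changing the logits at those positions leaves `toy_loss` unchanged. In a PyTorch training loop, a common alternative with a similar effect is to keep all positions and set the labels at the audio positions to the `ignore_index` of the cross-entropy loss.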