YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Modifications to the llama model #36

Open peggyxpxu opened 1 month ago

peggyxpxu commented 1 month ago

Hi, sir: You mentioned "only output text related states" on line 734 of modeling_llama.py, and only the text states are used in the subsequent processing. On line 733 you did the same thing: only the text states are used before returning the results. What is the reason for doing this?

YuanGongND commented 2 weeks ago

Hi there,

I think the reason is that we do not need to apply a loss to the model's outputs at the input audio token positions. We only care about the text tokens, and want to apply the cross-entropy loss to those.

-Yuan
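
For readers following along, here is a rough sketch of the idea in the reply above, not the repository's actual code. It assumes the audio tokens are placed before the text tokens in the input sequence, uses plain Python lists in place of PyTorch tensors, and leaves out the usual one-position shift for next-token prediction. The names `text_only_loss` and `num_audio_tokens` are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def text_only_loss(logits_per_position, labels, num_audio_tokens):
    """Mean cross-entropy computed over text positions only.

    logits_per_position: one logit vector per input position, with the
        audio positions first and the text positions after them (assumed layout).
    labels: one target token id per text position.
    num_audio_tokens: hypothetical count of leading audio positions to drop.
    """
    # Keep only the text-related outputs; audio positions never reach the loss.
    text_logits = logits_per_position[num_audio_tokens:]
    if len(text_logits) != len(labels):
        raise ValueError("expected one label per text position")
    losses = [-log_softmax(row)[y] for row, y in zip(text_logits, labels)]
    return sum(losses) / len(losses)


# Two audio positions followed by two text positions, vocabulary of size 3.
logits = [
    [9.0, -9.0, 0.0],  # audio position (dropped)
    [0.0, 5.0, 0.0],   # audio position (dropped)
    [0.0, 0.0, 0.0],   # text position
    [0.0, 0.0, 0.0],   # text position
]
loss = text_only_loss(logits, labels=[1, 2], num_audio_tokens=2)
```

Because the audio positions are sliced off before the loss is computed, changing the logits at those positions leaves `loss` unchanged, which matches the point that only the text tokens are supervised.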