LeMei / UniMSE


Does the final input to the T5 model use only two modalities, audio and video? Looking at your dataloader, the text item passed to the model is an empty value #30

Closed KindredSpirithub closed 1 year ago

KindredSpirithub commented 1 year ago