clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License
1.03k stars, 273 forks

Inquiries on multi-modal data loader #151

Closed (ClaudiaShu closed this issue 2 years ago)

ClaudiaShu commented 2 years ago

Hi, thank you for your amazing work. I'm wondering whether there are instructions for loading both the face image frames and the speech segments.

Jungjee commented 2 years ago

Hi, currently this repo only covers the audio part.