facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Indicate the SONAR device in the mutox example and explain the dataset columns #412

Closed: avidale closed this 6 months ago

avidale commented 7 months ago

Why

  1. External users have complained that MuTox is slow. The only cause was that they didn't know the SONAR encoder loads onto the CPU by default.
  2. There are questions (e.g. https://github.com/facebookresearch/seamless_communication/issues/428) about how to interpret the predicted MuTox scores and how to load the audio segments in the MuTox dataset.

How

  1. I pass device to the SONAR encoder constructor in our example notebook, so that a GPU is used whenever one is available.
  2. I add more explanation of the predicted scores and of the dataset columns to the MuTox readme.
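
For readers who want the gist of the device change without opening the notebook, here is a minimal sketch of the selection logic. The helper names pick_device and cuda_is_available are hypothetical and exist only for this illustration. The commented-out constructor call at the end assumes the SONAR text pipeline accepts a device argument, which is the behaviour this PR relies on, so check it against the notebook rather than treating it as authoritative.

```python
import importlib.util


def pick_device(cuda_available: bool) -> str:
    """Hypothetical helper: choose the GPU when one is present, else the CPU."""
    return "cuda:0" if cuda_available else "cpu"


def cuda_is_available() -> bool:
    """Hypothetical helper: True only if torch is installed and sees a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


device_name = pick_device(cuda_is_available())
print(f"SONAR encoder device: {device_name}")

# With torch and sonar installed, the encoder could then be built roughly like
# this (assumed usage, not verified here):
#
#   import torch
#   from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
#
#   t2vec_model = TextToEmbeddingModelPipeline(
#       encoder="text_sonar_basic_encoder",
#       tokenizer="text_sonar_basic_encoder",
#       device=torch.device(device_name),
#   )
```

If the device argument is left out, the encoder stays on the CPU, which is the slow path described under Why.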