felixbur / nkululeko

Machine learning speaker characteristics
MIT License

Hubert embeddings #56

Closed felixbur closed 10 months ago

felixbur commented 10 months ago

It would be nice to add HuBERT embeddings for comparison with wav2vec 2.0.

bagustris commented 10 months ago

Submitted a PR for this issue: #57.

We need to think about how to handle features from Transformers models. There are many of them, most of them recent, so it would be better to design a general way to take acoustic features from HF (Hugging Face Transformers). Once we have a good design, it will be easy to take a model from HF and plug it into Nkululeko. This would make Nkululeko the most convenient toolkit for testing the newest speech/audio foundation models on the newest speech datasets (the first goal of Nkululeko).

Currently, I just follow the existing wav2vec2 implementation and modify it to accept different model variants. My idea is to take the name of the speech model from HF and put it directly in the INI file.

This approach is similar to what s3prl does. An example of their HF HuBERT implementation is here.
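
To make the INI idea concrete, here is a minimal sketch, not the code from the PR. The `[FEATS]` section, the `model` key, the fallback model, and the helper names `hf_model_name` and `extract_embedding` are all assumptions made for illustration. Mean pooling over the last hidden layer is just one possible way to get a single vector per utterance. The `transformers` calls used are `AutoFeatureExtractor.from_pretrained` and `AutoModel.from_pretrained`.

```python
import configparser

# Assumed fallback, used when the INI file names no model.
DEFAULT_MODEL = "facebook/wav2vec2-base-960h"


def hf_model_name(ini_text: str) -> str:
    """Return the HF model name from a (hypothetical) [FEATS] section."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    # fallback also covers a missing [FEATS] section
    return config.get("FEATS", "model", fallback=DEFAULT_MODEL)


def extract_embedding(model_name, signal, sampling_rate=16000):
    """Mean-pool the last hidden layer into one vector per utterance.

    `signal` is expected to be a 1-D float array of audio samples.
    """
    # Imported lazily so that reading the config does not require torch.
    import torch
    from transformers import AutoFeatureExtractor, AutoModel

    extractor = AutoFeatureExtractor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    inputs = extractor(signal, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()


ini = """
[FEATS]
model = facebook/hubert-base-ls960
"""
print(hf_model_name(ini))
```

With this layout, switching from wav2vec 2.0 to HuBERT (or any other HF speech model that ships a feature extractor config) would only mean changing one line in the INI file.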