Closed felixbur closed 10 months ago
Submitted a PR for this issue (#57).
We need to think about how we handle features from transformers. Since there are many such models, with new ones appearing all the time, it is better to first design how we take acoustic features from HF (Hugging Face / transformers) as input. Once we have a good design, it will be easy to take any model from HF and plug it into Nkululeko. This would make Nkululeko the most comfortable toolkit for testing the newest speech/audio foundation models on the newest speech datasets (the first goal of Nkululeko).
Currently, I just follow the wav2vec2 models and modify them to accept different variants. My idea is simply to grab the name of the speech model from HF and put it directly in the INI file.
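For illustration, the INI entry could look something like the following (the section and key names are a sketch of the idea, not necessarily Nkululeko's actual config schema):

```ini
[FEATS]
; hypothetical keys: the value would be passed straight through
; to transformers, so any HF hub model name works unchanged
type = ['hf']
hf.model = facebook/wav2vec2-large-robust
```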
This approach is similar to what s3prl does; an example of their HF Hubert implementation is here.
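As a rough sketch of that design, the extractor could be resolved generically from the hub name alone via transformers' `Auto*` classes, so no per-model code is needed. The helper names below are illustrative, not Nkululeko's (or s3prl's) actual API, and the transformers import is kept lazy so the pooling helper runs on its own:

```python
import numpy as np

def load_hf_model(model_name: str):
    """Resolve any HF speech model and its feature extractor from the hub name.

    `model_name` is whatever string the INI file supplies,
    e.g. "facebook/hubert-base-ls960" (example, not a fixed default).
    """
    # lazy import: only needed when a model is actually loaded
    from transformers import AutoFeatureExtractor, AutoModel
    extractor = AutoFeatureExtractor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    return extractor, model

def pool_frames(hidden: np.ndarray) -> np.ndarray:
    """Average-pool frame-level features (batch, frames, dim) -> (batch, dim).

    The model's last_hidden_state would be converted to a numpy array
    (e.g. via .numpy()) before pooling into one vector per utterance.
    """
    return hidden.mean(axis=1)
```

Pooling over time is just one choice; s3prl-style weighted sums over layers would be a natural extension once the generic loading is in place.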
It would be nice to add them for comparison to wav2vec 2.