felixbur / nkululeko

Machine learning speaker characteristics
MIT License

Support finetuning (with Transformers) #117

Closed bagustris closed 4 months ago

bagustris commented 5 months ago

Nowadays, fine-tuning, which takes advantage of already trained models, is prominent in speech processing. Currently, Nkululeko only uses pre-trained models as feature extractors. It would be very useful if Nkululeko could do fine-tuning in a single command, given an INI config file (ref [1]).

Possible Solution
Either add fine-tuning as a new key in the [model] section (inside the INI file) or create a new module, e.g., nkululeko.finetuning.
Required arguments: base_model (or from_model), push_to_hub (later?). Optional arguments: learning_rate, epochs, batch_size, etc. (maybe use defaults first).
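A hypothetical INI fragment for the first option could look like the sketch below. The section layout follows Nkululeko's existing INI style, but the key names (finetune, base_model, push_to_hub) are illustrative assumptions, not implemented syntax:

```ini
[MODEL]
; proposed: switch the model step from feature-based training to fine-tuning
finetune = True
; required: which pre-trained checkpoint to start from (name is an assumption)
base_model = facebook/wav2vec2-base
; optional, with sensible defaults
learning_rate = 1e-4
epochs = 5
batch_size = 8
; later: push the fine-tuned model to the Hub
push_to_hub = False
```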

The biggest challenge may be connecting Nkululeko's own (CSV) dataset format with Transformers [2], since Transformers fine-tuning only accepts audio in the HF Dataset format (just the format; there is no need to upload the dataset to the Hub).
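The CSV-to-HF-Dataset bridge could be sketched as follows. This is a minimal illustration assuming a Nkululeko-style CSV with `file`, `speaker`, and `emotion` columns (the column names and layout are assumptions); the hand-off to the third-party `datasets` library is shown in comments, since only the column-oriented dict is built here with the standard library:

```python
import csv
import io

# A Nkululeko-style CSV: one row per utterance, with a path to the
# audio file and a label column (column names are assumptions).
CSV_TEXT = """\
file,speaker,emotion
wav/s01_happy_01.wav,s01,happy
wav/s02_sad_01.wav,s02,sad
"""

def csv_to_hf_columns(csv_file):
    """Turn a row-oriented CSV into the column-oriented dict that
    datasets.Dataset.from_dict() expects: {"audio": [...], "label": [...]}."""
    audio, labels = [], []
    for row in csv.DictReader(csv_file):
        audio.append(row["file"])      # file path; the Audio feature decodes it lazily
        labels.append(row["emotion"])
    return {"audio": audio, "label": labels}

columns = csv_to_hf_columns(io.StringIO(CSV_TEXT))
print(columns["label"])  # → ['happy', 'sad']

# With the `datasets` library, the hand-off would then be roughly:
#   from datasets import Dataset, Audio
#   ds = Dataset.from_dict(columns).cast_column("audio", Audio(sampling_rate=16000))
# cast_column with the Audio feature lets the Trainer see decoded waveforms
# while the files stay on local disk -- no Hub upload needed.
```

Keeping only paths in the dict and letting the Audio feature decode on access avoids loading the whole corpus into memory.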

[1] https://huggingface.co/learn/audio-course/en/chapter4/fine-tuning
[2] https://discuss.huggingface.co/t/loading-custom-audio-dataset-and-fine-tuning-model/8836/5

felixbur commented 5 months ago

agreed

felixbur commented 4 months ago

First version implemented in 0.85.0; only classification so far.