felixbur / nkululeko

Machine learning speaker characteristics
MIT License

add linguistics #94

Open felixbur opened 10 months ago

felixbur commented 10 months ago

Nkululeko could become multimodal if a transcript field were added to the audio files; linguistic feature extractors could then be added to the feature_sets.
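To make the idea concrete, here is a minimal sketch of what a text-based feature extractor could look like: it reads the transcript of each audio file and returns one numeric vector per sample, just as an acoustic feature set would. The class and method names (`LexicalFeatureExtractor`, `extract`, `extract_all`) are hypothetical and are not part of nkululeko's API. As a simple stand-in for learned linguistic features, it computes hand-crafted lexical statistics (word count, type-token ratio, mean word length).

```python
"""Hypothetical sketch of a linguistic feature extractor.

Names are illustrative only; they are NOT part of nkululeko's API.
Hand-crafted lexical statistics stand in for learned embeddings."""
import re
from typing import Dict, List

# Lowercase word tokens; an apostrophe inside a word (e.g. "don't") is kept
_TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> List[str]:
    return _TOKEN_RE.findall(text.lower())


class LexicalFeatureExtractor:
    """Turns one transcript into a fixed-length numeric feature vector."""

    feature_names = ["n_words", "type_token_ratio", "mean_word_len"]

    def extract(self, transcript: str) -> List[float]:
        words = tokenize(transcript or "")
        if not words:
            # Empty or missing transcript: return a zero vector
            return [0.0] * len(self.feature_names)
        n = len(words)
        ttr = len(set(words)) / n  # lexical diversity
        mean_len = sum(len(w) for w in words) / n
        return [float(n), ttr, mean_len]

    def extract_all(self, samples: Dict[str, str]) -> Dict[str, List[float]]:
        """Map each audio file path to the features of its transcript."""
        return {path: self.extract(text) for path, text in samples.items()}


if __name__ == "__main__":
    samples = {
        "wav/s01.wav": "the cat sat on the mat",
        "wav/s02.wav": "",
    }
    for path, vec in LexicalFeatureExtractor().extract_all(samples).items():
        print(path, vec)
```

Because the output is keyed by audio file path, such vectors could in principle be concatenated with acoustic features for the same files; swapping in a word-embedding model would only change `extract`.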

bagustris commented 10 months ago

Some datasets already include transcriptions (I have skipped them so far because I didn't think they would be needed). The transcription can be added as an additional column in the CSV or audformat database. If no transcription exists, we can use a Hugging Face model (such as Whisper) to generate transcripts during the pre-processing of each dataset. The "linguistic feature extractor" then processes the text in the transcript column (I propose `transcript` as the header name) to generate word embeddings as linguistic features.
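The pre-processing step described above could be sketched as follows: read a dataset CSV, keep any transcripts the corpus already provides, and fill empty ones with an ASR model. This assumes the CSV has a `file` column holding audio paths; `transcript` is the column name proposed in this thread, and the helper names (`whisper_transcriber`, `add_transcripts`, `process_csv`) are hypothetical. The Whisper route uses the Hugging Face `transformers` `pipeline("automatic-speech-recognition", ...)` API; the checkpoint `openai/whisper-small` is just one possible choice, and decoding audio files this way requires ffmpeg.

```python
"""Hypothetical sketch: fill a `transcript` column in a dataset CSV.

Assumes a `file` column with audio paths. Helper names are illustrative."""
import csv
from typing import Callable, Dict, Iterable, List


def whisper_transcriber(model_name: str = "openai/whisper-small") -> Callable[[str], str]:
    """Build an ASR callable from a Hugging Face Whisper checkpoint.

    transformers is imported lazily so the rest works without it.
    Files longer than 30 s may need the pipeline's `chunk_length_s`."""
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    return lambda path: asr(path)["text"].strip()


def add_transcripts(rows: Iterable[Dict[str, str]],
                    transcribe: Callable[[str], str],
                    column: str = "transcript") -> List[Dict[str, str]]:
    """Return copies of the rows with empty/missing transcripts filled in.

    Transcripts shipped with the corpus are kept unchanged."""
    out = []
    for row in rows:
        row = dict(row)
        if not (row.get(column) or "").strip():
            row[column] = transcribe(row["file"])
        out.append(row)
    return out


def process_csv(in_path: str, out_path: str,
                transcribe: Callable[[str], str],
                column: str = "transcript") -> None:
    """Read in_path, add transcripts, write the result to out_path."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = add_transcripts(reader, transcribe, column)
    if column not in fieldnames:
        fieldnames.append(column)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

In a real run one would call something like `process_csv("train.csv", "train_tr.csv", whisper_transcriber())`; passing the transcriber as a plain callable keeps the CSV handling independent of any particular ASR model.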

Combining speech with transcriptions is useful for detecting cognitive decline, such as in Alzheimer's disease.