add linguistics

Some datasets already come with transcriptions (I skip those cases here, since I don't think any extra work is needed for them). A transcription can be added as an additional column in the CSV or audformat. If there is no transcription, we can use a Hugging Face model (such as Whisper) to generate transcripts during the pre-processing of each dataset. The "linguistic feature extractor" would then process the text in the transcript column (I propose "transcript" as the header name) to generate word embeddings (the linguistic features).
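To make the proposal concrete, here is a minimal sketch of the two steps in Python. It is not existing nkululeko code: the function names (`whisper_transcriber`, `add_transcripts`, `extract_linguistic_features`), the `file` column name, and the choice of the `openai/whisper-small` checkpoint are all assumptions for illustration. Only the `transcript` header comes from the proposal above. The embedding model is left as a pluggable callable, since the issue does not say which one to use.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


def whisper_transcriber(model_name: str = "openai/whisper-small") -> Callable[[str], str]:
    """Build a transcriber backed by a Hugging Face ASR pipeline.

    Assumption: the `transformers` package is installed. It is imported
    lazily so the rest of this sketch runs without it.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    return lambda path: asr(path)["text"].strip()


def add_transcripts(
    rows: List[Row],
    transcribe: Callable[[str], str],
    file_col: str = "file",          # hypothetical name of the audio path column
    text_col: str = "transcript",    # header proposed in this issue
) -> List[Row]:
    """Return copies of `rows` with `text_col` filled in.

    Rows that already carry a non-empty transcript are left unchanged,
    so datasets that ship transcriptions are not re-transcribed.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        if not (new_row.get(text_col) or "").strip():
            new_row[text_col] = transcribe(new_row[file_col])
        out.append(new_row)
    return out


def extract_linguistic_features(
    rows: List[Row],
    embed: Callable[[str], Sequence[float]],
    text_col: str = "transcript",
) -> List[Sequence[float]]:
    """Apply an embedding function (hypothetical, user-supplied) to each transcript."""
    return [embed(row[text_col]) for row in rows]


if __name__ == "__main__":
    # Stub transcriber and embedder so the demo runs without any model download.
    rows = [
        {"file": "a.wav", "transcript": "hello there"},
        {"file": "b.wav", "transcript": ""},
    ]
    filled = add_transcripts(rows, transcribe=lambda path: f"text for {path}")
    feats = extract_linguistic_features(filled, embed=lambda t: [float(len(t.split()))])
    print(filled, feats)
```

In a real run, `transcribe=whisper_transcriber()` would replace the stub, and the rows could be read from and written back to the dataset CSV with `csv.DictReader` and `csv.DictWriter`.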

Using speech together with its transcription is useful for detecting degradation such as that caused by Alzheimer's.

felixbur / nkululeko

add linguistics #94