Srijith-rkr / Whispering-LLaMA

EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
MIT License

weights for Adapter #11

Closed: rrscholarship closed this issue 5 months ago

rrscholarship commented 5 months ago

Do you have pretrained adapter weights? For some reason I am not able to train the adapter in resource-constrained scenarios. Training the adapter needs a large amount of resources; why is that?

Another question: if I would like to use a different dataset, how do I get the array of that audio, e.g. array([0.0005188 , 0.00085449, 0.00012207, ..., 0.00125122, 0.00076294, 0.00036621])? How is it calculated?

```python
'audio':
{
    # in streaming mode 'path' will be 'xs_chunks_0000/YOU0000000315_S0000660.wav'
    'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/9d48cf31/xs_chunks_0000/YOU0000000315_S0000660.wav',
    'array': array([0.0005188 , 0.00085449, 0.00012207, ..., 0.00125122, 0.00076294, 0.00036621], dtype=float32),
    'sampling_rate': 16000
},
```

Thanks in advance

Srijith-rkr commented 5 months ago

Hi,

The audio array is the decoded audio input; it is not calculated, it is just the raw waveform samples obtained by decoding the file. It can be passed to the Whisper model to generate the n-best hypotheses and the audio encoding (the output of the Whisper encoder) that this setup uses. You can go through the code in the data_preparation directory to use other datasets with this setup. Also refer to this repo (it also contains code for generating n-best hypotheses).
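
For reference, here is a minimal sketch (not the repo's exact pipeline) of decoding a wav file into such an array and getting the encoder output plus a transcript. It assumes the openai-whisper package; the file name is taken from your example and is hypothetical:

```python
# Minimal sketch, assuming the openai-whisper package is installed.
import torch
import whisper

model = whisper.load_model("base")

# load_audio decodes the file with ffmpeg and resamples it to 16 kHz, returning
# a float32 numpy array: this is exactly the 'array' field above. Nothing is
# "calculated" beyond decoding the file into raw samples.
audio = whisper.load_audio("YOU0000000315_S0000660.wav")
audio = whisper.pad_or_trim(audio)          # pad/trim to Whisper's 30 s window
mel = whisper.log_mel_spectrogram(audio).to(model.device)

with torch.no_grad():
    # Audio encoding: the output of the Whisper encoder, which this setup
    # feeds to the LLaMA decoder through the adapters.
    encoding = model.encoder(mel.unsqueeze(0))

    # One beam-search transcript; the repo's data_preparation code generates
    # the full n-best list (e.g., by sampling multiple hypotheses).
    options = whisper.DecodingOptions(beam_size=5, without_timestamps=True)
    result = whisper.decode(model, mel, options)

print(encoding.shape, result.text)
```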

And one GPU with 24 GB might not be enough to train LLaMA (7B) with adapters. I used one A100 (80 GB VRAM) to train; you can also use multiple smaller GPUs with FSDP.
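
As a rough illustration of the multi-GPU option, here is a minimal FSDP sketch in PyTorch (not the repo's training script); the linear layer is a placeholder for LLaMA plus adapters, and the script name in the launch command is hypothetical:

```python
# Launch with e.g.:
#   torchrun --nproc_per_node=4 train_fsdp.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder module standing in for LLaMA-7B with adapters; FSDP shards
    # its parameters, gradients, and optimizer state across all ranks, so each
    # GPU only holds a fraction of the full model state.
    model = torch.nn.Linear(4096, 4096).cuda()
    model = FSDP(model)

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).pow(2).mean()   # placeholder loss
    loss.backward()
    optim.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```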

I did not attach the pretrained adapter weights, and I no longer have access to KAUST compute. I will check with my friends and try to add them.