YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

extract_whisper_feature.py #15

Open dingdongwang opened 5 months ago

dingdongwang commented 5 months ago

Hello, I would like to ask the following 2 questions:

  1. Is there any shell script to run extract_whisper_feature.py? I don't know what the parameter in the following code from extract_whisper_feature.py is:

    import sys
    argument = sys.argv[1]
  2. Is there an example JSON dataset that can be used to run extract_whisper_feature.py? I couldn't find the following all_mix_disjoint_audio_list.json in the repo:

    esc_train1 = '/data/sls/scratch/yuangong/audiollm/src/data/prep_data_ltue/speech_qa/all_mix_disjoint_audio_list.json'

Thank you!

YuanGongND commented 5 months ago

Is there any shell script to run extract_whisper_feature.py? I don't know what the parameter in the following code from extract_whisper_feature.py is: import sys argument = sys.argv[1]

This is just the CUDA device index, i.e., if you have 4 GPUs, you can create 4 processes with args 0, 1, 2, 3. If you have only one GPU, just set it to 0. We do not provide a shell wrapper for this.
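For reference, a minimal launcher could look like the following sketch (the one-process-per-GPU scheme follows the reply above; the loop bounds and working directory are assumptions):

```shell
#!/bin/sh
# Start one extraction process per GPU; the single positional
# argument is the CUDA device index read via sys.argv[1].
for gpu in 0 1 2 3; do
    python extract_whisper_feature.py "$gpu" &
done
wait  # block until all four background processes finish
```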

Is there an example JSON dataset that can be used to run extract_whisper_feature.py? I couldn't find the following all_mix_disjoint_audio_list.json in the repo: esc_train1 = '/data/sls/scratch/yuangong/audiollm/src/data/prep_data_ltue/speech_qa/all_mix_disjoint_audio_list.json'

This is just a list of wav file paths. See this for an example: all_mix_disjoint_audio_list.json
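As a concrete sketch, assuming the file is a flat JSON array of path strings (the wav paths below are made up), such a list can be generated like this:

```python
import json

# Hypothetical wav paths; the real file lists the audio used for feature extraction.
wav_paths = [
    "/data/audio/sample_0001.wav",
    "/data/audio/sample_0002.wav",
]

# Write the list as a JSON array of strings.
with open("all_mix_disjoint_audio_list.json", "w") as f:
    json.dump(wav_paths, f, indent=1)
```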

YuanGongND commented 5 months ago

This script is easy to read. I think the best way to understand it is just to read it.

dingdongwang commented 5 months ago

Got it! Thank you!