FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model
https://funaudiollm.github.io/

Fine-tuning per the documentation fails: styles = torch.LongTensor([[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]).to(speech.device) IndexError: index 3 is out of bounds for dimension 1 with size 1 #158

Open eatoncys opened 3 weeks ago

eatoncys commented 3 weeks ago


🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run cmd 'bash finetune.sh'
  2. See error

Traceback (most recent call last):

[2024-11-01 20:33:48,933][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 5, after: 5
[2024-11-01 20:33:48,963][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 5, after: 5
[2024-11-01 20:33:48,989][root][ERROR] - ERROR: data is empty!
[2024-11-01 20:33:51,222][root][ERROR] - ERROR: data is empty!
Error executing job with overrides: ['++model=/mnt/home/sensevoice/SenseVoiceSmall', '++trust_remote_code=true', '++train_data_set_list=/mnt/home/sensevoice/train_data/datasets/asr_dataset.jsonl', '++valid_data_set_list=/mnt/home/sensevoice/train_data/datasets/asr_val.jsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=10', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=1', '++train_conf.max_epoch=50', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=2000', '++train_conf.keep_nbest_models=20', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/mnt/home/sensevoice/SenseVoice-main/deepspeed_conf/ds_stage1.json', '++optim_conf.lr=0.0002', '++output_dir=./outputs']
Traceback (most recent call last):
  File "/mnt/home/sensevoice/FunASR-main/funasr/bin/train_ds.py", line 225, in main_hydra()
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in lambda: hydra.run(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/internal/hydra.py", line 132, in run
    = ret.return_value
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/mnt/home/sensevoice/FunASR-main/funasr/bin/train_ds.py", line 56, in main_hydra
    main(kwargs)
  File "/mnt/home/sensevoice/FunASR-main/funasr/bin/train_ds.py", line 173, in main
    trainer.train_epoch(
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/funasr/train_utils/trainer_ds.py", line 603, in train_epoch
    self.forward_step(model, batch, loss_dict=loss_dict)
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/funasr/train_utils/trainer_ds.py", line 670, in forward_step
    retval = model(batch)
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, *kwargs)
  File "/usr/local/miniconda3/envs/svnew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(args, kwargs)
  File "/mnt/home/sensevoice/SenseVoice-main/./model.py", line 680, in forward
    encoder_out, encoder_out_lens = self.encode(speech, speech_lengths, text)
  File "/mnt/home/sensevoice/SenseVoice-main/./model.py", line 733, in encode
    styles = torch.LongTensor([[self.textnorm_int_dict[int(style)]] for style in text[:, 3]]).to(speech.device)
IndexError: index 3 is out of bounds for dimension 1 with size 1
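
The failing expression `text[:, 3]` reads the fourth column of `text`, so the error indicates that each row of `text` in the batch has only one column. Below is a minimal sketch of a pre-flight check for that condition. It uses plain nested lists in place of the batch tensor, and `check_prefix_columns` and `REQUIRED_COLUMNS` are hypothetical names, not part of FunASR or SenseVoice.

```python
# Assumption: the failing line reads text[:, 3], so each row needs 4 columns.
REQUIRED_COLUMNS = 4


def check_prefix_columns(text_rows, required=REQUIRED_COLUMNS):
    """Return the indices of rows that are too short to index at required - 1."""
    return [i for i, row in enumerate(text_rows) if len(row) < required]


# A batch shaped like the one in the traceback: one column per row.
bad_batch = [[0], [0]]
# A batch whose rows carry four leading ids before the transcript ids.
good_batch = [[11, 12, 13, 14, 101, 102], [11, 12, 13, 14, 103, 0]]

bad_rows = check_prefix_columns(bad_batch)
good_rows = check_prefix_columns(good_batch)
print("rows too short:", bad_rows, good_rows)
```

With real tensors, the equivalent check would be on `text.shape[1]` before the model is called.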

The dataset is the sample data, after conversion with sensevoice2jsonl:

{"key": "BAC009S0764W0121", "source": "/mnt/home/sensevoice/data_example/voice/BAC009S0764W0121.wav", "source_len": 420, "target": "甚至出现交易几乎停滞的情况", "target_len": 13, "with_or_wo_itn": "<|woitn|>", "text_language": "<|zh|>", "emo_target": "<|NEUTRAL|>", "event_target": "<|Speech|>"}
{"key": "BAC009S0916W0489", "source": "/mnt/home/sensevoice/data_example/voice/BAC009S0916W0489.wav", "source_len": 573, "target": "湖北一公司以员工名义贷款数十员工负债千万", "target_len": 20, "with_or_wo_itn": "<|woitn|>", "text_language": "<|zh|>", "emo_target": "<|NEUTRAL|>", "event_target": "<|Speech|>"}
{"key": "asr_example_cn_en", "source": "/mnt/home/sensevoice/data_example/voice/asr_example_cn_en.wav", "source_len": 1474, "target": "所有只要处理 data 不管你是做 machine learning 做 deep learning 做 data analytics 做 data science 也好 scientist 也好通通都要都做的基本功啊那 again 先先对有一些也许对", "target_len": 19, "with_or_wo_itn": "<|woitn|>", "text_language": "<|zh|>", "emo_target": "<|NEUTRAL|>", "event_target": "<|Speech|>"}
{"key": "ID0012W0014", "source": "/mnt/home/sensevoice/data_example/voice/asr_example_en.wav", "source_len": 222, "target": "he tried to think how it could be", "target_len": 8, "with_or_wo_itn": "<|woitn|>", "text_language": "<|en|>", "emo_target": "<|EMO_UNKNOWN|>", "event_target": "<|Speech|>"}
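
Since the log also reports "data is empty!", it may be worth ruling out malformed records before looking at the model. The sketch below checks that each JSONL record carries the keys seen in the sample records. `EXPECTED_KEYS` is copied from those records and `find_incomplete_records` is a hypothetical helper, not a FunASR function.

```python
import json

# Keys present in every sample record of the converted dataset.
EXPECTED_KEYS = {
    "key", "source", "source_len", "target", "target_len",
    "with_or_wo_itn", "text_language", "emo_target", "event_target",
}


def find_incomplete_records(lines):
    """Return (line_number, missing_keys) for each record lacking a key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        missing = sorted(EXPECTED_KEYS - record.keys())
        if missing:
            problems.append((number, missing))
    return problems


sample = [
    '{"key": "a", "source": "a.wav", "source_len": 222, "target": "he tried", '
    '"target_len": 2, "with_or_wo_itn": "<|woitn|>", "text_language": "<|en|>", '
    '"emo_target": "<|EMO_UNKNOWN|>", "event_target": "<|Speech|>"}',
    '{"key": "b", "source": "b.wav", "source_len": 100, "target": "hi", "target_len": 1}',
]
problems = find_incomplete_records(sample)
print(problems)
```

To check a file, pass an open file object, for example `find_incomplete_records(open(path, encoding="utf-8"))`. The records posted in this issue contain all nine keys, so a clean result here would point away from the data and toward how the dataset class builds `text`.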


JonneryR commented 4 days ago

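I ran into this problem too. The code appears to expect `text` to be padded, but I haven't yet found where that padding is done. It looks like you have to add it yourself.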

qiuqiu-879 commented 4 days ago

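In my case, turning off the batch sampler in the script solved it. I debugged with a single sample and found that the batch was not sampling the same record consistently. After commenting out the batchsampler line, it ran.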


JonneryR commented 4 days ago

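Solved. It was a matter of which dataset class is selected: you need to choose SenseVoiceCTCDataset, because that is the only one that contains the code that pads the front of `text`.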
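
To illustrate why this fixes the error, here is a rough sketch of what padding the front of `text` amounts to: four leading ids are placed before the transcript ids, so that column 3 exists when the model reads `text[:, 3]`. The token ids, the order of the first three slots, and `build_text_ids` are illustrative assumptions. Only the use of column 3 for the text-normalisation flag is visible in the traceback, so consult SenseVoiceCTCDataset for the actual logic.

```python
# Hypothetical ids for the special tokens seen in the JSONL records.
SPECIAL_IDS = {"<|zh|>": 3, "<|NEUTRAL|>": 7, "<|Speech|>": 9, "<|woitn|>": 15}


def build_text_ids(record, target_ids):
    """Prepend four special-token ids to the transcript ids.

    Assumed slot order: language, emotion, event, text-normalisation flag.
    Only the last slot (column 3) is confirmed by the traceback.
    """
    prefix = [
        SPECIAL_IDS[record["text_language"]],
        SPECIAL_IDS[record["emo_target"]],
        SPECIAL_IDS[record["event_target"]],
        SPECIAL_IDS[record["with_or_wo_itn"]],
    ]
    return prefix + list(target_ids)


record = {
    "text_language": "<|zh|>",
    "emo_target": "<|NEUTRAL|>",
    "event_target": "<|Speech|>",
    "with_or_wo_itn": "<|woitn|>",
}
text_ids = build_text_ids(record, target_ids=[101, 102, 103])
print(text_ids)     # prefix followed by the transcript ids
print(text_ids[3])  # the slot that text[:, 3] would read
```

A dataset class that skips this step yields rows with transcript ids only, which is consistent with the "size 1" reported in the error.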