Can you show the config of your training?
I use hubert_base_librispeech.yaml for pretraining, changing only the ddp_backend and max_sample_size:
common:
  fp16: true
  log_format: json
  log_interval: 200
  seed: 1337
  tensorboard_logdir: tblog

checkpoint:
  save_interval_updates: 25000
  keep_interval_updates: 1
  no_epoch_checkpoints: true

distributed_training:
  ddp_backend: c10d
  distributed_backend: 'nccl'
  distributed_world_size: 4
  distributed_port: 29671
  nprocs_per_node: 4
  find_unused_parameters: true

task:
  _name: hubert_pretraining
  data: ${task.data}
  label_dir: ${task.label_dir}
  labels: ${task.labels}
  label_rate: ${model.label_rate}
  sample_rate: 16000
  max_sample_size: 320000  # 250000
  min_sample_size: 32000
  pad_audio: false
  random_crop: true
  normalize: false  # must be consistent with extractor

dataset:
  num_workers: 6
  max_tokens: 1400000
  skip_invalid_size_inputs_valid_test: true
  validate_interval: 5
  validate_interval_updates: 10000

criterion:
  _name: hubert
  pred_masked_weight: 1.0
  pred_nomask_weight: 0.0
  loss_weights: [10,]

optimization:
  max_update: 400000
  lr: [0.00025]
  clip_norm: 10.0

optimizer:
  _name: adam
  adam_betas: (0.9,0.98)
  adam_eps: 1e-06
  weight_decay: 0.01

lr_scheduler:
  _name: polynomial_decay
  warmup_updates: 32000

model:
  _name: hubert
  label_rate: 100
  skip_masked: false
  skip_nomask: false
  mask_prob: 0.80
  extractor_mode: default
  conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2'
  final_dim: 256
  encoder_layerdrop: 0.05
  dropout_input: 0.1
  dropout_features: 0.1
  dropout: 0.1
  attention_dropout: 0.1
  feature_grad_mult: 0.1
  untie_final_proj: true
  activation_dropout: 0.0

hydra:
  job:
    config:
      override_dirname:
        kv_sep: '-'
        item_sep: '__'
        exclude_keys:
          - run
          - task.data
          - task.label_dir
  run:
    dir: ???
  sweep:
    dir: ???
    subdir: ${hydra.job.config_name}__${hydra.job.override_dirname}
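For what it's worth, a quick back-of-envelope from the values above (plain arithmetic on the config, not a fairseq API): max_tokens is counted in waveform samples, so this setup feeds roughly 87.5 s of audio per GPU and about 350 s per optimizer step across the 4 GPUs. If that is much smaller than the effective batch of the reference recipe, the learning rate or number of updates may need adjusting.

sample_rate = 16000        # task.sample_rate
max_tokens = 1_400_000     # dataset.max_tokens (waveform samples per GPU per batch)
world_size = 4             # distributed_training.distributed_world_size

seconds_per_gpu = max_tokens / sample_rate           # 87.5 s of audio per GPU
seconds_per_update = seconds_per_gpu * world_size    # 350 s per optimizer step
print(seconds_per_gpu, seconds_per_update)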
The HuBERT training loss on my side eventually converges to around 2.5. I used the WenetSpeech dataset for pretraining, which provides 10,000 hours of pure Chinese data.
We believe the key to training a HuBERT base model is the pre-trained model's performance on the main downstream tasks. You can fine-tune the pre-trained model from your recipe and then test its accuracy on your tasks.
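As a minimal sketch of that first step (the checkpoint path below is a placeholder for whatever your recipe produced), the pretrained model can be loaded through fairseq's checkpoint utilities before fine-tuning or probing it:

import torch
from fairseq import checkpoint_utils

# Placeholder path to the pretrained HuBERT checkpoint produced by the recipe above.
ckpt_path = "/path/to/checkpoint_best.pt"

# Returns the model ensemble, the saved training config, and the pretraining task.
models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
hubert = models[0].eval()

print(type(hubert).__name__)                          # HubertModel
print(sum(p.numel() for p in hubert.parameters()))    # sanity-check the size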
OK, thank you for your reply! We have tried training SpeechTokenizer with features from HuBERT, and the reconstructed speech is also good. We will try more experiments on downstream tasks.
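In case it helps others doing the same thing, here is a rough sketch of pulling intermediate-layer HuBERT features as a tokenizer/distillation target; output_layer=9 and the checkpoint path are illustrative choices, not anything prescribed by this repo. With normalize: false in the config above, the raw 16 kHz waveform can be fed directly:

import torch
from fairseq import checkpoint_utils

models, _, _ = checkpoint_utils.load_model_ensemble_and_task(["/path/to/checkpoint_best.pt"])
hubert = models[0].eval()

# Dummy batch: one 2-second utterance at 16 kHz; replace with real audio.
wav = torch.randn(1, 32000)

with torch.no_grad():
    # extract_features returns (features, padding_mask); output_layer selects
    # which transformer layer's representations to return (9 is illustrative).
    feats, _ = hubert.extract_features(source=wav, padding_mask=None,
                                       mask=False, output_layer=9)

print(feats.shape)  # (1, ~100 frames, 768); frames at 50 Hz for the base architecture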
Will you open source the checkpoint? I think it will be very helpful to the community.
I'm doing similar work now. Could you send me your configuration for reference?
Will you open source the checkpoint? I think it will be very helpful to the community.
Yes. We are writing the paper, and we will open-source our work after that.
Hi~ I'm trying to pretrain HuBERT from scratch on an English and Chinese speech dataset. During pretraining, the first-iteration loss dropped from 6.7 to 3.3, and the second-iteration loss dropped from 11.2 to 4.0. The losses in both iterations look too large; is this a normal phenomenon?
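One hedged back-of-envelope, assuming the standard two-iteration recipe (100 MFCC k-means classes for the first iteration, around 500 classes for the second) and that the loss is reported in bits per masked frame: a chance-level predictor sits near log2 K, so the second iteration starting and ending higher than the first is expected rather than alarming, and the logged total also folds in the feature penalty scaled by loss_weights: [10,].

import math

# Illustrative cluster counts from the standard recipe; substitute your own k-means sizes.
for iteration, num_classes in [(1, 100), (2, 500)]:
    chance_bits = math.log2(num_classes)   # cross-entropy of a uniform guess, in bits
    print(f"iteration {iteration}: chance-level loss ~ {chance_bits:.2f} bits")
# iteration 1: ~6.64 bits, iteration 2: ~8.97 bits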