PrabhjotKaurGosal / Helpful-scripts-for-MachineLearning

This repository contains various scripts to help with Machine Learning projects.
MIT License

Hey, finally I found you, you are so busy, please help me #1

Open jackylee1 opened 1 week ago

jackylee1 commented 1 week ago

Hey, finally I found you, you are so busy, please help me? When I implement the S2UT from your video tutorial, at the step where I generate the train.txt file, I encounter the following error:

```
(test_fairseq) root@MS-TGCPQOCCPPUG:/home/anbanglee/Desktop/test_fairseq/fairseq# PYTHONPATH=. python examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py --feature_type hubert --kmeans_model_path /home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/km.bin --acoustic_model_path /home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/hubert_base_ls960.pt --layer 6 --manifest_path /home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev/dev.tsv --out_quantized_file_path /home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev.txt --extension ".wav"
2024-06-26 00:46:58 | INFO | __main__ | Namespace(acoustic_model_path='/home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/hubert_base_ls960.pt', channel_id=None, extension='.wav', feature_type='hubert', features_path=None, hide_fname=False, kmeans_model_path='/home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/km.bin', layer=6, manifest_path='/home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev/dev.tsv', out_quantized_file_path='/home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev.txt')
2024-06-26 00:46:58 | INFO | __main__ | Extracting hubert acoustic features...
2024-06-26 00:46:59 | INFO | fairseq.tasks.hubert_pretraining | current directory is /home/anbanglee/Desktop/test_fairseq/fairseq
2024-06-26 00:46:59 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': '/checkpoint/wnhsu/data/librispeech/960h/iter/250K_50hz_km100_mp0_65_v2', 'fine_tuning': False, 'labels': ['layer6.km500'], 'label_dir': None, 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-06-26 00:46:59 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': False, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'conv_pos_batch_norm': False, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': True}
/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
  0%|          | 0/393 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 141, in <module>
    main(args, logger)
  File "examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 98, in main
    features_batch = get_features(
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 84, in get_features
    for features in tqdm.tqdm(iterator, total=num_files):
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 64, in iterate
    feats = reader.get_feats(file_path, channel_id=channel_id)
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 51, in get_feats
    x = self.read_audio(file_path, ref_len, channel_id)
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 35, in read_audio
    wav, sr = sf.read(path)
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/soundfile.py", line 285, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '1590\t15158676295442294624.wav\tThe main local beer is \'Number One\', it is not a complex beer, but pleasant and refreshing. The other local beer is called "Manta".\tthe main local beer is number one\' it is not a complex beer but pleasant and refreshing the other local beer is called manta\tt h e | m a i n | l o c a l | b e e r | i s | n u m b e r | o n e \' | i t | i s | n o t | a | c o m p l e x | b e e r | b u t | p l e a s a n t | a n d | r e f r e s h i n g | t h e | o t h e r | l o c a l | b e e r | i s | c a l l e d | m a n t a |\t183360\tMALE/1590': System error.
```

I want to ask for your help. Could you please provide your dataset? I would be grateful, thanks.
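Note: in the traceback above, soundfile is handed an entire tab-separated row (id, filename, transcripts, frame count, speaker) as if it were a path, which suggests the file passed via `--manifest_path` is not in the layout this script reads. As far as I can tell from the GSLM speech2unit examples, the manifest should have the audio root directory on its first line, followed by one `<relative_path>\t<number_of_frames>` entry per file. A minimal sketch of building such a manifest (the helper name and paths are illustrative, not from the repo):

```python
# Sketch: write a GSLM speech2unit-style manifest: the audio root directory
# on the first line, then "<relative_path>\t<num_frames>" per audio file.
import os
import soundfile as sf

def write_manifest(audio_root: str, out_tsv: str, ext: str = ".wav") -> None:
    with open(out_tsv, "w") as f:
        f.write(audio_root + "\n")  # first line: root directory of the audio
        for name in sorted(os.listdir(audio_root)):
            if name.endswith(ext):
                n_frames = sf.info(os.path.join(audio_root, name)).frames
                f.write(f"{name}\t{n_frames}\n")

write_manifest("/home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev",
               "/home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev/dev.tsv")
```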

PrabhjotKaurGosal commented 1 week ago

@jackylee1 - I used a public dataset called Fleurs: https://huggingface.co/datasets/google/fleurs. You do have to do some post-processing after downloading the dataset; for example, make sure the filenames are the same for both languages per sample.
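A minimal sketch of that post-processing, assuming the Hugging Face `datasets` library and the field names from the google/fleurs dataset card (`id` is shared across languages for the same sentence; `audio` holds the decoded array and sampling rate). The output directories and language pair are illustrative, not the author's exact steps:

```python
# Sketch: export FLEURS audio so paired samples share a filename across
# languages, by naming each file after its cross-lingual "id".
import os
import soundfile as sf
from datasets import load_dataset

def export_split(lang: str, out_dir: str, split: str = "train") -> None:
    os.makedirs(out_dir, exist_ok=True)
    ds = load_dataset("google/fleurs", lang, split=split)
    for ex in ds:
        # "id" aligns parallel sentences across language configs; a split can
        # contain several recordings of the same id, so later ones overwrite
        # earlier ones here. Deduplicate differently if that matters.
        out = os.path.join(out_dir, f"{ex['id']}.wav")
        sf.write(out, ex["audio"]["array"], ex["audio"]["sampling_rate"])

export_split("en_us", "SRC_AUDIO/train")  # source language
export_split("fr_fr", "TGT_AUDIO/train")  # target language (illustrative pair)
```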

jackylee1 commented 1 week ago

Thank you, is there any tutorial on how to process the data? I did it. After training, is the result used for audio translation?


jackylee1 commented 1 week ago

The training process is long and needs a lot of data, so I terminated it. Can I use one of the generated models to test? I ran the following, but it didn't produce results for a long time:

```
(test_fairseq) root@MS-TGCPQOCCPPUG:/home/anbanglee/Desktop/test_fairseq/fairseq# fairseq-generate /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT --config-yaml /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT/config.yaml --task speech_to_speech --target-is-code --target-code-size 100 --vocoder code_hifigan --path /home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt --gen-subset test --max-tokens 50000 --beam 10 --max-len-a 1 --results-path /home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS
2024-06-27 12:31:21 | INFO | fairseq_cli.generate | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'aim_repo': None, 'aim_run_hash': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'}, 'common_eval': {'_name': None, 'path': '/home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt', 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': '/home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS'}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': 50000, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None,
'disable_validation': False, 'max_tokens_valid': 50000, 'batch_size_valid': None, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False, 'debug_param_names': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 10, 'beam_mt': 0, 'nbest': 1, 'max_len_a': 1.0, 'max_len_b': 200, 'max_len_a_mt': 0.0, 'max_len_b_mt': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'lenpen_mt': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False, 'eos_token': None}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'wav2vec2', 'extractor_mode': 'default', 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': 'gelu', 'layer_type': 'transformer', 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 0, 'layer_norm_first': False, 'conv_feature_layers': '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]', 'conv_bias': False, 'logit_temp': 0.1, 'quantize_targets': False, 'quantize_input': False, 'same_quantizer': False, 'target_glu': False, 'feature_grad_mult': 1.0, 'quantizer_depth': 1, 'quantizer_factor': 3, 'latent_vars': 320, 'latent_groups': 2, 'latent_dim': 0, 'mask_length': 10, 
'mask_prob': 0.65, 'mask_selection': 'static', 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'require_same_masks': True, 'mask_dropout': 0.0, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_before': False, 'mask_channel_selection': 'static', 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'num_negatives': 100, 'negatives_from_everywhere': False, 'cross_sample_negatives': 0, 'codebook_negatives': 0, 'conv_pos': 128, 'conv_pos_groups': 16, 'pos_conv_depth': 1, 'latent_temp': [2.0, 0.5, 0.999995], 'max_positions': 100000, 'checkpoint_activations': False, 'required_seq_len_multiple': 1, 'crop_seq_to_multiple': 1, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False, 'adp_num': -1, 'adp_dim': 64, 'adp_act_fn': 'relu', 'adp_trf_idx': 'all'}, 'task': Namespace(_name='speech_to_speech', aim_repo=None, aim_run_hash=None, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, arch='wav2vec2', azureml_logging=False, batch_size=None, batch_size_valid=None, beam=10, beam_mt=0, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', combine_valid_subsets=None, config_yaml='/home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT/config.yaml', constraints=None, continue_once=None, cpu=False, cpu_offload=False, criterion='cross_entropy', curriculum=0, data='/home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', ddp_comm_hook='none', decoding_format=None, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_num_procs=1, distributed_port=-1, distributed_rank=0, distributed_world_size=1, diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, eos=2, eos_prob_threshold=0.5, eos_token=None, eval_args='{}', eval_inference=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, fp32_reduce_scatter=False, gen_subset='test', gradient_as_bucket_view=False, grouped_shuffling=False, heartbeat_timeout=-1, ignore_unused_valid_subsets=False, infer_target_lang='', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_interval_updates_pattern=-1, keep_last_epochs=-1, lenpen=1, lenpen_mt=1, lm_path=None, lm_weight=0.0, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_file=None, log_format=None, log_interval=100, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=1.0, max_len_a_mt=0, max_len_b=200, max_len_b_mt=200, max_source_positions=6000, max_target_positions=1024, max_tokens=50000, max_tokens_valid=50000, max_valid_steps=None, maximize_best_checkpoint_metric=False, mcd_normalize_type='targ', memory_efficient_bf16=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', model_parallel_size=1, multitask_config_yaml=None, n_frames_per_step=1, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, 
no_last_checkpoints=False, no_progress_bar=False, no_repeat_ngram_size=0, no_reshard_after_forward=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, not_fsdp_flatten_parameters=False, nprocs_per_node=1, num_shards=1, num_workers=1, on_cpu_convert_precision=False, optimizer=None, optimizer_overrides='{}', pad=1, path='/home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt', patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, plasma_path='/tmp/plasma', post_process=None, prefix_size=0, print_alignment=None, print_step=False, profile=False, quantization_config_path=None, quiet=False, replace_unk=None, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', results_path='/home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS', retain_dropout=False, retain_dropout_modules=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, save_dir='checkpoints', save_interval=1, save_interval_updates=0, score_reference=False, scoring='bleu', seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, slowmo_base_algorithm='localsgd', slowmo_momentum=None, spec_bwd_max_iter=8, suppress_crashes=False, target_code_size=100, target_is_code=True, task='speech_to_speech', temperature=1.0, tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', unk=3, unkpen=0, unnormalized=False, update_epoch_batch_itr=False, update_ordered_indices_seed=False, use_plasma_view=False, use_sharded_state=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, vocoder='code_hifigan', wandb_project=None, warmup_updates=0, write_checkpoints_asynchronously=False, zero_sharding='none'), 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}} 2024-06-27 12:31:21 | INFO | fairseq.tasks.speech_to_speech | dictionary size: 104 2024-06-27 12:31:21 | INFO | fairseq_cli.generate | loading model(s) from /home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt 2024-06-27 12:31:21 | WARNING | fairseq.data.audio.data_cfg | Auto converting transforms into feature_transforms, but transforms will be deprecated in the future. Please update this in the config. 
2024-06-27 12:31:21 | INFO | fairseq.data.audio.speech_to_text_dataset | 'test' has 0.00% OOV
2024-06-27 12:31:21 | INFO | fairseq.data.audio.speech_to_text_dataset | SpeechToSpeechDataset(split="test", n_samples=8, prepend_tgt_lang_tag=False, n_frames_per_step=1, shuffle=False, feature_transforms=CompositeAudioFeatureTransform( UtteranceCMVN(norm_means=True, norm_vars=True) ), waveform_transforms=None, dataset_transforms=CompositeAudioDatasetTransform( ))
2024-06-27 12:31:21 | INFO | fairseq.data.audio.speech_to_speech_dataset | SpeechToSpeechDataset(split="test", n_samples=8, prepend_tgt_lang_tag=False, n_frames_per_step=1, shuffle=False, feature_transforms=CompositeAudioFeatureTransform( UtteranceCMVN(norm_means=True, norm_vars=True) ), waveform_transforms=None, dataset_transforms=CompositeAudioDatasetTransform( ))
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = True
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
  0%|          | 0/1 [00:00<?, ?it/s]
```
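If generation does complete, fairseq-generate writes `generate-test.txt` under `--results-path` in its usual S-/T-/H-/D- line format, and the fairseq speech-to-speech docs extract the D- lines as the predicted unit sequences for the vocoder. A small sketch of that extraction step (file paths follow the command above):

```python
# Sketch, assuming fairseq-generate finished and wrote generate-test.txt in
# its usual format, where each predicted unit sequence appears on a
# "D-<id>\t<score>\t<units>" line.
def extract_units(generate_txt: str, out_unit: str) -> None:
    rows = []
    with open(generate_txt) as f:
        for line in f:
            if line.startswith("D-"):
                tag, _score, units = line.rstrip("\n").split("\t", 2)
                rows.append((int(tag[2:]), units))
    with open(out_unit, "w") as f:
        # restore corpus order; generate output is ordered by batch, not id
        for _idx, units in sorted(rows):
            f.write(units + "\n")

extract_units("/home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS/generate-test.txt",
              "/home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS/generate-test.unit")
```

According to the fairseq speech-to-speech README, the resulting unit file is then passed to examples/speech_to_speech/generate_waveform_from_code.py together with the code HiFi-GAN vocoder checkpoint and config to synthesize the output waveforms.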

jackylee1 commented 1 week ago

```
PYTHONPATH=. python examples/speech_to_speech/preprocessing/prep_s2ut_data.py --source-dir /home/anbanglee/Desktop/S2ST/SRC_AUDIO --target-dir /home/anbanglee/Desktop/S2ST/TGT_AUDIO --data-split train dev test --output-root /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT --reduce-unit --vocoder-checkpoint /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/g_00500000 --vocoder-cfg /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/vocoder_code_hifigan_hubert_base_100_lj_config.json
```

This command needs paired audio to prepare the data, but the target audio is exactly what I want the model to produce. That seems weird, because what do I do if I want to translate my own audio?
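One possible answer, sketched as an assumption rather than the repo's method: paired target audio is only needed to build training targets, so for your own recordings you could write a test tsv in the same layout the prep step produced for the training splits, with a dummy unit on the target side (scores against it are meaningless, but generation still runs). The column names below are taken from the tsv layout generated in a typical prep_s2ut_data.py run; verify them against your own files, and note that the directory paths are illustrative:

```python
# Sketch: build an inference-only tsv for your own source audio, with a
# single dummy unit ("0") standing in for the unknown target side.
import csv
import os
import soundfile as sf

def make_inference_tsv(audio_dir: str, out_tsv: str) -> None:
    with open(out_tsv, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        # assumed column layout; check the train.tsv your prep run produced
        writer.writerow(["id", "src_audio", "src_n_frames", "tgt_audio", "tgt_n_frames"])
        for name in sorted(os.listdir(audio_dir)):
            if name.endswith(".wav"):
                path = os.path.join(audio_dir, name)
                n_frames = sf.info(path).frames
                writer.writerow([os.path.splitext(name)[0], path, n_frames, "0", 1])

make_inference_tsv("/home/anbanglee/Desktop/S2ST/MY_AUDIO",
                   "/home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT/test.tsv")
```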