PrabhjotKaurGosal / Helpful-scripts-for-MachineLearning

This repository contains various scripts to help with Machine Learning projects.
MIT License

Hey, finally I found you; you are so busy, please help me #1

Open jackylee1 opened 5 months ago

jackylee1 commented 5 months ago

Hey, finally I found you; you are so busy, please help me. When I implemented the S2UT from your video tutorial, at the step where I get the train.txt file, I encountered the following error:

```
(test_fairseq) root@MS-TGCPQOCCPPUG:/home/anbanglee/Desktop/test_fairseq/fairseq# PYTHONPATH=. python examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py --feature_type hubert --kmeans_model_path /home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/km.bin --acoustic_model_path /home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/hubert_base_ls960.pt --layer 6 --manifest_path /home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev/dev.tsv --out_quantized_file_path /home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev.txt --extension ".wav"
2024-06-26 00:46:58 | INFO | __main__ | Namespace(acoustic_model_path='/home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/hubert_base_ls960.pt', channel_id=None, extension='.wav', feature_type='hubert', features_path=None, hide_fname=False, kmeans_model_path='/home/anbanglee/Desktop/S2ST/2_DiscretizeTargetSpeech_forTraining/km.bin', layer=6, manifest_path='/home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev/dev.tsv', out_quantized_file_path='/home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev.txt')
2024-06-26 00:46:58 | INFO | __main__ | Extracting hubert acoustic features...
2024-06-26 00:46:59 | INFO | fairseq.tasks.hubert_pretraining | current directory is /home/anbanglee/Desktop/test_fairseq/fairseq
2024-06-26 00:46:59 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': '/checkpoint/wnhsu/data/librispeech/960h/iter/250K_50hz_km100_mp0_65_v2', 'fine_tuning': False, 'labels': ['layer6.km500'], 'label_dir': None, 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-06-26 00:46:59 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': False, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'conv_pos_batch_norm': False, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': True}
/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
  0%|          | 0/393 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 141, in <module>
    main(args, logger)
  File "examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 98, in main
    features_batch = get_features(
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 84, in get_features
    for features in tqdm.tqdm(iterator, total=num_files):
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 64, in iterate
    feats = reader.get_feats(file_path, channel_id=channel_id)
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 51, in get_feats
    x = self.read_audio(file_path, ref_len, channel_id)
  File "/home/anbanglee/Desktop/test_fairseq/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 35, in read_audio
    wav, sr = sf.read(path)
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/soundfile.py", line 285, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/root/anaconda3/envs/test_fairseq/lib/python3.8/site-packages/soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '1590\t15158676295442294624.wav\tThe main local beer is \'Number One\', it is not a complex beer, but pleasant and refreshing. The other local beer is called "Manta".\tthe main local beer is number one\' it is not a complex beer but pleasant and refreshing the other local beer is called manta\tt h e | m a i n | l o c a l | b e e r | i s | n u m b e r | o n e \' | i t | i s | n o t | a | c o m p l e x | b e e r | b u t | p l e a s a n t | a n d | r e f r e s h i n g | t h e | o t h e r | l o c a l | b e e r | i s | c a l l e d | m a n t a |\t183360\tMALE/1590': System error.
```
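For context, the traceback shows soundfile being handed an entire TSV row (id, filename, transcripts, frame count, speaker) as if it were a single filename, which suggests dev.tsv is not in the manifest layout the quantization script expects: the audio root directory on the first line, then one `relative_path<TAB>num_frames` entry per file. A minimal sketch of a helper that writes such a manifest follows; the function name and paths are illustrative, not from the tutorial:

```python
# Hypothetical helper: write a fairseq-style wav manifest -- the audio root
# on the first line, then "<relative_path>\t<num_frames>" for each file.
import os
import soundfile as sf

def write_manifest(audio_dir, out_tsv, extension=".wav"):
    with open(out_tsv, "w") as f:
        f.write(audio_dir + "\n")  # first line: root directory
        for name in sorted(os.listdir(audio_dir)):
            if name.endswith(extension):
                n_frames = sf.info(os.path.join(audio_dir, name)).frames
                f.write(f"{name}\t{n_frames}\n")

write_manifest("/home/anbanglee/Desktop/S2ST/TGT_AUDIO/dev", "dev.tsv")
```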

I want to ask for your help: could you please provide your dataset? I would be grateful, thanks.

PrabhjotKaurGosal commented 5 months ago

@jackylee1 - I used a public dataset called Fleurs: https://huggingface.co/datasets/google/fleurs. You do have to do some post-processing after downloading the dataset. For example, make sure the filenames are the same for both languages for each sample.
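A minimal sketch of that post-processing step, assuming the raw Fleurs layout where each split TSV lists the sample id in the first column and the wav filename in the second (the directory names and helpers are illustrative, not the exact script used):

```python
# Rename each wav to "<id>.wav" and keep only samples present in both
# languages, so source and target audio pair up by filename.
import os
import shutil

def load_ids(tsv_path):
    """Map Fleurs sample id -> wav filename for one language split."""
    ids = {}
    with open(tsv_path) as f:
        for line in f:
            cols = line.rstrip("\n").split("\t")
            ids[cols[0]] = cols[1]
    return ids

def copy_paired(tsv_path, audio_dir, keep_ids, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for sid, fname in load_ids(tsv_path).items():
        if sid in keep_ids:
            shutil.copy(os.path.join(audio_dir, fname),
                        os.path.join(out_dir, sid + ".wav"))

src_ids = load_ids("fleurs/en_us/train.tsv")
tgt_ids = load_ids("fleurs/fr_fr/train.tsv")
common = set(src_ids) & set(tgt_ids)  # ids available in both languages
copy_paired("fleurs/en_us/train.tsv", "fleurs/en_us/audio/train", common, "SRC_AUDIO/train")
copy_paired("fleurs/fr_fr/train.tsv", "fleurs/fr_fr/audio/train", common, "TGT_AUDIO/train")
```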

jackylee1 commented 5 months ago

Thank you. Is there any tutorial on how to process the data? I did it. After training, is the result used for audio translation?


jackylee1 commented 5 months ago

The training process is long and the data is large, so I terminated it. Can I use one of the generated models to test? I ran the following, but it didn't generate results for a long time:

```
(test_fairseq) root@MS-TGCPQOCCPPUG:/home/anbanglee/Desktop/test_fairseq/fairseq# fairseq-generate /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT --config-yaml /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT/config.yaml --task speech_to_speech --target-is-code --target-code-size 100 --vocoder code_hifigan --path /home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt --gen-subset test --max-tokens 50000 --beam 10 --max-len-a 1 --results-path /home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS
2024-06-27 12:31:21 | INFO | fairseq_cli.generate | {'_name': None,
'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'aim_repo': None, 'aim_run_hash': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma'},
'common_eval': {'_name': None, 'path': '/home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt', 'post_process': None, 'quiet': False, 'model_overrides': '{}', 'results_path': '/home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS'},
'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': False, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False},
'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': 50000, 'batch_size': None, 'required_batch_size_multiple': 8, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None,
'disable_validation': False, 'max_tokens_valid': 50000, 'batch_size_valid': None, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False, 'debug_param_names': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 10, 'beam_mt': 0, 'nbest': 1, 'max_len_a': 1.0, 'max_len_b': 200, 'max_len_a_mt': 0.0, 'max_len_b_mt': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'lenpen_mt': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False, 'eos_token': None}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 0, 'input': '-'}, 'model': {'_name': 'wav2vec2', 'extractor_mode': 'default', 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': 'gelu', 'layer_type': 'transformer', 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 0, 'layer_norm_first': False, 'conv_feature_layers': '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]', 'conv_bias': False, 'logit_temp': 0.1, 'quantize_targets': False, 'quantize_input': False, 'same_quantizer': False, 'target_glu': False, 'feature_grad_mult': 1.0, 'quantizer_depth': 1, 'quantizer_factor': 3, 'latent_vars': 320, 'latent_groups': 2, 'latent_dim': 0, 'mask_length': 10, 
'mask_prob': 0.65, 'mask_selection': 'static', 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'require_same_masks': True, 'mask_dropout': 0.0, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_before': False, 'mask_channel_selection': 'static', 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'num_negatives': 100, 'negatives_from_everywhere': False, 'cross_sample_negatives': 0, 'codebook_negatives': 0, 'conv_pos': 128, 'conv_pos_groups': 16, 'pos_conv_depth': 1, 'latent_temp': [2.0, 0.5, 0.999995], 'max_positions': 100000, 'checkpoint_activations': False, 'required_seq_len_multiple': 1, 'crop_seq_to_multiple': 1, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False, 'adp_num': -1, 'adp_dim': 64, 'adp_act_fn': 'relu', 'adp_trf_idx': 'all'}, 'task': Namespace(_name='speech_to_speech', aim_repo=None, aim_run_hash=None, all_gather_list_size=16384, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, arch='wav2vec2', azureml_logging=False, batch_size=None, batch_size_valid=None, beam=10, beam_mt=0, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='', combine_valid_subsets=None, config_yaml='/home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT/config.yaml', constraints=None, continue_once=None, cpu=False, cpu_offload=False, criterion='cross_entropy', curriculum=0, data='/home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT', data_buffer_size=10, dataset_impl=None, ddp_backend='pytorch_ddp', ddp_comm_hook='none', decoding_format=None, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_num_procs=1, distributed_port=-1, distributed_rank=0, distributed_world_size=1, diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, eos=2, eos_prob_threshold=0.5, eos_token=None, eval_args='{}', eval_inference=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, fp32_reduce_scatter=False, gen_subset='test', gradient_as_bucket_view=False, grouped_shuffling=False, heartbeat_timeout=-1, ignore_unused_valid_subsets=False, infer_target_lang='', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_interval_updates_pattern=-1, keep_last_epochs=-1, lenpen=1, lenpen_mt=1, lm_path=None, lm_weight=0.0, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_file=None, log_format=None, log_interval=100, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=1.0, max_len_a_mt=0, max_len_b=200, max_len_b_mt=200, max_source_positions=6000, max_target_positions=1024, max_tokens=50000, max_tokens_valid=50000, max_valid_steps=None, maximize_best_checkpoint_metric=False, mcd_normalize_type='targ', memory_efficient_bf16=False, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', model_parallel_size=1, multitask_config_yaml=None, n_frames_per_step=1, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, 
no_last_checkpoints=False, no_progress_bar=False, no_repeat_ngram_size=0, no_reshard_after_forward=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, not_fsdp_flatten_parameters=False, nprocs_per_node=1, num_shards=1, num_workers=1, on_cpu_convert_precision=False, optimizer=None, optimizer_overrides='{}', pad=1, path='/home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt', patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, plasma_path='/tmp/plasma', post_process=None, prefix_size=0, print_alignment=None, print_step=False, profile=False, quantization_config_path=None, quiet=False, replace_unk=None, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', results_path='/home/anbanglee/Desktop/S2ST/5_S2UT_Inference/RESULTS', retain_dropout=False, retain_dropout_modules=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, save_dir='checkpoints', save_interval=1, save_interval_updates=0, score_reference=False, scoring='bleu', seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, slowmo_base_algorithm='localsgd', slowmo_momentum=None, spec_bwd_max_iter=8, suppress_crashes=False, target_code_size=100, target_is_code=True, task='speech_to_speech', temperature=1.0, tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', unk=3, unkpen=0, unnormalized=False, update_epoch_batch_itr=False, update_ordered_indices_seed=False, use_plasma_view=False, use_sharded_state=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, vocoder='code_hifigan', wandb_project=None, warmup_updates=0, write_checkpoints_asynchronously=False, zero_sharding='none'), 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}} 2024-06-27 12:31:21 | INFO | fairseq.tasks.speech_to_speech | dictionary size: 104 2024-06-27 12:31:21 | INFO | fairseq_cli.generate | loading model(s) from /home/anbanglee/Desktop/S2ST/4_S2UT_training/checkpoint_best.pt 2024-06-27 12:31:21 | WARNING | fairseq.data.audio.data_cfg | Auto converting transforms into feature_transforms, but transforms will be deprecated in the future. Please update this in the config. 
2024-06-27 12:31:21 | INFO | fairseq.data.audio.speech_to_text_dataset | 'test' has 0.00% OOV
2024-06-27 12:31:21 | INFO | fairseq.data.audio.speech_to_text_dataset | SpeechToSpeechDataset(split="test", n_samples=8, prepend_tgt_lang_tag=False, n_frames_per_step=1, shuffle=False, feature_transforms=CompositeAudioFeatureTransform( UtteranceCMVN(norm_means=True, norm_vars=True) ), waveform_transforms=None, dataset_transforms=CompositeAudioDatasetTransform( ))
2024-06-27 12:31:21 | INFO | fairseq.data.audio.speech_to_speech_dataset | SpeechToSpeechDataset(split="test", n_samples=8, prepend_tgt_lang_tag=False, n_frames_per_step=1, shuffle=False, feature_transforms=CompositeAudioFeatureTransform( UtteranceCMVN(norm_means=True, norm_vars=True) ), waveform_transforms=None, dataset_transforms=CompositeAudioDatasetTransform( ))
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = True
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2024-06-27 12:31:21 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
  0%|          | 0/1 [00:00<?, ?it/s]
```

jackylee1 commented 4 months ago

```
PYTHONPATH=. python examples/speech_to_speech/preprocessing/prep_s2ut_data.py --source-dir /home/anbanglee/Desktop/S2ST/SRC_AUDIO --target-dir /home/anbanglee/Desktop/S2ST/TGT_AUDIO --data-split train dev test --output-root /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/DATA_ROOT --reduce-unit --vocoder-checkpoint /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/g_00500000 --vocoder-cfg /home/anbanglee/Desktop/S2ST/3_S2UT_FormattingData/vocoder_code_hifigan_hubert_base_100_lj_config.json
```

For this command we need paired audio to prepare the data, but the target audio is exactly what I want to get. That is weird, because what can I do if I want to translate my own audio?
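Since the pairing is by filename, a quick sanity check like the sketch below (a hypothetical helper, not part of fairseq) can confirm that every split has matching source and target files before running prep_s2ut_data.py:

```python
# Report files that exist on only one side of a split; source and target
# audio are expected to pair up by filename.
import os

SRC = "/home/anbanglee/Desktop/S2ST/SRC_AUDIO"
TGT = "/home/anbanglee/Desktop/S2ST/TGT_AUDIO"

for split in ("train", "dev", "test"):
    src = set(os.listdir(os.path.join(SRC, split)))
    tgt = set(os.listdir(os.path.join(TGT, split)))
    unpaired = src ^ tgt  # symmetric difference: files on one side only
    print(split, "OK" if not unpaired else f"{len(unpaired)} unpaired file(s)")
```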

jackylee1 commented 3 months ago

Hey, you are so busy; will you update YouTube on how to process the data?

jackylee1 commented 3 months ago

> @jackylee1 - I used a public dataset called Fleurs: https://huggingface.co/datasets/google/fleurs. You do have to do some post-processing after downloading the dataset. For example, make sure the filenames are the same for both languages for each sample.

Thanks for your work. Will you update your YouTube channel about how to process the data for training?