CNChTu / Diffusion-SVC

MIT License

Units index - not working [?] #79

Open A-2-H opened 1 month ago

A-2-H commented 1 month ago

I can't get the Units Index to work. I had trouble training it and found that I needed to create a folder named "1" for speaker 1 in both train/audio and val/audio. After that I could generate the index file, but now when I run inference with -ir, nothing happens: the index loads as it should and then the script simply stops. Maybe I did something wrong when training the index, or something is broken at inference time; perhaps bad packages, or something messed up in the infer files?
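The folder layout described above can be checked before running `train_units_index.py`. This is a minimal sketch, assuming the dataset root contains `train/audio` and `val/audio` with one subfolder per speaker ID. The function name `missing_speaker_dirs` and the `data` root are hypothetical, not part of Diffusion-SVC.

```python
from pathlib import Path


def missing_speaker_dirs(data_root, speaker_ids, splits=("train", "val")):
    """Return the expected speaker folders that do not exist.

    Assumed layout (taken from the description above, not verified
    against the repository):
        <data_root>/train/audio/<speaker_id>/
        <data_root>/val/audio/<speaker_id>/
    """
    root = Path(data_root)
    missing = []
    for split in splits:
        for spk in speaker_ids:
            spk_dir = root / split / "audio" / str(spk)
            if not spk_dir.is_dir():
                missing.append(spk_dir)
    return missing


if __name__ == "__main__":
    # "data" is a hypothetical root; point it at your own dataset folder.
    for path in missing_speaker_dirs("data", speaker_ids=[1]):
        print(f"missing speaker folder: {path}")
```

An empty result only means the folders exist; it does not check that they contain audio files.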

E:\Diffusion-SVC>python train_units_index.py -c configs/config_v2_reflow.yaml
 [INFO] The feature index is constructing.
 [INFO] Loading the Units files...
100%|██████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:00<00:00, 1128.57it/s]
 [INFO] Training the Units indexes...
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:01<00:00,  8.62it/s]
 [INFO] Successfully build index
E:\Diffusion-SVC>python main.py -i "D:\input\test.mp3" -model "E:\Diffusion-SVC\exp\test\model_100000.pt" -o "E:\Diffusion-SVC\output\test.wav" -k 0 -id 1 -method euler -ir 0.75 -pe rmvpe
 [INFO] Not compile this model.
 [Loading] E:\Diffusion-SVC\exp\test\model_100000.pt
 [Encoder Model] Content Vec
 [Loading] pretrain/contentvec/checkpoint_best_legacy_500.pt
s:\anaconda3\envs\diff\lib\site-packages\fairseq\checkpoint_utils.py:315: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state = torch.load(f, map_location=torch.device("cpu"))
2024-10-19 22:34:59 | INFO | fairseq.tasks.hubert_pretraining | current directory is E:\Diffusion-SVC
2024-10-19 22:34:59 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-10-19 22:34:59 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
s:\anaconda3\envs\diff\lib\site-packages\torch\nn\utils\weight_norm.py:134: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
Units Forced Mode:nearest
 [INFO] Extract f0 volume and mask: Use rmvpe, start...
 [INFO] Extract f0 volume and mask: Done. Use time:2.0290944576263428
  0%|                                                                                                       | 0/18 [00:00<?, ?it/s]s:\anaconda3\envs\diff\lib\site-packages\torch\nn\functional.py:5560: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
2024-10-19 22:35:03 | INFO | faiss.loader | Loading faiss with AVX512 support.
2024-10-19 22:35:03 | INFO | faiss.loader | Could not load library with AVX512 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx512'")
2024-10-19 22:35:03 | INFO | faiss.loader | Loading faiss with AVX2 support.
2024-10-19 22:35:03 | INFO | faiss.loader | Successfully loaded faiss with AVX2 support.
 [INFO]  Successfully load Units Indexer from E:\Diffusion-SVC\exp\test\units_index\spk1.pkl.

CNChTu commented 1 month ago

This might be because the index is too large. Is the memory usage continuously increasing? Anyway, I don't think the indexing feature improves the quality of SVC. It will most likely be removed later.
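One rough way to judge whether the index is large is to look at the size of the index files on disk. The sketch below lists every `.pkl` file in an index folder with its size. The folder `exp/test/units_index` comes from the log above, while `index_file_sizes` is a hypothetical helper. File size is only a proxy: the memory used after loading may be higher.

```python
from pathlib import Path


def index_file_sizes(index_dir):
    """Map each .pkl file name in index_dir to its size in MiB."""
    sizes = {}
    for pkl in sorted(Path(index_dir).glob("*.pkl")):
        sizes[pkl.name] = pkl.stat().st_size / (1024 * 1024)
    return sizes


if __name__ == "__main__":
    # Path taken from the log above; adjust it to your experiment folder.
    for name, mib in index_file_sizes("exp/test/units_index").items():
        print(f"{name}: {mib:.1f} MiB")
```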

A-2-H commented 1 month ago

The index was built for several different models with different numbers of files, about 200-400 data files each. I didn't notice continuously increasing memory usage. I just wanted to try it and see the results. I'm experimenting with different settings and still learning how it all works and what gives the best results.
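If memory is not growing, it may help to find out whether the script exits normally or crashes without a Python traceback. The sketch below reruns a command in a child process with Python's built-in `faulthandler` enabled and reports the exit code. The wrapper `run_with_faulthandler` is a hypothetical helper; it assumes nothing about Diffusion-SVC beyond the command line shown in the report.

```python
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python script in a child process with faulthandler enabled.

    Returns (exit_code, stderr_text). A non-zero exit code without an
    ordinary Python traceback may point to a crash in native code; in
    that case faulthandler writes the Python stack to stderr.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Pass the original command after the wrapper's name, for example:
    #   python run_checked.py main.py -i input.mp3 ... -ir 0.75
    # ("run_checked.py" is a hypothetical file name for this sketch.)
    if len(sys.argv) > 1:
        code, err = run_with_faulthandler(sys.argv[1:])
        print(f"exit code: {code}")
        if err:
            print(err)
```

An exit code of 0 with no output file would suggest the script returns early on its own; a non-zero code would suggest a crash worth reporting together with the stderr text.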