anhnh2002 / XTTSv2-Finetuning-for-New-Languages


error when running gpt training command #12

Closed vcstack closed 1 month ago

vcstack commented 2 months ago

I followed your instructions and trained on Google Colab. Everything worked until I ran the following command, which produced an error:

%env CUDA_VISIBLE_DEVICES=0
!python train_gpt_xtts.py \
  --output_path=checkpoints/ \
  --train_csv_path=datasets/metadata_train.csv \
  --eval_csv_path=datasets/metadata_eval.csv \
  --language="vi" \
  --num_epochs=5 \
  --batch_size=8 \
  --grad_acumm=2 \
  --max_text_length=250 \
  --max_audio_length=255995 \
  --weight_decay=1e-2 \
  --lr=5e-6 \
  --save_step=2000

This is the error I am getting:

env: CUDA_VISIBLE_DEVICES=0
RuntimeError: module was compiled against NumPy C-API version 0x10 (NumPy 1.23) but the running NumPy has C-API version 0xf. Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem.
/content/XTTSv2-Finetuning-for-New-Languages/TTS/utils/io.py:54: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(f, map_location=map_location, **kwargs)

Loading checkpoint with 1246 additional tokens.
/content/XTTSv2-Finetuning-for-New-Languages/TTS/tts/layers/tortoise/arch_utils.py:336: FutureWarning: (same torch.load weights_only=False warning as above)
  self.mel_norms = torch.load(f)
/content/XTTSv2-Finetuning-for-New-Languages/TTS/tts/layers/xtts/trainer/gpt_trainer.py:186: FutureWarning: (same torch.load weights_only=False warning as above)
  dvae_checkpoint = torch.load(self.args.dvae_checkpoint, map_location=torch.device("cpu"))

DVAE weights restored from: checkpoints/XTTS_v2.0_original_model_files/dvae.pth
 | > [!] 92 files not found
 | > Found 100 files in /content/XTTSv2-Finetuning-for-New-Languages/datasets
 | > [!] 92 files not found

Training Environment:
 | > Backend: Torch
 | > Mixed precision: False
 | > Precision: float32
 | > Current device: 0
 | > Num. of GPUs: 1
 | > Num. of CPUs: 2
 | > Num. of Torch Threads: 1
 | > Torch seed: 54321
 | > Torch CUDNN: True
 | > Torch CUDNN deterministic: False
 | > Torch CUDNN benchmark: False
 | > Torch TF32 MatMul: False

2024-09-26 08:10:58.577010: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-26 08:10:58.608360: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-26 08:10:58.617715: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-26 08:10:58.639454: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/XTTSv2-Finetuning-for-New-Languages/train_gpt_xtts.py", line 253, in <module>
    trainer_out_path = train_gpt(
  File "/content/XTTSv2-Finetuning-for-New-Languages/train_gpt_xtts.py", line 220, in train_gpt
    trainer = Trainer(
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 440, in __init__
    self.dashboard_logger, self.c_logger = self.init_loggers(self.config, output_path, dashboard_logger, c_logger)
  File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 706, in init_loggers
    dashboard_logger = logger_factory(config, output_path)
  File "/usr/local/lib/python3.10/dist-packages/trainer/logging/__init__.py", line 31, in logger_factory
    from trainer.logging.tensorboard_logger import TensorboardLogger
  File "/usr/local/lib/python3.10/dist-packages/trainer/logging/tensorboard_logger.py", line 3, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 19, in <module>
    from ._embedding import get_embedding_info, make_mat, make_sprite, make_tsv, write_pbtxt
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/_embedding.py", line 10, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 50, in load_once
    module = load_fn()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/__init__.py", line 47, in <module>
    from tensorflow._api.v2 import __internal__
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/_api/v2/__internal__/__init__.py", line 11, in <module>
    from tensorflow._api.v2.__internal__ import distribute
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/_api/v2/__internal__/distribute/__init__.py", line 8, in <module>
    from tensorflow._api.v2.__internal__.distribute import combinations
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/_api/v2/__internal__/distribute/combinations/__init__.py", line 8, in <module>
    from tensorflow.python.distribute.combinations import env  # line: 456
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/distribute/combinations.py", line 33, in <module>
    from tensorflow.python.distribute import collective_all_reduce_strategy
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/distribute/collective_all_reduce_strategy.py", line 25, in <module>
    from tensorflow.python.distribute import cross_device_ops as cross_device_ops_lib
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/distribute/cross_device_ops.py", line 28, in <module>
    from tensorflow.python.distribute import cross_device_utils
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/distribute/cross_device_utils.py", line 22, in <module>
    from tensorflow.python.distribute import values as value_lib
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/distribute/values.py", line 23, in <module>
    from tensorflow.python.distribute import distribute_lib
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 205, in <module>
    from tensorflow.python.data.ops import dataset_ops
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/__init__.py", line 21, in <module>
    from tensorflow.python.data import experimental
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/experimental/__init__.py", line 98, in <module>
    from tensorflow.python.data.experimental import service
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/experimental/service/__init__.py", line 419, in <module>
    from tensorflow.python.data.experimental.ops.data_service_ops import distribute
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/experimental/ops/data_service_ops.py", line 26, in <module>
    from tensorflow.python.data.ops import dataset_ops
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 34, in <module>
    from tensorflow.python.data.ops import iterator_ops
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 45, in <module>
    from tensorflow.python.training.saver import BaseSaverBuilder
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/saver.py", line 50, in <module>
    from tensorflow.python.training import py_checkpoint_reader
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 19, in <module>
    from tensorflow.python.util._pywrap_checkpoint_reader import CheckpointReader
SystemError: initialization of _pywrap_checkpoint_reader raised unreported exception

I hope you can help me solve this error.

anhnh2002 commented 1 month ago

It seems the error comes from a dependency issue. Have you tried running the following command?

pip install -r requirements.txt
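A minimal Colab-style sketch of how you could check and fix the mismatch before retrying, assuming the problem is that the installed NumPy does not match what the compiled extensions expect (the `numpy>=1.23,<2.0` pin below is an assumption for illustration, not taken from requirements.txt):

```python
# Hedged sketch: report the running NumPy, then reinstall the project
# requirements plus a NumPy in the range the C extensions were built against.
# The exact pin is an assumption; adjust it to whatever requirements.txt says.
import subprocess
import sys

import numpy as np
print("Running NumPy:", np.__version__)

subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"])
subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy>=1.23,<2.0"])
# Restart the Colab runtime afterwards so the reinstalled packages are imported fresh.
```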

vcstack commented 1 month ago

@nguyenhoanganh2002 Hi, I got it running now. One follow-up question: after the GPT training command finished, it created the folder checkpoints/GPT_XTTS_FT-September-30-2024_03+49AM-861be1fm, but I don't see a best_model_99875.pth file in it (in fact there is no .pth file at all in the newly generated folder). I want to run your "7. Usage Example" code, but I can't because there is no best_model file.

anhnh2002 commented 1 month ago


Could you send me the detailed log? Most likely your dataset is small and the run did not reach enough steps, so no checkpoint was saved yet. To fix this, lower the --save_step=2000 parameter to something that matches the amount of data you have.
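As a rough way to see why no checkpoint appears, you can estimate how many optimizer steps the run will actually reach and pick --save_step below that. A small sketch, assuming the values from the training command above and the 100 files reported in the log:

```python
# Hedged sketch: estimate total optimizer steps so --save_step can be set
# below that number; otherwise training finishes before the first save.
import math

num_samples = 100   # audio clips found in the dataset (from the log above)
batch_size = 8
grad_acumm = 2
num_epochs = 5

steps_per_epoch = math.ceil(num_samples / (batch_size * grad_acumm))
total_steps = steps_per_epoch * num_epochs
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"choose --save_step below {total_steps}, e.g. {max(total_steps // 2, 1)}")
```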

vcstack commented 1 month ago

@nguyenhoanganh2002 Thanks! I am training on my own voice with only 100 audio files; after adjusting the save_step parameter it runs now. One more question: your code example uses speaker_audio_file = "ref.wav". Is that file required to run TTS? Since I already trained on recordings of my own voice, can I skip it here? Also, metadata_train.csv already labels the speaker_name:

audio_file|text|speaker_name
wavs/xxx.wav|How do you do?|@X
wavs/yyy.wav|Nice to meet you.|@Y
wavs/zzz.wav|Good to see you.|@Z

So instead of creating a speaker_audio_file = "ref.wav", is there a way to pass the speaker_name (@X, @Y) in to run TTS?

anhnh2002 commented 1 month ago


The speaker_name field is not used by the XTTS model. In your case, just take one of the audio files you trained on and use it as ref.wav.
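If it helps, a small sketch of picking the reference clip directly from metadata_train.csv instead of copying a ref.wav by hand (the datasets/ path and the pipe-delimited layout follow the training command and the metadata example above; adjust if yours differ):

```python
# Hedged sketch: speaker_name is not consumed by XTTS, so just reuse one of
# the training clips for the speaker you want as the reference audio.
import csv

with open("datasets/metadata_train.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter="|"))

target_speaker = rows[0]["speaker_name"]  # or any speaker_name value from the csv
speaker_audio_file = next(r["audio_file"] for r in rows if r["speaker_name"] == target_speaker)
print("Use this as the reference audio in the Usage Example:", speaker_audio_file)
```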