python audio-chatgpt.py
Initializing AudioGPT
Initializing Make-An-Audio to cpu
LatentDiffusion_audio: Running in eps-prediction mode
DiffusionWrapper has 160.22 M params.
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 106, 106) = 44944 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
TextEncoder comes with 111.32 M params.
Traceback (most recent call last):
  File "audio-chatgpt.py", line 1378, in <module>
    bot = ConversationBot()
  File "audio-chatgpt.py", line 1057, in __init__
    self.t2a = T2A(device="cpu")
  File "audio-chatgpt.py", line 144, in __init__
    self.sampler = self._initialize_model('text_to_audio/Make_An_Audio/configs/text_to_audio/txt2audio_args.yaml', 'text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt', device=device)
  File "audio-chatgpt.py", line 150, in _initialize_model
    model.load_state_dict(torch.load(ckpt, map_location='cpu')["state_dict"], strict=False)
  File "/root/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
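
The last line of the traceback says the first byte `pickle` read from the checkpoint was `<`, so the file on disk does not begin like a PyTorch checkpoint. One possible explanation, which this log alone does not confirm, is that the download saved an HTML or XML page (for example an error or login page) under the `.ckpt` name. A minimal sketch for checking what the file really starts with; `sniff_checkpoint` and its labels are hypothetical names used here for illustration, not part of AudioGPT or PyTorch:

```python
import os
import tempfile


def sniff_checkpoint(path, n=64):
    """Guess what kind of file `path` is from its first `n` bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(n)
    if head.startswith(b"PK\x03\x04"):
        return "zip"              # zip archive, the container newer torch.save writes
    if head.startswith(b"version https://git-lfs"):
        return "git-lfs-pointer"  # small text stub left by an un-pulled Git LFS file
    if head.lstrip().startswith(b"<"):
        return "html-or-xml"      # markup, which would explain the '<' load key
    if head[:1] == b"\x80":
        return "pickle"           # binary pickle, as in the legacy torch.save format
    return "unknown"


# Demo on a throwaway file that mimics a saved HTML error page.
with tempfile.NamedTemporaryFile(delete=False, suffix=".ckpt") as tmp:
    tmp.write(b"<!DOCTYPE html><html><body>Not Found</body></html>")
print(sniff_checkpoint(tmp.name))
os.remove(tmp.name)
```

Pointing the same function at `text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt` would show whether that file is markup rather than a checkpoint; if so, downloading it again and comparing its size with the published one is the likely next step.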