Closed lokesh12345678910 closed 7 months ago
I was using my SpeechCLIP folder, let me try my SpeechCLIP+ folder
Hi,
Thanks for pointing out these issues. I will try to handle them tomorrow.
OK, if there are still issues, I will try to fix them. Thanks!
(SpeechCLIP+) login2.ls6(1047)$ python largePlusImageAudioSim.py ../ART_PPA_WAB_CatRescue/CatRescuePackage.png Cat ../ART_PPA_WAB_CatRescue/SE_PreTx_WAB_CatRescue_WAV/ SE_PreTx_CatRescue
2024-04-08 11:53:37 | INFO | numexpr.utils | Note: detected 256 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-04-08 11:53:37 | INFO | numexpr.utils | Note: NumExpr detected 256 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Using cache found in /home1/07469/lpugalen/.cache/torch/hub/s3prl_cache/4a54d64fa42b41e39db994c958d8107d5785a100f38c6eba680b6a3cc79babb3
for https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt
Traceback (most recent call last):
File "/work/07469/lpugalen/ls6/SpeechCLIP_plus/largePlusImageAudioSim.py", line 39, in <module>
This was in my SpeechCLIP+ folder. Are the ckpt files valid? I'm pretty sure I fully downloaded them through the bash script
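One way to rule out a truncated or corrupted download is to compare the file size and a SHA-256 digest against a copy known to be good, for example a second download or one shared by the maintainer. Below is a minimal sketch using only the standard library. The helper name `file_sha256` and the example path are hypothetical, and as far as this thread shows, no reference digests are published for these checkpoints.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical location; point this at the downloaded checkpoint.
    ckpt = Path("model.ckpt")
    if ckpt.is_file():
        print(ckpt.stat().st_size, "bytes", file_sha256(ckpt))
    else:
        print("checkpoint not found:", ckpt)
```

If two downloads of the same file give different digests or sizes, the download is the likely culprit; if they match, the problem is more likely in the code or environment.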
I just ran download_ckpts.sh to download the checkpoints and used the code below to load them.
from avssl.model import KWClip_GeneralTransformer
largePlusFlickrCascadedModelPath = "/mnt/md1/user_jeffwang/SpeechCLIP-plus/icassp_sasb_ckpts/SpeechCLIP+/large/flickr/cascaded/model.ckpt"
largePlusFlickrCascasdedModel = KWClip_GeneralTransformer.load_from_checkpoint(largePlusFlickrCascadedModelPath).cuda()
Everything looks fine to me; here is the output:
2024-04-09 05:01:20 | WARNING | s3prl.upstream.espnet_hubert.expert | ESPnet is not installed, cannot use espnet_hubert upstream
2024-04-09 05:01:23 | INFO | torch.distributed.nn.jit.instantiator | Created a temporary directory at /tmp/tmpmzder8k9
2024-04-09 05:01:23 | INFO | torch.distributed.nn.jit.instantiator | Writing /tmp/tmpmzder8k9/_remote_module_non_sriptable.py
Using cache found in /home/jeffwang/.cache/torch/hub/s3prl_cache/4a54d64fa42b41e39db994c958d8107d5785a100f38c6eba680b6a3cc79babb3
for https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt
2024-04-09 05:01:25 | INFO | fairseq.tasks.hubert_pretraining | current directory is /mnt/md1/user_jeffwang/SpeechCLIP-plus
2024-04-09 05:01:25 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': '/checkpoint/wnhsu/data/librivox', 'fine_tuning': False, 'labels': ['lyr9.km500'], 'label_dir': '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': True, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-04-09 05:01:25 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': layer_norm, 'encoder_layers': 24, 'encoder_embed_dim': 1024, 'encoder_ffn_embed_dim': 4096, 'encoder_attention_heads': 16, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 768, 'untie_final_proj': True, 'layer_norm_first': True, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 1.0, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': True, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
2024-04-09 05:01:31 | INFO | avssl.module.speech_encoder_plus | Normalize waveform = (True)
2024-04-09 05:01:31 | INFO | avssl.module.speech_encoder_plus | Loaded s3prl speech encoder (hubert_large_ll60k): out_dim = 1024 layer_drop = 0.0
2024-04-09 05:01:31 | INFO | avssl.module.speech_encoder_plus | Using weighted sum for all hiddenstates(25)
2024-04-09 05:01:41 | WARNING | avssl.module.clip_official | Reduce text embedding to size of 8112
2024-04-09 05:01:42 | INFO | avssl.model.kw_branches | Create Cascaded Branch Plus
2024-04-09 05:01:42 | INFO | avssl.model.kw_branches | Using KW_CascadedBranchPlus
2024-04-09 05:01:42 | INFO | avssl.model.kw_branches | Using self-attention before downsampling
2024-04-09 05:01:42 | INFO | avssl.model.kw_branches | Using MultiheadAttentionAndNorm as KW_CascadedBranchPlus
2024-04-09 05:01:42 | INFO | avssl.model.kw_branches | kw_projection dims:[1024, 1024, 768] droupout:0.1
2024-04-09 05:01:42 | INFO | avssl.module.speechclip_c_modules.my_vector_quantizer | Setting vq temp fixed=0.1
2024-04-09 05:01:42 | INFO | avssl.module.speechclip_c_modules.kw_bn | Initialize BatchNorm weight and bias learnable=(True) with token embeddings w/ scale=1.0
2024-04-09 05:01:42 | INFO | avssl.module.cif | Apply scaling strategy step: 5000
2024-04-09 05:01:42 | INFO | avssl.model.kw_branches | Using cif downsampling method
Maybe something is wrong with the environment? For reference, I am using torch 1.11.0+cu113.
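To compare two environments quickly, it can help to print the installed versions of the relevant packages on both machines. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list is an assumption based on the module names that appear in the logs in this thread.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; adjust it to match the project's requirements file.
    packages = ["torch", "pytorch-lightning", "fairseq", "s3prl"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the working and the failing environment and diffing the output should show any version mismatch at a glance.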
Yes, this was an installation error on my end; it worked once I set up the environment again from scratch.
(speechCLIP) v330-010.ls6(1009)$ python largePlusImageAudioSim.py ../ART_PPA_WAB_CatRescue/CatRescuePackage.png Cat ../ART_PPA_WAB_CatRescue/SE_PreTx_WAB_CatRescue_WAV/ SE_PreTx_CatRescue
Using cache found in /home1/07469/lpugalen/.cache/torch/hub/s3prl_cache/4a54d64fa42b41e39db994c958d8107d5785a100f38c6eba680b6a3cc79babb3
for https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt
2024-04-08 11:12:02 | INFO | fairseq.tasks.hubert_pretraining | current directory is /work/07469/lpugalen/ls6/SpeechCLIP
2024-04-08 11:12:02 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': '/checkpoint/wnhsu/data/librivox', 'fine_tuning': False, 'labels': ['lyr9.km500'], 'label_dir': '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': True, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-04-08 11:12:02 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': layer_norm, 'encoder_layers': 24, 'encoder_embed_dim': 1024, 'encoder_ffn_embed_dim': 4096, 'encoder_attention_heads': 16, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 768, 'untie_final_proj': True, 'layer_norm_first': True, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 1.0, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': True, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-04-08 11:12:08 | INFO | avssl.module.speech_encoder_plus | Normalize waveform = (True)
[W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.
2024-04-08 11:12:08 | INFO | avssl.module.speech_encoder_plus | Loaded s3prl speech encoder (hubert_large_ll60k): out_dim = 1024 layer_drop = 0.0
2024-04-08 11:12:08 | INFO | avssl.module.speech_encoder_plus | Using weighted sum for all hiddenstates(25)
2024-04-08 11:12:14 | WARNING | avssl.module.clip_official | Reduce text embedding to size of 8112
Traceback (most recent call last):
File "/work/07469/lpugalen/ls6/SpeechCLIP/largePlusImageAudioSim.py", line 39, in <module>
largePlusFlickrCascasdedModel = avssl.model.KWClip_GeneralTransformer.load_from_checkpoint(largePlusFlickrCascadedModelPath).to(device)
File "/work/07469/lpugalen/ls6/SpeechCLIP/pytorch_lightning/core/saving.py", line 156, in load_from_checkpoint
model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/pytorch_lightning/core/saving.py", line 198, in _load_model_state
model = cls(**_cls_kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/model/kwClip.py", line 1122, in __init__
super().__init__(config)
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/model/kwClip.py", line 89, in __init__
self.keyword_num = self.config.model_settings.cascaded_branch.keyword.number
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/base/ordered_namespace.py", line 68, in __getattr__
return super(OrderedNamespace, self).__getattribute__(key)
AttributeError: 'OrderedNamespace' object has no attribute 'number'
(speechCLIP) v330-010.ls6(1010)$ python largePlusImageAudioSim.py ../ART_PPA_WAB_CatRescue/CatRescuePackage.png Cat ../ART_PPA_WAB_CatRescue/SE_PreTx_WAB_CatRescue_WAV/ SE_PreTx_CatRescue
2024-04-08 11:14:32 | INFO | avssl.module.speech_encoder_plus | Normalize hidden states (s3prl)
Using cache found in /home1/07469/lpugalen/.cache/torch/hub/s3prl_cache/4a54d64fa42b41e39db994c958d8107d5785a100f38c6eba680b6a3cc79babb3
for https://dl.fbaipublicfiles.com/hubert/hubert_large_ll60k.pt
2024-04-08 11:14:33 | INFO | fairseq.tasks.hubert_pretraining | current directory is /work/07469/lpugalen/ls6/SpeechCLIP
2024-04-08 11:14:33 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': '/checkpoint/wnhsu/data/librivox', 'fine_tuning': False, 'labels': ['lyr9.km500'], 'label_dir': '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': True, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-04-08 11:14:33 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': layer_norm, 'encoder_layers': 24, 'encoder_embed_dim': 1024, 'encoder_ffn_embed_dim': 4096, 'encoder_attention_heads': 16, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.0, 'attention_dropout': 0.0, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.0, 'dropout_input': 0.0, 'dropout_features': 0.0, 'final_dim': 768, 'untie_final_proj': True, 'layer_norm_first': True, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 1.0, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': True, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-04-08 11:14:38 | INFO | avssl.module.speech_encoder_plus | Normalize waveform = (True)
[W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.
2024-04-08 11:14:39 | INFO | avssl.module.speech_encoder_plus | Loaded s3prl speech encoder (hubert_large_ll60k): out_dim = 1024 layer_drop = 0.0
2024-04-08 11:14:39 | INFO | avssl.module.speech_encoder_plus | Using weighted sum for all hiddenstates(25)
2024-04-08 11:14:39 | INFO | avssl.module.weighted_sum | Normalize feature before weighted sum
2024-04-08 11:14:45 | WARNING | avssl.module.clip_official | Reduce text embedding to size of 8112
Traceback (most recent call last):
File "/work/07469/lpugalen/ls6/SpeechCLIP/largePlusImageAudioSim.py", line 42, in <module>
largePlusFlickrHybridModel = avssl.model.KWClip_GeneralTransformer.load_from_checkpoint(largePlusFlickrHybridModelPath).to(device)
File "/work/07469/lpugalen/ls6/SpeechCLIP/pytorch_lightning/core/saving.py", line 156, in load_from_checkpoint
model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/pytorch_lightning/core/saving.py", line 198, in _load_model_state
model = cls(**_cls_kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/model/kwClip.py", line 1122, in __init__
super().__init__(config)
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/model/kwClip.py", line 89, in __init__
self.keyword_num = self.config.model_settings.cascaded_branch.keyword.number
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/base/ordered_namespace.py", line 68, in __getattr__
return super(OrderedNamespace, self).__getattribute__(key)
AttributeError: 'OrderedNamespace' object has no attribute 'number'
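The traceback shows the model code reading `config.model_settings.cascaded_branch.keyword.number` from the configuration stored in the checkpoint and failing on the last segment. The sketch below illustrates that kind of failure and how to find which segment of a dotted path is missing. It uses `types.SimpleNamespace` as a stand-in for the project's `OrderedNamespace`; the helper `first_missing` and the toy config are hypothetical and do not reflect the real checkpoint contents.

```python
from types import SimpleNamespace


def first_missing(root, dotted_path):
    """Walk a dotted attribute path; return the prefix that fails, or None."""
    node = root
    walked = []
    for part in dotted_path.split("."):
        walked.append(part)
        if not hasattr(node, part):
            return ".".join(walked)
        node = getattr(node, part)
    return None


# Stand-in config, NOT the real checkpoint contents: the keyword block
# exists but has no `number` field, which mirrors the reported error.
config = SimpleNamespace(
    model_settings=SimpleNamespace(
        cascaded_branch=SimpleNamespace(keyword=SimpleNamespace())
    )
)

print(first_missing(config, "model_settings.cascaded_branch.keyword.number"))
```

A check like this, run against the config object before constructing the model, would point at the field the code expects but the saved config lacks, which is one symptom of loading a checkpoint with a mismatched version of the code.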