WHU-USI3DV / VistaDream

[arXiv'24] VistaDream: Sampling multiview consistent images for single-view scene reconstruction
https://vistadream-project-page.github.io/
MIT License
331 stars · 13 forks

Any Colab? #16

Open kilik128 opened 1 week ago

kilik128 commented 1 week ago

Hey, does anyone have a Colab install? Thanks.
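
For reference, a minimal Colab bootstrap might look like the sketch below. This is only a guess, not an official installer: it assumes the repository ships a requirements.txt and that demo.py is the entry point (the tracebacks later in this thread reference /content/VistaDream/demo.py), and it does not cover checkpoint downloads.

```python
# Hypothetical Colab bootstrap for VistaDream: a sketch, not an official install script.
# Assumes a requirements.txt at the repo root and demo.py as the entry point.
import subprocess

def run(cmd: str) -> None:
    """Run a shell command and fail loudly, mimicking a Colab '!' cell."""
    print(f"$ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

run("git clone https://github.com/WHU-USI3DV/VistaDream /content/VistaDream")
run("pip install -r /content/VistaDream/requirements.txt")  # assumption: this file exists
# The model checkpoints (Fooocus, LLaVA, OneFormer, ...) still have to be downloaded
# per the project README before `python demo.py` will run.
```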

kilik128 commented 5 days ago

Got this log:

2024-11-16 23:46:26.851310: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-16 23:46:27.087792: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-16 23:46:27.149776: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-16 23:46:27.525214: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-16 23:46:29.620736: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Load default preset failed. [Errno 2] No such file or directory: '/content/VistaDream/presets/default.json'
No presets found.
No presets found.
Total VRAM 15102 MB, total RAM 12979 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla T4 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /content/VistaDream/tools/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [] for model [/content/VistaDream/tools/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Fooocus V2 Expansion: Vocab with 642 words.
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.60 seconds
Started worker with PID 38384
Loading config tools/OneFormer/configs/ade20k/dinat/coco_pretrain_oneformer_dinat_large_bs16_160k_1280x1280.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Loading config tools/OneFormer/configs/ade20k/dinat/../Base-ADE20K-UnifiedSegmentation.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
The checkpoint state_dict contains keys that are not used by the model: text_encoder.positional_embedding text_encoder.transformer.resblocks.0.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.0.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.0.ln_1.{bias, weight} text_encoder.transformer.resblocks.0.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.0.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.0.ln_2.{bias, weight} text_encoder.transformer.resblocks.1.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.1.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.1.ln_1.{bias, weight} text_encoder.transformer.resblocks.1.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.1.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.1.ln_2.{bias, weight} text_encoder.transformer.resblocks.2.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.2.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.2.ln_1.{bias, weight} text_encoder.transformer.resblocks.2.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.2.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.2.ln_2.{bias, weight} text_encoder.transformer.resblocks.3.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.3.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.3.ln_1.{bias, weight} text_encoder.transformer.resblocks.3.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.3.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.3.ln_2.{bias, weight} text_encoder.transformer.resblocks.4.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.4.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.4.ln_1.{bias, weight} text_encoder.transformer.resblocks.4.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.4.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.4.ln_2.{bias, weight} text_encoder.transformer.resblocks.5.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.5.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.5.ln_1.{bias, weight} text_encoder.transformer.resblocks.5.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.5.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.5.ln_2.{bias, weight} text_encoder.ln_final.{bias, weight} text_encoder.token_embedding.weight text_projector.layers.0.{bias, weight} text_projector.layers.1.{bias, weight} prompt_ctx.weight backbone.levels.0.blocks.0.attn.rpb backbone.levels.0.blocks.1.attn.rpb backbone.levels.0.blocks.2.attn.rpb backbone.levels.1.blocks.0.attn.rpb backbone.levels.1.blocks.1.attn.rpb backbone.levels.1.blocks.2.attn.rpb backbone.levels.1.blocks.3.attn.rpb backbone.levels.2.blocks.0.attn.rpb backbone.levels.2.blocks.1.attn.rpb backbone.levels.2.blocks.2.attn.rpb backbone.levels.2.blocks.3.attn.rpb backbone.levels.2.blocks.4.attn.rpb backbone.levels.2.blocks.5.attn.rpb backbone.levels.2.blocks.6.attn.rpb backbone.levels.2.blocks.7.attn.rpb backbone.levels.2.blocks.8.attn.rpb backbone.levels.2.blocks.9.attn.rpb backbone.levels.2.blocks.10.attn.rpb backbone.levels.2.blocks.11.attn.rpb backbone.levels.2.blocks.12.attn.rpb backbone.levels.2.blocks.13.attn.rpb backbone.levels.2.blocks.14.attn.rpb backbone.levels.2.blocks.15.attn.rpb backbone.levels.2.blocks.16.attn.rpb backbone.levels.2.blocks.17.attn.rpb backbone.levels.3.blocks.0.attn.rpb backbone.levels.3.blocks.1.attn.rpb backbone.levels.3.blocks.2.attn.rpb 
backbone.levels.3.blocks.3.attn.rpb backbone.levels.3.blocks.4.attn.rpb
/usr/local/lib/python3.10/dist-packages/transformers/models/llava/configuration_llava.py:100: FutureWarning: The vocab_size argument is deprecated and will be removed in v4.42, since it can be inferred from the text_config. Passing this argument has no effect
  warnings.warn(
Loading checkpoint shards: 100% 4/4 [00:07<00:00, 1.94s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
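
Two things stand out in that log: the missing /content/VistaDream/presets/default.json (a Fooocus preset lookup, probably harmless) and the fairly tight T4 budget (15102 MB VRAM, 12979 MB RAM). A quick, generic sanity check like the sketch below (plain PyTorch, nothing VistaDream-specific) confirms what the Colab runtime actually provides before committing to a long run:

```python
# Generic Colab environment check: not part of VistaDream, just a diagnostic sketch.
import os
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 2**20:.0f} MB")
else:
    print("No CUDA device visible; check the Colab runtime type (GPU).")

# The log above complains about a missing Fooocus preset; verify the path it expects.
preset = "/content/VistaDream/presets/default.json"
print(f"{preset} exists: {os.path.exists(preset)}")
```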

kilik128 commented 3 days ago

150 tries later, I got:

/usr/local/lib/python3.10/dist-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
2024-11-19 19:22:36.151188: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-19 19:22:36.377230: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-19 19:22:36.437989: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-19 19:22:36.817334: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-19 19:22:39.117386: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Load default preset failed. [Errno 2] No such file or directory: '/content/VistaDream/presets/default.json'
No presets found.
No presets found.
Total VRAM 15102 MB, total RAM 12979 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla T4 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /content/VistaDream/tools/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [] for model [/content/VistaDream/tools/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Fooocus V2 Expansion: Vocab with 642 words.
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.69 seconds
Started worker with PID 12058
Loading config tools/OneFormer/configs/ade20k/dinat/coco_pretrain_oneformer_dinat_large_bs16_160k_1280x1280.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Loading config tools/OneFormer/configs/ade20k/dinat/../Base-ADE20K-UnifiedSegmentation.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
The checkpoint state_dict contains keys that are not used by the model: text_encoder.positional_embedding text_encoder.transformer.resblocks.0.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.0.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.0.ln_1.{bias, weight} text_encoder.transformer.resblocks.0.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.0.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.0.ln_2.{bias, weight} text_encoder.transformer.resblocks.1.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.1.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.1.ln_1.{bias, weight} text_encoder.transformer.resblocks.1.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.1.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.1.ln_2.{bias, weight} text_encoder.transformer.resblocks.2.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.2.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.2.ln_1.{bias, weight} text_encoder.transformer.resblocks.2.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.2.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.2.ln_2.{bias, weight} text_encoder.transformer.resblocks.3.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.3.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.3.ln_1.{bias, weight} text_encoder.transformer.resblocks.3.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.3.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.3.ln_2.{bias, weight} text_encoder.transformer.resblocks.4.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.4.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.4.ln_1.{bias, weight} text_encoder.transformer.resblocks.4.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.4.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.4.ln_2.{bias, weight} text_encoder.transformer.resblocks.5.attn.{in_proj_bias, in_proj_weight} text_encoder.transformer.resblocks.5.attn.out_proj.{bias, weight} text_encoder.transformer.resblocks.5.ln_1.{bias, weight} text_encoder.transformer.resblocks.5.mlp.c_fc.{bias, weight} text_encoder.transformer.resblocks.5.mlp.c_proj.{bias, weight} text_encoder.transformer.resblocks.5.ln_2.{bias, weight} text_encoder.ln_final.{bias, weight} text_encoder.token_embedding.weight text_projector.layers.0.{bias, weight} text_projector.layers.1.{bias, weight} prompt_ctx.weight backbone.levels.0.blocks.0.attn.rpb backbone.levels.0.blocks.1.attn.rpb backbone.levels.0.blocks.2.attn.rpb backbone.levels.1.blocks.0.attn.rpb backbone.levels.1.blocks.1.attn.rpb backbone.levels.1.blocks.2.attn.rpb backbone.levels.1.blocks.3.attn.rpb backbone.levels.2.blocks.0.attn.rpb backbone.levels.2.blocks.1.attn.rpb backbone.levels.2.blocks.2.attn.rpb backbone.levels.2.blocks.3.attn.rpb backbone.levels.2.blocks.4.attn.rpb backbone.levels.2.blocks.5.attn.rpb backbone.levels.2.blocks.6.attn.rpb backbone.levels.2.blocks.7.attn.rpb backbone.levels.2.blocks.8.attn.rpb backbone.levels.2.blocks.9.attn.rpb backbone.levels.2.blocks.10.attn.rpb backbone.levels.2.blocks.11.attn.rpb backbone.levels.2.blocks.12.attn.rpb backbone.levels.2.blocks.13.attn.rpb backbone.levels.2.blocks.14.attn.rpb backbone.levels.2.blocks.15.attn.rpb backbone.levels.2.blocks.16.attn.rpb backbone.levels.2.blocks.17.attn.rpb backbone.levels.3.blocks.0.attn.rpb backbone.levels.3.blocks.1.attn.rpb backbone.levels.3.blocks.2.attn.rpb 
backbone.levels.3.blocks.3.attn.rpb backbone.levels.3.blocks.4.attn.rpb
/usr/local/lib/python3.10/dist-packages/transformers/models/llava/configuration_llava.py:100: FutureWarning: The vocab_size argument is deprecated and will be removed in v4.42, since it can be inferred from the text_config. Passing this argument has no effect
  warnings.warn(
Loading checkpoint shards: 100% 4/4 [00:08<00:00, 2.22s/it]
Traceback (most recent call last):
  File "/content/VistaDream/demo.py", line 6, in <module>
    vistadream = Pipeline(cfg)
  File "/content/VistaDream/pipe/c2f_recons.py", line 29, in __init__
    self.rgb_inpaintor = Inpaint_Tool(cfg)
  File "/content/VistaDream/pipe/lvm_inpaint.py", line 14, in __init__
    self._load_model()
  File "/content/VistaDream/pipe/lvm_inpaint.py", line 18, in _load_model
    self.llava = Llava(device='cpu',llava_ckpt=self.cfg.model.vlm.llava.ckpt)
  File "/content/VistaDream/ops/llava.py", line 16, in __init__
    self.processor = AutoProcessor.from_pretrained(self.model_id)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/processing_auto.py", line 319, in from_pretrained
    return processor_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 836, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 882, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 889, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2163, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2397, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 115, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum ModelWrapper at line 268080 column 3
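
The final exception ("data did not match any variant of untagged enum ModelWrapper") is the classic symptom of a tokenizer.json written by a newer tokenizers library than the one installed, so this looks like a version mismatch rather than a VistaDream bug. A quick way to confirm, independent of the whole pipeline, is to load the LLaVA tokenizer on its own. The model id in the sketch below is a placeholder; use whatever ops/llava.py sets as self.model_id:

```python
# Diagnostic sketch: reproduce the tokenizer failure in isolation.
# MODEL_ID is a placeholder; substitute the LLaVA checkpoint id used in ops/llava.py.
import tokenizers
import transformers
from transformers import AutoTokenizer

print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)

MODEL_ID = "<llava model id from ops/llava.py>"  # placeholder, not a real id

try:
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    print("Tokenizer loaded fine:", type(tok).__name__)
except Exception as exc:
    # An 'untagged enum ModelWrapper' message here usually points at a tokenizers
    # package that is too old to parse the checkpoint's tokenizer.json.
    print("Tokenizer load failed:", exc)
```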

GeoVectorMatrix commented 2 days ago

> (quotes the full log and traceback from the previous comment)

Maybe updating transformers will help: pip install -U transformers
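
That is likely the right direction; upgrading tokenizers alongside transformers is usually what clears the "untagged enum ModelWrapper" error, since tokenizers is the package that actually parses tokenizer.json. In a notebook that could look like the sketch below (restart the runtime afterwards so the upgraded packages are re-imported):

```python
# Upgrade transformers and tokenizers from Python, mirroring `pip install -U ...` in a Colab cell.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "-U", "transformers", "tokenizers"],
    check=True,
)
# Restart the Colab runtime after this so the upgraded packages are picked up on import.
```

Note that VistaDream may pin a specific transformers version in its requirements, so an unconstrained upgrade could introduce other incompatibilities; if it does, pinning a tokenizers release that can read the checkpoint's tokenizer.json is the narrower fix.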