ali-vilab / VGen

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models
https://i2vgen-xl.github.io

dependency issues with dreamvideo #143

Open sburlce opened 1 month ago

sburlce commented 1 month ago

!pip install --upgrade pip setuptools wheel

Update the package list

!apt-get update

Install build tools and other dependencies

!apt-get install -y build-essential cmake pkg-config
!apt-get install -y libjpeg-dev libtiff-dev libpng-dev
!pip install scikit-build
!wget https://files.pythonhosted.org/packages/source/o/opencv-python/opencv-python-4.4.0.46.tar.gz
!tar -xzf opencv-python-4.4.0.46.tar.gz
%cd opencv-python-4.4.0.46
!python setup.py install
%cd ..

Clone the specific version with submodules

!git clone --branch v0.0.13 --recurse-submodules https://github.com/facebookresearch/xformers.git

Navigate into the directory

%cd xformers

Install the package

!pip install .
%cd ..

claimed requirements.txt packages

!pip install diffusers
!pip install easydict==1.10
!pip install tokenizers==0.12.1
!pip install "numpy>=1.19.2"
!pip install ftfy==6.1.1
!pip install transformers==4.18.0
!pip install imageio==2.15.0
!pip install fairscale==0.4.6
!pip install ipdb
!pip install open-clip-torch==2.0.2

!pip install xformers==0.0.13

!pip install chardet==5.1.0
!pip install torchdiffeq==0.2.3

!pip install opencv-python==4.4.0.46

!pip install opencv-python-headless==4.7.0.68
!pip install torchsde==0.2.6
!pip install simplejson==3.18.4
!pip install motion-vector-extractor==1.0.6
!pip install scikit-learn
!pip install scikit-image
!pip install rotary-embedding-torch==0.2.1
!pip install pynvml==11.5.0
!pip install triton==2.0.0.dev20221120
!pip install pytorch-lightning==1.4.2
!pip install torchmetrics==0.6.0
!pip install gradio==3.39.0
!pip install imageio-ffmpeg
!pip install piq
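
A side note on version specifiers: when a requirement containing `>` is passed through a shell, as `!` lines in a notebook are, an unquoted `>` is treated as output redirection, so the version constraint is silently dropped. The sketch below shows one way to quote requirement strings before building such a line. The helper name `pip_install_line` is hypothetical and only for illustration; `shlex.quote` is from the Python standard library.

```python
import shlex


def pip_install_line(requirement: str) -> str:
    """Build a notebook-style install line with the requirement shell-quoted.

    Hypothetical helper for illustration; it only formats a string and
    does not run pip.
    """
    # shlex.quote leaves strings made of shell-safe characters untouched
    # and wraps anything else (for example a string containing '>') in
    # single quotes.
    return "!pip install " + shlex.quote(requirement)


if __name__ == "__main__":
    for req in ["numpy>=1.19.2", "easydict==1.10"]:
        print(pip_install_line(req))
```

Specifiers that use only `==` pass through unchanged, so quoting every requirement this way should be harmless.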

I had issues with torchtext needing the legacy module

!pip install torch==1.13.0
!pip install torchvision==0.14.0
!pip install torchaudio==0.13.0
!pip install torchdata==0.5.1
!pip install torchtext==0.14.0

code that snagged

!python inference.py --cfg configs/i2vgen_xl_infer.yaml

error resulting

/usr/local/lib/python3.10/dist-packages/xformers/_C.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv
WARNING:root:WARNING: /usr/local/lib/python3.10/dist-packages/xformers/_C.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv
Need to compile C++ extensions to get sparse attention suport. Please run python setup.py build develop
[2024-08-05 04:34:39,675] INFO: {'name': 'Config: VideoLDM Decoder', 'mean': [0.5, 0.5, 0.5], 'std': [0.5, 0.5, 0.5], 'max_words': 1000, 'num_workers': 6, 'prefetch_factor': 2, 'resolution': [1280, 704], 'vit_out_dim': 1024, 'vit_resolution': [224, 224], 'depth_clamp': 10.0, 'misc_size': 384, 'depth_std': 20.0, 'frame_lens': [16, 16, 16, 16, 16, 32, 32, 32], 'sample_fps': [8, 8, 16, 16, 16, 8, 16, 16], 'vid_dataset': {'type': 'VideoDataset', 'data_list': ['data/vid_list.txt'], 'max_words': 1000, 'resolution': [1280, 704], 'data_dir_list': ['data/videos/'], 'vit_resolution': [224, 224], 'get_first_frame': True}, 'img_dataset': {'type': 'ImageDataset', 'data_list': ['data/img_list.txt'], 'max_words': 1000, 'resolution': [1280, 704], 'data_dir_list': ['data/images'], 'vit_resolution': [224, 224]}, 'batch_sizes': {'1': 32, '4': 8, '8': 4, '16': 2, '32': 1}, 'Diffusion': {'type': 'DiffusionDDIM', 'schedule': 'cosine', 'schedule_param': {'num_timesteps': 1000, 'cosine_s': 0.008, 'zero_terminal_snr': True}, 'mean_type': 'v', 'loss_type': 'mse', 'var_type': 'fixed_small', 'rescale_timesteps': False, 'noise_strength': 0.1, 'ddim_timesteps': 50}, 'ddim_timesteps': 50, 'use_div_loss': False, 'p_zero': 0.0, 'guide_scale': 9.0, 'vit_mean': [0.48145466, 0.4578275, 0.40821073], 'vit_std': [0.26862954, 0.26130258, 0.27577711], 'sketch_mean': [0.485, 0.456, 0.406], 'sketch_std': [0.229, 0.224, 0.225], 'hist_sigma': 10.0, 'scale_factor': 0.18215, 'use_checkpoint': True, 'use_sharded_ddp': False, 'use_fsdp': False, 'use_fp16': True, 'temporal_attention': True, 'UNet': {'type': 'UNetSD_I2VGen', 'in_dim': 4, 'dim': 320, 'y_dim': 1024, 'context_dim': 
1024, 'out_dim': 4, 'dim_mult': [1, 2, 4, 4], 'num_heads': 8, 'head_dim': 64, 'num_res_blocks': 2, 'attn_scales': [1.0, 0.5, 0.25], 'dropout': 0.1, 'temporal_attention': True, 'temporal_attn_times': 1, 'use_checkpoint': True, 'use_fps_condition': False, 'use_sim_mask': False, 'upper_len': 128, 'concat_dim': 4, 'default_fps': 8}, 'guidances': [], 'auto_encoder': {'type': 'AutoencoderKL', 'ddconfig': {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0, 'video_kernel_size': [3, 1, 1]}, 'embed_dim': 4, 'pretrained': 'models/v2-1_512-ema-pruned.ckpt'}, 'embedder': {'type': 'FrozenOpenCLIPTextVisualEmbedder', 'layer': 'penultimate', 'pretrained': '/content/drive/MyDrive/Colab Notebooks/VGen/models/open_clip_pytorch_model.bin', 'vit_resolution': [224, 224]}, 'ema_decay': 0.9999, 'num_steps': 1000000, 'lr': 3e-05, 'weight_decay': 0.0, 'betas': [0.9, 0.999], 'eps': 1e-08, 'chunk_size': 2, 'decoder_bs': 2, 'alpha': 0.7, 'save_ckp_interval': 50, 'warmup_steps': 10, 'decay_mode': 'cosine', 'use_ema': True, 'load_from': None, 'Pretrain': {'type': 'pretrain_specific_strategies', 'fix_weight': False, 'grad_scale': 0.5, 'resume_checkpoint': '/content/drive/MyDrive/Colab Notebooks/VGen/models/i2vgen_xl_00854500.pth', 'sd_keys_path': '/content/drive/MyDrive/Colab Notebooks/VGen/models/stable_diffusion_image_key_temporal_attention_x1.json'}, 'viz_interval': 50, 'visual_train': {'type': 'VisualTrainTextImageToVideo', 'partial_keys': [['y', 'image', 'local_image', 'fps']], 'use_offset_noise': True, 'guide_scale': 9.0}, 'visual_inference': {'type': 'VisualGeneratedVideos'}, 'inference_list_path': '', 'log_interval': 1, 'log_dir': 'workspace/experiments/test_list_for_i2vgen', 'reward_type': 'HPSv2', 'temporal_reward_type': [], 'data_align_method': None, 'data_align_coef': 10, 'segments': 8, 'selection_method': 'fixed_first', 'exponential_TSN': True, 
'lambda_TAR': 1.0, 'reward_normalization': False, 'positive_reward': False, 'partial_timestep': None, 'ddim_steps': [981, 961, 941, 921, 901, 881, 861, 841, 821, 801, 781, 761, 741, 721, 701, 681, 661, 641, 621, 601, 581, 561, 541, 521, 501, 481, 461, 441, 421, 401, 381, 361, 341, 321, 301, 281, 261, 241, 221, 201, 181, 161, 141, 121, 101, 81, 61, 41, 21, 1], 'motion_rep': None, 'low_penal_threshold': 0.05, 'reward_weights': {'reward': 1, 'reg': 1}, 'temp_dir': 'workspace/temp_dir', 'adv_clip_max': 5, 'ST_reward_weights': {'spatial': 1, 'temporal': 1}, 'seed': 8888, 'negative_prompt': 'Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms', 'ENABLE': True, 'DATASET': 'webvid10m', 'TASK_TYPE': 'inference_i2vgen_entrance', 'max_frames': 16, 'target_fps': 16, 'scale': 8, 'round': 4, 'batch_size': 1, 'use_zero_infer': True, 'vldm_cfg': 'configs/i2vgen_xl_train.yaml', 'test_list_path': 'data/test_list_for_i2vgen.txt', 'test_model': '/content/drive/MyDrive/Colab Notebooks/VGen/models/i2vgen_xl_00854500.pth', 'cfg_file': 'configs/i2vgen_xl_infer.yaml', 'init_method': 'tcp://localhost:9999', 'debug': False, 'opts': [], 'pmi_rank': 0, 'pmi_world_size': 1, 'gpus_per_machine': 1, 'world_size': 1, 'noise_strength': 0.1, 'gpu': 0, 'rank': 0, 'log_file': 'workspace/experiments/test_list_for_i2vgen/log_00.txt'}
[2024-08-05 04:34:39,677] INFO: Going into it2v_fullid_img_text inference on 0 gpu
[2024-08-05 04:34:39,691] INFO: Loading ViT-H-14 model config.
[2024-08-05 04:34:50,206] INFO: Loading pretrained ViT-H-14 weights (/content/drive/MyDrive/Colab Notebooks/VGen/models/open_clip_pytorch_model.bin).
Traceback (most recent call last):
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry.py", line 62, in build_from_config
    return req_type_entry(**cfg)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/tools/modules/autoencoder.py", line 62, in __init__
    self.init_from_ckpt(pretrained, ignore_keys=ignore_keys)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/tools/modules/autoencoder.py", line 65, in init_from_ckpt
    sd = torch.load(path, map_location="cpu")["state_dict"]
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1131, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1124, in find_class
    return super().find_class(mod_name, name)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics, void
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
    from pytorch_lightning.utilities import rank_zero_deprecation
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/apply_func.py", line 31, in <module>
    from torchtext.legacy.data import Batch
ModuleNotFoundError: No module named 'torchtext.legacy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry.py", line 67, in build_from_config
    return req_type_entry(**cfg)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/tools/inferences/inference_i2vgen_entrance.py", line 74, in inference_i2vgen_entrance
    worker(0, cfg, cfg_update)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/tools/inferences/inference_i2vgen_entrance.py", line 145, in worker
    autoencoder = AUTO_ENCODER.build(cfg.auto_encoder)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry.py", line 107, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry_class.py", line 7, in build_func
    return build_from_config(cfg, registry, **kwargs)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry.py", line 64, in build_from_config
    raise Exception(f"Failed to init class {req_type_entry}, with {e}")
Exception: Failed to init class <class 'tools.modules.autoencoder.AutoencoderKL'>, with No module named 'torchtext.legacy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/drive/MyDrive/Colab Notebooks/VGen/inference.py", line 18, in <module>
    INFER_ENGINE.build(dict(type=cfg_update.TASK_TYPE), cfg_update=cfg_update.cfg_dict)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry.py", line 107, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry_class.py", line 7, in build_func
    return build_from_config(cfg, registry, **kwargs)
  File "/content/drive/MyDrive/Colab Notebooks/VGen/utils/registry.py", line 69, in build_from_config
    raise Exception(f"Failed to invoke function {req_type_entry}, with {e}")
Exception: Failed to invoke function <function inference_i2vgen_entrance at 0x7c4690bbb6d0>, with Failed to init class <class 'tools.modules.autoencoder.AutoencoderKL'>, with No module named 'torchtext.legacy'

I tried an earlier combination of torch, torchaudio, torchvision, torchdata, and torchtext, but then it tells me that the CUDA version is not compatible. Any help getting this running on Colab would be great.
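
For anyone trying to reproduce this, it may help to record which versions actually ended up installed, since later `pip install` lines can replace earlier pins. An undefined `c10` symbol in `xformers/_C.so` often points to a compiled extension built against a different torch build than the one installed, though that is an assumption here and not something the log confirms. A minimal sketch using the standard-library `importlib.metadata` follows; the package list and the helper name `installed_versions` are illustrative choices.

```python
from importlib import metadata

# Distribution names taken from the install commands in this report.
PACKAGES = [
    "torch",
    "torchvision",
    "torchaudio",
    "torchdata",
    "torchtext",
    "xformers",
    "pytorch-lightning",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting that output alongside the traceback would show whether the pinned versions survived the full install sequence.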

sburlce commented 1 month ago

It appears that pytorch-lightning 1.4.2 imports from the torchtext.legacy.data module. I am currently trying pytorch-lightning 1.6.5 and will update if that resolves the issue.
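
A quick way to check this before running inference is to test whether `torchtext.legacy` can be located in the current environment. This is a minimal sketch using the standard-library `importlib.util.find_spec`; the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the named module can be located, False otherwise.

    For a dotted name, find_spec imports the parent package first, so a
    missing parent raises ModuleNotFoundError (an ImportError subclass),
    which is treated here as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    print("torchtext.legacy available:", module_available("torchtext.legacy"))
```

If this prints `False`, any package that does `from torchtext.legacy.data import Batch` at import time will fail the same way as in the traceback above.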