ExponentialML / Text-To-Video-Finetuning

Finetune ModelScope's Text To Video model using Diffusers 🧨

NameError: name 'glob' is not defined #45

Closed: ImBadAtNames2019 closed this issue 1 year ago

ImBadAtNames2019 commented 1 year ago

After I run the training script with train_config.yaml, I get the error below:

2023-04-09 13:40:38.702636: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.9/dist-packages/accelerate/accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
04/09/2023 13:40:40 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'variance_type'} was not found in config. Values will be initialized to default values.
(TypedStorage deprecation warnings from transformers, torch, and safetensors omitted)
{'mid_block_scale_factor', 'downsample_padding'} was not found in config. Values will be initialized to default values.
33 Attention layers using Scaled Dot Product Attention.
Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into CLIPTextModel.
Non-existant JSON path. Skipping.

Traceback (most recent call last):
  File "/content/Text-To-Video-Finetuning/train.py", line 915, in <module>
    main(**OmegaConf.load(args.config))
  File "/content/Text-To-Video-Finetuning/train.py", line 582, in main
    train_datasets = get_train_dataset(dataset_types, train_data, tokenizer)
  File "/content/Text-To-Video-Finetuning/train.py", line 86, in get_train_dataset
    train_datasets.append(DataSet(train_data, tokenizer=tokenizer))
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 487, in __init__
    self.video_files = glob(f"{path}/*.mp4")
NameError: name 'glob' is not defined

ImBadAtNames2019 commented 1 year ago

I'm using the exact same dataset that I used with the previous version of this repo, and it worked fine before.

ImBadAtNames2019 commented 1 year ago

I added `import glob` at the top of dataset.py, and now I get this error:

33 Attention layers using Scaled Dot Product Attention.
Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into CLIPTextModel.

Traceback (most recent call last):
  File "/content/Text-To-Video-Finetuning/train.py", line 915, in <module>
    main(**OmegaConf.load(args.config))
  File "/content/Text-To-Video-Finetuning/train.py", line 582, in main
    train_datasets = get_train_dataset(dataset_types, train_data, tokenizer)
  File "/content/Text-To-Video-Finetuning/train.py", line 86, in get_train_dataset
    train_datasets.append(DataSet(train_data, tokenizer=tokenizer))
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 488, in __init__
    self.video_files = glob(f"{path}/*.mp4")
TypeError: 'module' object is not callable
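For anyone hitting the same thing, the difference between the two fixes comes down to what the name glob refers to (a minimal sketch, using a hypothetical path):

import glob
glob.glob("/path/to/videos/*.mp4")   # hypothetical path; the module exposes the function as glob.glob

from glob import glob
glob("/path/to/videos/*.mp4")        # here the name glob is the function itself, which is what dataset.py calls

With plain `import glob`, the name glob is bound to the module, so calling glob(...) directly raises exactly the TypeError above.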

ImBadAtNames2019 commented 1 year ago

I'm running it on Google Colab, if that matters.

ImBadAtNames2019 commented 1 year ago

I tried adding `from glob import glob` instead of `import glob` at the top of dataset.py and train.py, and now I get this error instead:

Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into CLIPTextModel.
Caching Latents.: 0% 0/29 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/content/Text-To-Video-Finetuning/train.py", line 915, in <module>
    main(**OmegaConf.load(args.config))
  File "/content/Text-To-Video-Finetuning/train.py", line 604, in main
    cached_data_loader = handle_cache_latents(
  File "/content/Text-To-Video-Finetuning/train.py", line 333, in handle_cache_latents
    for i, batch in enumerate(tqdm(train_dataloader, desc="Caching ...
  (tqdm and torch DataLoader internals omitted)
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 549, in __getitem__
    video, = self.process_video_wrapper(self.video_files[index])
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 522, in process_video_wrapper
    video, vr = process_video(...
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 78, in process_video
    resize = get_frame_buckets(vr)
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 497, in get_frame_buckets
    width, height = sensible_buckets(self.width, self.height, h, w...
NameError: name 'sensible_buckets' is not defined

ImBadAtNames2019 commented 1 year ago

Maybe it's because I'm not using conda? I can't get conda to work on Google Colab.

ImBadAtNames2019 commented 1 year ago

Nope, I have no idea what to do. I can't use this at all now.

ImBadAtNames2019 commented 1 year ago

Now I get this error:

2023-04-09 17:03:29.790548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(accelerate logging_dir FutureWarning and TypedStorage deprecation warnings omitted, same as above)
04/09/2023 17:03:31 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'variance_type'} was not found in config. Values will be initialized to default values.
{'mid_block_scale_factor', 'downsample_padding'} was not found in config. Values will be initialized to default values.
33 Attention layers using Scaled Dot Product Attention.
Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into CLIPTextModel.
Caching Latents.: 0% 0/29 [00:01<?, ?it/s]

Traceback (most recent call last):
  File "/content/Text-To-Video-Finetuning/train.py", line 914, in <module>
    main(**OmegaConf.load(args.config))
  File "/content/Text-To-Video-Finetuning/train.py", line 603, in main
    cached_data_loader = handle_cache_latents(
  File "/content/Text-To-Video-Finetuning/train.py", line 332, in handle_cache_latents
    for i, batch in enumerate(tqdm(train_dataloader, desc="Caching ...
  (tqdm and torch DataLoader internals omitted)
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 560, in __getitem__
    return {"pixel_values": (video / 127.5 - 1.0), "prompt_ids": p...
TypeError: unsupported operand type(s) for /: 'tuple' and 'float'

ExponentialML commented 1 year ago

I just pushed a quick fix. Can you check to see if it works?

ImBadAtNames2019 commented 1 year ago

> I just pushed a quick fix. Can you check to see if it works?

I'm testing now, give me one second.

ImBadAtNames2019 commented 1 year ago

> I just pushed a quick fix. Can you check to see if it works?

Nope, now I get this error. I'm using the default config file; I only changed the location of the model and the location of the folder containing the videos.

2023-04-09 19:42:03.849781: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(accelerate logging_dir FutureWarning and TypedStorage deprecation warnings omitted, same as above)
04/09/2023 19:42:05 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'variance_type'} was not found in config. Values will be initialized to default values.
{'mid_block_scale_factor', 'downsample_padding'} was not found in config. Values will be initialized to default values.
33 Attention layers using Scaled Dot Product Attention.
Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into CLIPTextModel.
Caching Latents.: 0% 0/29 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/content/Text-To-Video-Finetuning/train.py", line 914, in <module>
    main(**OmegaConf.load(args.config))
  File "/content/Text-To-Video-Finetuning/train.py", line 603, in main
    cached_data_loader = handle_cache_latents(
  File "/content/Text-To-Video-Finetuning/train.py", line 332, in handle_cache_latents
    for i, batch in enumerate(tqdm(train_dataloader, desc="Caching ...
  (tqdm and torch DataLoader internals omitted)
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 550, in __getitem__
    video, = self.process_video_wrapper(self.video_files[index])
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 523, in process_video_wrapper
    video, vr = process_video(...
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 80, in process_video
    video = get_frame_batch(vr, resize=resize)
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 507, in get_frame_batch
    effective_length = len(vr) // every_nth_frame
ZeroDivisionError: integer division or modulo by zero

ExponentialML commented 1 year ago

@ImBadAtNames2019 I pushed another fix. Try again please.

I apologize for the inconvenience as I'm not able to test at the moment, but the following fix should work.

ImBadAtNames2019 commented 1 year ago

> @ImBadAtNames2019 I pushed another fix. Try again please.
>
> I apologize for the inconvenience as I'm not able to test at the moment, but the following fix should work.

No worries.

Testing right now.

ImBadAtNames2019 commented 1 year ago

> @ImBadAtNames2019 I pushed another fix. Try again please.
>
> I apologize for the inconvenience as I'm not able to test at the moment, but the following fix should work.

Nope, I even tried changing videos and I still get these errors. I get two different errors: sometimes the one I showed you above, and sometimes this one:

2023-04-09 20:49:11.225804: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(accelerate logging_dir FutureWarning and TypedStorage deprecation warnings omitted, same as above)
04/09/2023 20:49:13 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'variance_type'} was not found in config. Values will be initialized to default values.
{'downsample_padding', 'mid_block_scale_factor'} was not found in config. Values will be initialized to default values.
33 Attention layers using Scaled Dot Product Attention.
Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into CLIPTextModel.
Caching Latents.: 0% 0/2 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/content/Text-To-Video-Finetuning/train.py", line 914, in <module>
    main(**OmegaConf.load(args.config))
  File "/content/Text-To-Video-Finetuning/train.py", line 603, in main
    cached_data_loader = handle_cache_latents(
  File "/content/Text-To-Video-Finetuning/train.py", line 332, in handle_cache_latents
    for i, batch in enumerate(tqdm(train_dataloader, desc="Caching ...
  (tqdm and torch DataLoader internals omitted)
  File "/content/Text-To-Video-Finetuning/utils/dataset.py", line 560, in __getitem__
    return {"pixel_values": (video / 127.5 - 1.0), "prompt_ids": p...
TypeError: unsupported operand type(s) for /: 'tuple' and 'float'

ImBadAtNames2019 commented 1 year ago

Maybe it's because I'm running it on Google Colab?

bruefire commented 1 year ago

I'm not using Colab, but I encountered what looks like the same error: ZeroDivisionError: integer division or modulo by zero

In my case, I removed 'clip_path' items from the JSON file generated after preprocessing, and this allowed me to start the training successfully. I haven't finished the training yet, but it has progressed up to 1500 steps.

ExponentialML commented 1 year ago

@ImBadAtNames2019 Should be fixed now.

@bruefire Could you please post the error log if possible?

bruefire commented 1 year ago

@ExponentialML No problem. Here is the log (sorry for the ugly path):

(venv) (base) PS E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning> python train.py --config .\configs\v2\my_train_config.yaml
E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\venv\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: [WinError 127] The specified procedure could not be found.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 1.13.1+cu117 with CUDA 1107 (you have 2.1.0.dev20230409+cu117)
    Python 3.9.13 (you have 3.9.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\venv\lib\site-packages\accelerate\accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\venv\lib\site-packages\accelerate\accelerator.py:359: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
04/10/2023 07:02:22 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

{'variance_type'} was not found in config. Values will be initialized to default values.
(TypedStorage deprecation warning from transformers omitted)
{'downsample_padding', 'mid_block_scale_factor'} was not found in config. Values will be initialized to default values.
33 Attention layers using Scaled Dot Product Attention.
Lora successfully injected into UNet3DConditionModel.
Lora successfully injected into CLIPTextModel.
Loading JSON from ./json/anime-v2.json
Caching Latents.: 1%|▌ | 40/4064 [00:07<11:48, 5.68it/s]

Traceback (most recent call last):
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\train.py", line 914, in <module>
    main(**OmegaConf.load(args.config))
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\train.py", line 603, in main
    cached_data_loader = handle_cache_latents(
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\train.py", line 338, in handle_cache_latents
    batch['pixel_values'] = tensor_to_vae_latent(pixel_values, vae)
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\train.py", line 385, in tensor_to_vae_latent
    latents = rearrange(latents, "(b f) c h w -> b c f h w", f=video_length)
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\venv\lib\site-packages\einops\einops.py", line 483, in rearrange
    return reduce(cast(Tensor, tensor), pattern, reduction='rearrange', **axes_lengths)
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\venv\lib\site-packages\einops\einops.py", line 412, in reduce
    return _apply_recipe(recipe, tensor, reduction_type=reduction)
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\venv\lib\site-packages\einops\einops.py", line 235, in _apply_recipe
    _reconstruct_from_shape(recipe, backend.shape(tensor))
  File "E:\userdata\Documents\program\project\github\Text-To-Video-Finetuning\venv\lib\site-packages\einops\einops.py", line 199, in _reconstruct_from_shape_uncached
    if isinstance(length, int) and isinstance(known_product, int) and length % known_product != 0:
ZeroDivisionError: integer division or modulo by zero

ExponentialML commented 1 year ago

@bruefire Interesting. Could you check to see if that specific video file is corrupt or plays at all? It seems everything goes well up until the 40th clip. If it is, I can implement some checks to ensure we can get past corrupt videos.

bruefire commented 1 year ago

@ExponentialML Ok, but I have work to do, so I'll check once I get back.

ImBadAtNames2019 commented 1 year ago

> @bruefire Interesting. Could you check to see if that specific video file is corrupt or plays at all? It seems everything goes well up until the 40th clip. If it is, I can implement some checks to ensure we can get past corrupt videos.

Sorry, I went off to sleep. I tested it again and got the same error (ZeroDivisionError: integer division or modulo by zero), but then I tried changing the video dataset folder a second time, and this time it worked. So the problem now seems to be my dataset, but it's the exact same dataset I used with the previous version, and it worked then. I'm checking which specific videos are causing the problem.

ImBadAtNames2019 commented 1 year ago

I have no idea why the videos in my dataset are causing this problem; all of them are. I even tried processing the videos that do work through HandBrake and DaVinci (just like I did with the videos in my dataset that are causing this problem), and everything works just fine. I don't know; I will rebuild my dataset from zero and see what happens.

ImBadAtNames2019 commented 1 year ago

Ok, I kind of figured it out. My dataset is made of short GIFs in mp4 format, 10 fps, and some of them don't even last a second. Decreasing the fps value to 10 and setting n_sample_frames to 2 in the config file fixed the issue for me. But why do I have to set it so low? If I set it higher than 2, I get the same error. How is it sampling frames? Is it sampling 1 frame every 10, or is it sampling 2 consecutive frames?

ImBadAtNames2019 commented 1 year ago

I think the problem was caused by setting it to sample more frames than the clip actually has, but I didn't get this error in the previous version.

ImBadAtNames2019 commented 1 year ago

I'm trying to sample more than 2 frames, but it just won't let me. If I set the fps lower than 10 or n_sample_frames higher than 2, I get this error: RecursionError: maximum recursion depth exceeded in comparison

I'm losing my mind.
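(For context, judging from the get_frame_batch code shown in the tracebacks above, a clip that yields fewer usable frames than n_sample_frames makes __getitem__ retry a random other clip. A rough sketch of that retry pattern, using made-up stand-in values rather than the repo's actual classes:

import random

effective_lengths = [10, 8]   # hypothetical: usable frames per clip after fps subsampling
n_sample_frames = 12

def getitem(index):
    # If this clip is too short, try a random other clip instead.
    if effective_lengths[index] < n_sample_frames:
        return getitem(random.randint(0, len(effective_lengths) - 1))
    return index

If every clip is shorter than n_sample_frames, the retry never succeeds and the recursion eventually hits Python's limit, which would explain the RecursionError.)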

ImBadAtNames2019 commented 1 year ago

God, it's finally working. I just had to loop each video in the dataset until it reached 2 seconds in length, then I set the fps to 10 and n_sample_frames to 8.

ImBadAtNames2019 commented 1 year ago

Nope, there is something wrong with the way it's sampling the frames, and that's what's causing the problem. The motion in the output is completely wrong.

bruefire commented 1 year ago

> @bruefire Interesting. Could you check to see if that specific video file is corrupt or plays at all? It seems everything goes well up until the 40th clip. If it is, I can implement some checks to ensure we can get past corrupt videos.

I checked, and it seems there are no damaged files, including the clipped videos. But I noticed that an error occurs when the JSON contains a 'data.frame_index = n-1' item with 'num_frames = n'.

JCBrouwer commented 1 year ago

@bruefire @ImBadAtNames2019 sorry, I think this is happening due to some assumptions in the VideoFolder dataset.

I've made it throw a clearer error if the videos are too short, and also guarded against dividing by zero for low frame-rate videos.

I'm not quite sure it's solved both of your issues (as it seems like decord's VideoReader might be incorrectly reading the frame-rate of short GIFs?), but could you give this pull request a try and share your results?
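For readers following along, the PR itself is the source of truth; the guard being described is roughly of this shape (a minimal sketch with a hypothetical helper name, not the actual diff):

import decord

def checked_frame_stride(vid_path, target_fps, n_sample_frames):
    # Hypothetical helper, not the repo's code.
    vr = decord.VideoReader(vid_path)
    native_fps = vr.get_avg_fps()
    # max(..., 1) keeps round() from producing 0 when target_fps exceeds the clip's
    # native fps, which is what produced the ZeroDivisionError earlier in this thread.
    every_nth_frame = max(round(native_fps / target_fps), 1)
    effective_length = len(vr) // every_nth_frame
    if effective_length < n_sample_frames:
        raise ValueError(
            f"{vid_path}: only {effective_length} usable frames at {target_fps} fps, "
            f"but n_sample_frames is {n_sample_frames}."
        )
    return every_nth_frame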

ImBadAtNames2019 commented 1 year ago

> @bruefire @ImBadAtNames2019 sorry, I think this is happening due to some assumptions in the VideoFolder dataset.
>
> I've made it throw a clearer error if the videos are too short, and also guarded against dividing by zero for low frame-rate videos.
>
> I'm not quite sure it's solved both of your issues (as it seems like decord's VideoReader might be incorrectly reading the frame-rate of short GIFs?), but could you give this pull request a try and share your results?

I have no idea how to use a pull request. Can I just replace the lines of code modified in the "files changed" tab?

JCBrouwer commented 1 year ago

Yeah you can just replace those lines or use git:

git fetch origin pull/49/head:videofolder-fix
git checkout videofolder-fix
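(These two commands fetch pull request #49 into a local branch named videofolder-fix and switch to it, so you don't have to copy the changed lines by hand.)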

bruefire commented 1 year ago

@JCBrouwer Thank you, but it didn't work well for me. I encountered the same ZeroDivisionError.

ImBadAtNames2019 commented 1 year ago

> Yeah you can just replace those lines or use git:
>
> git fetch origin pull/49/head:videofolder-fix
> git checkout videofolder-fix

Yes, now it's not giving me any errors, but I'm still not sure it's sampling the frames correctly. For example, if the video is 10 fps, I set the fps value in the config file to 10, and n_sample_frames to 4, will it sample 4 frames in order, one after another (without skipping any frame), from a random part of the video? And if I do the same thing but set the fps value to 5, will it still sample 4 frames, but this time skipping every other frame? Did I get this right?

JCBrouwer commented 1 year ago

@bruefire ahh, you're using the JSON dataset; the fix won't affect that. It seems to me that you're somehow loading in a video that has a length of 0.

@ImBadAtNames2019, yes your description is what the video loader should be doing. The fact that you were getting this error, though, makes me a little suspicious:

native_fps = vr.get_avg_fps()
every_nth_frame = round(native_fps / self.fps)

effective_length = len(vr) // every_nth_frame

if effective_length < self.n_sample_frames:
    return self.__getitem__(random.randint(0, len(self.video_f...

ZeroDivisionError: integer division or modulo by zero

Were you trying with an fps config that was higher than your 10 fps videos? Otherwise I think maybe vr.get_avg_fps() might have been returning a wrong value.
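To make the arithmetic concrete (made-up numbers, based only on the snippet above):

native_fps = 10            # a short GIF re-encoded as mp4
config_fps = 24            # the default fps in the training config
every_nth_frame = round(native_fps / config_fps)   # round(0.416...) == 0
# len(vr) // every_nth_frame then raises ZeroDivisionError.
# With config_fps = 10 the stride is 1, and a ~1 second clip only has ~10 usable frames,
# so any n_sample_frames above that falls into the retry path instead.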

ImBadAtNames2019 commented 1 year ago

In the beginning, yes, I had the fps set to the default value of 24 while the videos in the dataset were 10 fps. But then I changed the fps value to 10 and I was still getting problems; it wouldn't let me sample more than 2 frames. So I increased the length of the videos to 2 seconds by looping them (they were GIFs originally); after that I was able to sample 12 frames (more than that would give me errors), but the motion of the video output (after finetuning) was completely wrong. I wrote above everything that happened. Now I'm at 72% progress with the updated script, using 10 fps and 8 n_sample_frames; let's see if I get better results.

ImBadAtNames2019 commented 1 year ago

I tested the fine-tuned model and there is no difference compared to the stock one; it's as if it didn't fine-tune at all. Here is my config file below. I don't know what I'm doing wrong; I didn't have these problems with the previous version. My dataset is a folder containing 29 mp4 videos (720x720 resolution, 10 fps, 1-3 seconds long, none shorter than 1 second), and each video has its own txt file (named like the video, in the same folder) containing the prompt. I'm using a 40 GB NVIDIA A100 rented on Google Colab. @JCBrouwer

# Pretrained diffusers model path.
pretrained_model_path: "/content/drive/MyDrive/models/model_scope_diffusers" #https://huggingface.co/damo-vilab/text-to-video-ms-1.7b/tree/main

# The folder where your training outputs will be placed.
output_dir: "./outputs"

# You can train multiple datasets at once. They will be joined together for training.
# Simply remove the line you don't need, or keep them all for mixed training.

# 'image': A folder of images and captions (.txt)
# 'folder': A folder of videos and captions (.txt)
# 'json': The JSON file created with automatic BLIP2 captions using https://github.com/ExponentialML/Video-BLIP2-Preprocessor
# 'single_video': A single video file.mp4 and text prompt
dataset_types: 
  - 'folder'

# Adds offset noise to training. See https://www.crosslabs.org/blog/diffusion-with-offset-noise
offset_noise_strength: 0.1
use_offset_noise: False

# When True, this extends all items in all enabled datasets to the highest length. 
# For example, if you have 200 videos and 10 images, 10 images will be duplicated to the length of 200. 
extend_dataset: False

# Caches the latents (Frames-Image -> VAE -> Latent) to a HDD or SDD. 
# The latents will be saved under your training folder, and loaded automatically for training.
# This both saves memory and speeds up training and takes very little disk space.
cache_latents: True

# If you have cached latents set to `True` and have a directory of cached latents,
# you can skip the caching process and load previously saved ones. 
cached_latent_dir: null #/path/to/cached_latents

# Train the text encoder. Leave at false to use LoRA only (Recommended).
train_text_encoder: False

# https://github.com/cloneofsimo/lora
# Use LoRA to train extra layers whilst saving memory. It trains both a LoRA & the model itself.
# This works slightly different than vanilla LoRA and DOES NOT save a separate file.
# It is simply used as a mechanism for saving memory by keeping layers frozen and training the residual.

# Use LoRA for the UNET model.
use_unet_lora: True

# Use LoRA for the Text Encoder.
use_text_lora: True

# The modules to use for LoRA. Different from 'trainable_modules'.
unet_lora_modules:
  - "ResnetBlock2D"

# The modules to use for LoRA. Different from `trainable_text_modules`.
text_encoder_lora_modules:
  - "CLIPEncoderLayer"

# The rank for LoRA training. With ModelScope, the maximum should be 1024. 
# VRAM increases with higher rank, lower when decreased.
lora_rank: 16

# Training data parameters
train_data:

  # The width and height in which you want your training data to be resized to.
  width: 384      
  height: 384

  # This will find the closest aspect ratio to your input width and height. 
  # For example, 512x512 width and height with a video of resolution 1280x720 will be resized to 512x256
  use_bucketing: True

  # The start frame index where your videos should start (Leave this at one for json and folder based training).
  sample_start_idx: 1

  # Used for 'folder'. The rate at which your frames are sampled. Does nothing for 'json' and 'single_video' dataset.
  fps: 10

  # For 'single_video' and 'json'. The number of frames to "step" (1,2,3,4) (frame_step=2) -> (1,3,5,7, ...).  
  frame_step: 5

  # The number of frames to sample. The higher this number, the higher the VRAM (acts similar to batch size).
  n_sample_frames: 8

  # 'single_video'
  single_video_path: "path/to/single/video.mp4"

  # The prompt when using a single video file
  single_video_prompt: ""

  # Fallback prompt if caption cannot be read. Enabled for 'image' and 'folder'.
  fallback_prompt: ''

  # 'folder'
  path: "/content/drive/MyDrive/Datasets/dataset_1"

  # 'json'
  json_path: 'path/to/train/json/'

  # 'image'
  image_dir: 'path/to/image/directory'

  # The prompt for all image files. Leave blank to use caption files (.txt) 
  single_img_prompt: ""

# Validation data parameters.
validation_data:

  # A custom prompt that is different from your training dataset. 
  prompt: "anime girl dancing"

  # Whether or not to sample preview during training (Requires more VRAM).
  sample_preview: True

  # The number of frames to sample during validation.
  num_frames: 16

  # Height and width of validation sample.
  width: 384
  height: 384

  # Number of inference steps when generating the video.
  num_inference_steps: 25

  # CFG scale
  guidance_scale: 9

# Learning rate for AdamW
learning_rate: 5e-6

# Weight decay. Higher = more regularization. Lower = closer to dataset.
adam_weight_decay: 1e-2

# Optimizer parameters for the UNET. Overrides base learning rate parameters.
extra_unet_params: null
  #learning_rate: 1e-5
  #adam_weight_decay: 1e-4

# Optimizer parameters for the Text Encoder. Overrides base learning rate parameters.
extra_text_encoder_params: null
  #learning_rate: 5e-6
  #adam_weight_decay: 0.2

# How many batches to train. Not to be confused with video frames.
train_batch_size: 1

# Maximum number of train steps. Model is saved after training.
max_train_steps: 2500

# Saves a model every nth step.
checkpointing_steps: 25000

# How many steps to do for validation if sample_preview is enabled.
validation_steps: 100

# Which modules we want to unfreeze for the UNET. Advanced usage.
trainable_modules:

  # If you want to ignore temporal attention entirely, remove "attn1-2" and replace with ".attentions"
  # This is for self attention. Activates for spatial and temporal dimensions if n_sample_frames > 1
  - "attn1"

  # This is for cross attention (image & text data). Activates for spatial and temporal dimensions if n_sample_frames > 1
  - "attn2"

  #  Convolution networks that hold temporal information. Activates for spatial and temporal dimensions if n_sample_frames > 1
  - 'temp_conv'

# Which modules we want to unfreeze for the Text Encoder. Advanced usage.
trainable_text_modules:
  - "all"

# Seed for validation.
seed: 64

# Whether or not we want to use mixed precision with accelerate
mixed_precision: "fp16"

# This seems to be incompatible at the moment.
use_8bit_adam: False 

# Trades VRAM usage for speed. You lose roughly 20% of training speed, but save a lot of VRAM.
# If you need to save more VRAM, it can also be enabled for the text encoder, but reduces speed x2.
gradient_checkpointing: False
text_encoder_gradient_checkpointing: False

# Xformers must be installed for best memory savings and performance (< Pytorch 2.0)
enable_xformers_memory_efficient_attention: False

# Use scaled dot product attention (Only available with >= Torch 2.0)
enable_torch_2_attn: True

JCBrouwer commented 1 year ago

Ok @ImBadAtNames2019, I think the issues you were running into earlier should now fail more clearly, with an error about the video files being too short. Judging by when you run into errors, I'd hazard a guess that your shortest video is about 1.2 seconds long.

Regarding fine-tuning not being very effective, I'd suggest trying to raise your learning rate and training for longer than 2500 steps. For me a learning rate of 1e-5 and weight_decay of 0 starts to give clearly tuned results after ~5000 steps.

JCBrouwer commented 1 year ago

Regarding the error you're running into @bruefire, it's probably going wrong in this function when the BLIP2 frame is too close to the end of the video to sample a full n_sample_frames at the frame_step.

If so, any idea what a good fix would be @ExponentialML ?
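If I'm reading the JSON sampling right, the failure mode would look something like this (made-up numbers, purely illustrative):

num_frames = 40
frame_index = num_frames - 1          # BLIP2-captioned frame sits at the very end of the clip
frame_step, n_sample_frames = 5, 8
wanted = [frame_index + i * frame_step for i in range(n_sample_frames)]
print(wanted)   # [39, 44, 49, 54, 59, 64, 69, 74] -> everything after 39 is past the end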

ImBadAtNames2019 commented 1 year ago

> Ok @ImBadAtNames2019, I think the issues you were running into earlier should now fail more clearly, with an error about the video files being too short. Judging by when you run into errors, I'd hazard a guess that your shortest video is about 1.2 seconds long.
>
> Regarding fine-tuning not being very effective, I'd suggest trying to raise your learning rate and training for longer than 2500 steps. For me a learning rate of 1e-5 and weight_decay of 0 starts to give clearly tuned results after ~5000 steps.

I will try fine-tuning it with 5k steps, but I doubt it will make any difference. The output of the model fine-tuned for 2500 steps is identical to that of the stock model. Are you sure my config file above is ok? Maybe I didn't configure it properly. In the previous version the output was completely different even with fewer than 2500 steps.

ExponentialML commented 1 year ago

> Regarding the error you're running into @bruefire, it's probably going wrong in this function when the BLIP2 frame is too close to the end of the video to sample a full n_sample_frames at the frame_step.
>
> If so, any idea what a good fix would be @ExponentialML ?

It's tricky, but my recommendation would be to just return 1 frame. That way it will still train the text encoder and attention layers, and the full-frame videos will go to the temporal dimension. If all else fails, skip to the next batch or grab a fallback frame when the dataset is instantiated.
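A rough sketch of that fallback idea (my own reading, with a hypothetical helper, not the repo's implementation):

def pick_frame_indices(frame_index, frame_step, n_sample_frames, num_frames):
    # Hypothetical helper: return a full window if it fits, otherwise just one frame.
    wanted = [frame_index + i * frame_step for i in range(n_sample_frames)]
    if wanted[-1] < num_frames:
        return wanted
    # Fallback described above: train on a single frame rather than dropping the sample,
    # so the text encoder and spatial attention layers still get a signal.
    return [min(frame_index, num_frames - 1)]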

ImBadAtNames2019 commented 1 year ago

AttributeError: 'DDPMScheduler' object has no attribute 'prediction_type'
Steps: 0% 0/10000 [00:00<?, ?it/s]

That's it, I give up. This new version is making me want to jump off the balcony. I will just wait for the VideoCrafter implementation.