-
File "/export/scratch/ra63nev/lab/discretediffusion/OmniTokenizer/omnitokenizer.py", line 108, in __init__
spatial_depth=args.spatial_depth, temporal_depth=args.temporal_depth, causal_in_temporal…
-
1. (HMMR) Learning 3D Human Dynamics from Video (2019)
temporal encoder: **1D temporal** convolutional layers; the image features are **precomputed** for each frame, and the model predicts the current frame as well as the ±∆t frames (see the sketch after this item).
c…
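A minimal sketch of this precompute-then-convolve design, assuming PyTorch; the layer widths, depth, output dimension, and the class name `TemporalEncoder1D` are illustrative assumptions, not the paper's exact configuration:
```python
import torch
import torch.nn as nn

class TemporalEncoder1D(nn.Module):
    """Hypothetical sketch: 1D temporal convolutions over precomputed per-frame
    image features, with separate heads for the current and the ±∆t frames."""
    def __init__(self, feat_dim=2048, hidden=1024, num_layers=3, out_dim=85):
        super().__init__()
        layers, in_ch = [], feat_dim
        for _ in range(num_layers):
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1),
                       nn.GroupNorm(32, hidden),
                       nn.ReLU(inplace=True)]
            in_ch = hidden
        self.temporal = nn.Sequential(*layers)
        self.head_curr = nn.Linear(hidden, out_dim)    # prediction for frame t
        self.head_past = nn.Linear(hidden, out_dim)    # prediction for frame t - ∆t
        self.head_future = nn.Linear(hidden, out_dim)  # prediction for frame t + ∆t

    def forward(self, feats):                     # feats: (B, T, feat_dim), precomputed
        x = feats.transpose(1, 2)                 # (B, feat_dim, T) for Conv1d
        x = self.temporal(x).transpose(1, 2)      # (B, T, hidden)
        return self.head_curr(x), self.head_past(x), self.head_future(x)

# usage: features are extracted once per frame by a frozen image encoder,
# so only the lightweight temporal model runs during training
feats = torch.randn(2, 20, 2048)
curr, past, future = TemporalEncoder1D()(feats)
```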
-
I think this is a new issue introduced by some recent update, as it was running fine before.
By the way, there is no problem with the CogVideo Sampler.
Init Pyramid Attention Broadcast. steps: 50. spatial broadcast: True…
-
1. PARE: Part Attention Regressor for 3D Human Body Estimation (2021)
image --> volumetric features (taken before the global average pooling) --> part branch: estimates attention weights + feature branch: performs S…
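A minimal sketch of this two-branch attention pooling, assuming PyTorch; the channel counts, the number of parts, and the class name `PartAttentionPooling` are illustrative assumptions rather than the paper's implementation:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartAttentionPooling(nn.Module):
    """Hypothetical sketch: a part branch predicts one spatial attention map per
    body part, a feature branch predicts a feature volume, and per-part features
    are obtained by attention-weighted spatial pooling."""
    def __init__(self, in_ch=2048, feat_ch=256, num_parts=24):
        super().__init__()
        self.part_branch = nn.Conv2d(in_ch, num_parts, kernel_size=1)  # attention logits
        self.feat_branch = nn.Conv2d(in_ch, feat_ch, kernel_size=1)    # per-pixel features

    def forward(self, volumetric_feats):            # (B, in_ch, H, W), pre-GAP backbone output
        B, _, H, W = volumetric_feats.shape
        attn = self.part_branch(volumetric_feats)   # (B, P, H, W)
        attn = F.softmax(attn.flatten(2), dim=-1)   # normalize over spatial locations
        feats = self.feat_branch(volumetric_feats).flatten(2)    # (B, C, H*W)
        part_feats = torch.einsum('bph,bch->bpc', attn, feats)   # (B, P, C)
        return part_feats, attn.view(B, -1, H, W)

# usage: per-part features would then be regressed to per-part pose parameters
part_feats, attn_maps = PartAttentionPooling()(torch.randn(2, 2048, 8, 8))
```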
-
I came across Example 1: Bike Flow Prediction (Zero-shot scenario) in your paper, and I have some concerns regarding the classification of this task as “zero-shot.”
As I understand it, a zero-s…
-
1. TexturePose: Supervising Human Mesh Estimation with Texture Consistency (2019)
Texture map (texel): the corresponding UV map un-warps the template surface onto an image, A, which is the texture map (see the sketch after this item).
co…
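A minimal sketch of how texture maps from two frames could supervise each other, assuming PyTorch and that per-frame texture maps and per-texel visibility masks are already computed; the function `texture_consistency_loss` and its exact form are assumptions, not the paper's loss:
```python
import torch

def texture_consistency_loss(tex_a, tex_b, vis_a, vis_b, eps=1e-8):
    """Hypothetical sketch: L1 agreement between the texture maps of two frames,
    restricted to texels that are visible in both (all names are illustrative)."""
    both_visible = (vis_a * vis_b).unsqueeze(0)       # (1, Hu, Wv), broadcast over RGB
    diff = (tex_a - tex_b).abs() * both_visible
    return diff.sum() / (3.0 * both_visible.sum() + eps)

# usage with dummy data: two un-warped texture maps and their visibility masks
tex_a, tex_b = torch.rand(3, 128, 128), torch.rand(3, 128, 128)
vis_a = (torch.rand(128, 128) > 0.5).float()
vis_b = (torch.rand(128, 128) > 0.5).float()
loss = texture_consistency_loss(tex_a, tex_b, vis_a, vis_b)
```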
-
# Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
> Sora unveils the potential of scaling Diffusion Transformer (DiT) for gener…
-
Hello,
Thank you so much for your great work and codebase!
I would appreciate your clarifications on a few items.
1) From within ```TextToVideoSDPipelineCall.py```, at this [line](https://g…
-
Thank you for sharing the great work! I have a question regarding the design choice of the VQVAE spatial encoder. Currently, only the encoder includes the spatial transformer to encode the relati…
-
Thanks for the great work, and for releasing the training script `train_svd_lcm.py`.
I am trying to reproduce the results using the provided `train_svd_lcm.py`, but after half of the training (20,…