Demo has serious appearance leakage

Hi, Just cloned the repo and ran the demo: python train.py --config ./configs/config.yaml The result, which should be a skateboard with the motion of the asset car video, is a deformed car :(

Under "samples[1-100]" it still is a skateboard, but on "samples[150|200]" it becomes a deformed car. I would appreciate if you look into it, and perhaps supply with a quick tutorial covering how we should play with the hyper parameters to make it work better.

I will play with it a bit myself.

Running on ubuntu 20.04, with a miniconda env. "conda env export" yields the following:

name: motioninversion
channels:
  - xformers
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - blas=1.0=mkl
  - bzip2=1.0.8=h5eee18b_6
  - ca-certificates=2024.7.2=h06a4308_0
  - cffi=1.16.0=py310h5eee18b_1
  - cudatoolkit=11.3.1=h2bc3f7f_2
  - future=0.18.3=py310h06a4308_0
  - intel-openmp=2021.4.0=h06a4308_3561
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_1
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libprotobuf=3.20.3=he621ea3_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - mkl=2021.4.0=h06a4308_640
  - mkl-service=2.4.0=py310h7f8727e_0
  - mkl_fft=1.3.1=py310hd6ae3a3_0
  - mkl_random=1.2.2=py310h00e6091_0
  - ncurses=6.4=h6a678d5_0
  - ninja=1.10.2=h06a4308_5
  - ninja-base=1.10.2=hd09550d_5
  - numpy-base=1.24.3=py310h8e6c178_0
  - openssl=3.0.15=h5eee18b_0
  - pip=24.2=py310h06a4308_0
  - pycparser=2.21=pyhd3eb1b0_0
  - python=3.10.14=h955ad1f_1
  - readline=8.2=h5eee18b_0
  - setuptools=72.1.0=py310h06a4308_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlite=3.45.3=h5eee18b_0
  - tk=8.6.14=h39e8969_0
  - typing_extensions=4.11.0=py310h06a4308_0
  - tzdata=2024a=h04d1e81_0
  - wheel=0.43.0=py310h06a4308_0
  - xformers=0.0.22=py310_cu11.6.2_pyt1.12.1
  - xz=5.4.6=h5eee18b_1
  - yaml=0.2.5=h7b6447c_0
  - zlib=1.2.13=h5eee18b_1
  - pip:
    - accelerate==0.34.2
    - antlr4-python3-runtime==4.9.3
    - av==13.0.0
    - certifi==2024.8.30
    - charset-normalizer==3.3.2
    - decord==0.6.0
    - diffusers==0.26.3
    - einops==0.8.0
    - filelock==3.15.4
    - fsspec==2024.9.0
    - huggingface-hub==0.24.6
    - idna==3.8
    - imageio==2.35.1
    - imageio-ffmpeg==0.5.1
    - importlib-metadata==8.4.0
    - inquirerpy==0.3.4
    - jinja2==3.1.4
    - markupsafe==2.1.5
    - mpmath==1.3.0
    - networkx==3.3
    - numpy==2.1.1
    - nvidia-cublas-cu12==12.1.3.1
    - nvidia-cuda-cupti-cu12==12.1.105
    - nvidia-cuda-nvrtc-cu12==12.1.105
    - nvidia-cuda-runtime-cu12==12.1.105
    - nvidia-cudnn-cu12==8.9.2.26
    - nvidia-cufft-cu12==11.0.2.54
    - nvidia-curand-cu12==10.3.2.106
    - nvidia-cusolver-cu12==11.4.5.107
    - nvidia-cusparse-cu12==12.1.0.106
    - nvidia-nccl-cu12==2.19.3
    - nvidia-nvjitlink-cu12==12.6.68
    - nvidia-nvtx-cu12==12.1.105
    - omegaconf==2.3.0
    - opencv-python==4.10.0.84
    - packaging==24.1
    - pfzy==0.3.4
    - pillow==10.4.0
    - prompt-toolkit==3.0.47
    - psutil==6.0.0
    - pyyaml==6.0.2
    - regex==2024.7.24
    - requests==2.32.3
    - safetensors==0.4.5
    - sympy==1.13.2
    - tokenizers==0.13.3
    - torch==2.4.1
    - torchvision==0.17.0
    - tqdm==4.66.5
    - transformers==4.28.0
    - triton==2.2.0
    - typing-extensions==4.12.2
    - urllib3==2.2.2
    - wcwidth==0.2.13
    - zipp==3.20.1
prefix: /home/galhar/miniconda3/envs/motioninversion

EnVision-Research / MotionInversion

Demo has serious appearance leakage #5