Nightmare-n / DepthAnyVideo

Depth Any Video with Scalable Synthetic Data
https://depthanyvideo.github.io
Apache License 2.0
408 stars 27 forks

Added gitignore, moved examples to assets, made initial changes for r… #8

Closed maximilian-vH closed 1 month ago

maximilian-vH commented 1 month ago

Hey,

I noticed there were some issues with the original repository.

I am not sure if you are aware of them, so I started making some changes to get it working.

The next issue I'm running into is the following error:

Traceback (most recent call last):
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/gradio/utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
  File "/Users/maximilianvh/everything/resembler/web_code/DepthAnyVideo/app.py", line 121, in depth_any_video
    pipe_out = pipe(
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/maximilianvh/everything/resembler/web_code/DepthAnyVideo/dav/pipelines/dav_pipeline.py", line 165, in __call__
    key_depth_latent = self.single_infer(
  File "/Users/maximilianvh/everything/resembler/web_code/DepthAnyVideo/dav/pipelines/dav_pipeline.py", line 69, in single_infer
    rgb_latent = self.encode(rgb)
  File "/Users/maximilianvh/everything/resembler/web_code/DepthAnyVideo/dav/pipelines/dav_pipeline.py", line 42, in encode
    latent = self.vae.encode(input.to(self.vae.dtype)).latent_dist.mode()
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/diffusers/models/autoencoders/autoencoder_kl_temporal_decoder.py", line 334, in encode
    h = self.encoder(x)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/diffusers/models/autoencoders/vae.py", line 143, in forward
    sample = self.conv_in(sample)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 458, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/maximilianvh/.pyenvs/dav/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [26, 1, 3, 352, 640]
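For context, the error says `conv2d` received a 5D tensor where it expects a 4D batched input `[N, C, H, W]`. A minimal sketch of the mismatch and one possible workaround, assuming the pipeline is stacking 26 frames with an extra singleton dimension (the actual fix may belong earlier in `dav_pipeline.py`):

```python
import torch

# The VAE encoder ultimately calls conv2d, which expects a 4D batched
# input [N, C, H, W], but the pipeline passes a 5D tensor shaped
# [frames, 1, C, H, W] (here 26 frames of 3x352x640 RGB).
rgb = torch.randn(26, 1, 3, 352, 640)

# One possible workaround: collapse the extra singleton dimension
# before handing the frames to the 2D encoder.
rgb_4d = rgb.flatten(0, 1)  # -> [26, 3, 352, 640]

# Stand-in for the VAE's first convolution, just to show the shapes line up.
conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
out = conv(rgb_4d)
print(out.shape)  # torch.Size([26, 8, 352, 640])
```

Whether `flatten` is the right place to do this depends on how the pipeline later reshapes the latents back into a frame sequence, so treat this as a diagnostic sketch rather than the intended fix.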
nekoshadow1 commented 1 month ago

Yes, the current repo is too buggy to run. I have only gotten per-image prediction to work so far (after fixing countless issues...), but the predictions cannot maintain temporal consistency. I'm still figuring out how to run the code on an entire video.

Nightmare-n commented 1 month ago

Thanks for your interest. The released code is a draft; please be patient as we will thoroughly review it today.

Nightmare-n commented 1 month ago

We have already provided the instructions in the repository. Please give it a try, and let us know if you encounter any issues.