BrokenSource / DepthFlow

🌊 Images to → 2.5D Parallax Effect Video. A Free and Open Source ImmersityAI alternative
https://brokensrc.dev
MIT License

Render video from one CLI command directly from an image and depth map #7

Closed: C00reNUT closed this 8 months ago

C00reNUT commented 8 months ago

Hello, really nice little tool! Would it be possible to render the video directly from the CLI, using an image and a depth map?

Tremeschin commented 8 months ago

Yes!

broken depthflow parallax --image /path/to/background.jpg --depth /path/to/depth.jpg main -r -o ./output.mp4

or just -i image.ext -d depth.ext

You can see other rendering options with depthflow --help

I recommend using -q ultra --ssaa 1.5 for better quality; beware, it's heavy on the GPU

One minor inconvenience is that, for now, the output video's resolution doesn't follow the input image's, so you also have to pass the target resolution: -w 1080 -h 1920 for a 9:16 short, for example
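
For example, putting the flags above together into a single render command (paths are placeholders):

broken depthflow parallax -i background.jpg -d depth.jpg main -r -q ultra --ssaa 1.5 -w 1080 -h 1920 -o ./output.mp4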

C00reNUT commented 8 months ago

Very nice, thank you!

It renders the video now, but I would like to achieve a zoom effect, or even better a dolly zoom; right now it just moves a bit from side to side...

C00reNUT commented 8 months ago

Hmmm, interesting. I first tried the recent Marigold https://marigoldmonodepth.github.io/ depth estimation, because it produces better depth maps than ZoeDepth, which you use in your repo.

But when I use a Marigold depth map, almost no motion appears in the video, contrary to when I don't provide a depth map and let ZoeDepth calculate it.

Is there some reason it should be like this?

Tremeschin commented 8 months ago

You can change the animation parameters in the .update() method in the file DepthFlow/DepthFlow.py

There's a zoom-out effect commented out at the end of the function; you can change the dolly zoom formula at the top, as well as self.parallax_height (a number from 0 to 1) for the intensity of the effect

There isn't really a presets mechanism yet; for now, graphing tools like Desmos are our best friend for defining the animations
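
As a minimal sketch (the values are illustrative; the actual preset lines are quoted later in this thread), uncommenting the zoom-out preset and raising the intensity inside update() would look like:

def update(self):
    # Stronger parallax intensity (0 to 1; the default is lower)
    self.parallax_height = 0.3
    # Zoom out on the start (uncommented from the built-in presets)
    self.parallax_zoom = 0.6 + 0.4*(2/math.pi)*math.atan(3*self.time)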

C00reNUT commented 8 months ago

Thank you for helping me out! It turns out that I used a PNG image as input for the depth; when converted to JPG it works better, but it produces many more artifacts than ZoeDepth.

> You can change the animation parameters in the .update() method in the file DepthFlow/DepthFlow.py

I will try it!

Just one last thing: is there any chance to input more than one image / a whole folder?

Tremeschin commented 8 months ago

Interesting, this Marigold link; I will take a look, potentially making it the default one. I like that it's with diffusers

My first guess for why there was almost no motion is that maybe the output wasn't normalized to 0 and 1, or the "wrong" data type was being read?

When you open the image, is it black and white, or does it have that color palette from their showcase? DepthFlow expects a black and white image
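
For reference, a quick sanity check one could run on a depth map before feeding it in; a minimal sketch using PIL and NumPy (file names are placeholders):

import numpy as np
import PIL.Image

# Load the depth map as single-channel grayscale and normalize it to 0..1
depth = np.array(PIL.Image.open("depth.png").convert("L"), dtype=np.float32)
depth = (depth - depth.min()) / max(float(depth.max() - depth.min()), 1e-6)

# Save back as an 8-bit black and white image spanning the full 0..255 range
PIL.Image.fromarray((255*depth).astype(np.uint8)).save("depth_normalized.jpg")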

C00reNUT commented 8 months ago

> Interesting, this Marigold link; I will take a look, potentially making it the default one. I like that it's with diffusers

It's very GPU heavy; if I remember correctly, more than 12GB for one 1280x720 image. But it's probably the best depth estimation out there.

> My first guess for why there was almost no motion is that maybe the output wasn't normalized to 0 and 1, or the "wrong" data type was being read?
>
> When you open the image, is it black and white, or does it have that color palette from their showcase? DepthFlow expects a black and white image

I used the BW depth estimation, but needed to convert the PNG to JPG

Tremeschin commented 8 months ago

> Just one last thing: is there any chance to input more than one image / a whole folder?

It's not that hard to automate; I have done it in some proprietary projects. You could, for example, in DepthFlow's __main__.py file, create your own class:

from DepthFlow import *  # DepthFlowScene comes from the package's star import

class CustomDepthFlow(DepthFlowScene):
    def pipeline(self):
        ...

    # other functions

and then automate it with:

from pathlib import Path

import PIL.Image

for image in Path("/folder/of/images").glob("*"):

    # Skip non-image files
    if image.suffix not in (".jpg", ".png"):
        continue

    # Create a scene and export to video at the image's own resolution
    depthflow = CustomDepthFlow()
    depthflow.parallax(image=image)
    width, height = PIL.Image.open(image).size
    depthflow.main(
        ssaa=1.25,
        width=width,
        height=height,
        quality="ultra",
        output=image.with_suffix(".mp4"),
    )

(Ideally in some other file, importing the DepthFlow package like I did; the solution above is hacky)

Tremeschin commented 8 months ago

> 12GB for one 1280x720 image.

Ouch 😅, maybe half precision could help. Thanks for letting me know of its existence, btw
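
For the record, since Marigold ships with diffusers, half precision would be a rough sketch like the following at load time (the checkpoint id and loading details are assumptions, not verified here):

import torch
from diffusers import DiffusionPipeline

# fp16 weights instead of fp32 roughly halve the VRAM needed for inference
pipe = DiffusionPipeline.from_pretrained(
    "prs-eth/marigold-v1-0",    # hypothetical checkpoint id
    torch_dtype=torch.float16,
).to("cuda")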

C00reNUT commented 8 months ago

Maybe one last thing: when I run broken depthflow parallax --image keywords_2.jpg --depth keywords_2_pred.jpg main -r -o /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/Video/DepthFlow/output.mp4 -w 1280 -h 768, does it use CUDA by default, given that I installed the repo using broken depthflow poe cuda?

C00reNUT commented 8 months ago

> > 12GB for one 1280x720 image.
>
> Ouch 😅, maybe half precision could help. Thanks for letting me know of its existence, btw

Sure! Another recent one that is far superior according to metrics is https://zhyever.github.io/patchfusion/ but I haven't tried it

Tremeschin commented 8 months ago

> does it use CUDA by default, given that I installed the repo using broken depthflow poe cuda?

Yes; whenever CUDA is available, as seen in this line, it gets used automatically

The first time you run broken depthflow it defaults to CPU mode (I wonder if I should make CUDA the default?); then running broken depthflow poe cuda once uninstalls the CPU PyTorch and installs the CUDA one in the virtual environment (a hacky way; torch packaging is really painful: same package name, multiple "versions")
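
Presumably that line boils down to the standard torch device check, something like:

import torch

# Use the GPU whenever a compatible CUDA device and driver are present
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"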

C00reNUT commented 8 months ago

> > does it use CUDA by default, given that I installed the repo using broken depthflow poe cuda?
>
> Yes; whenever CUDA is available, as seen in this line, it gets used automatically
>
> The first time you run broken depthflow it defaults to CPU mode (I wonder if I should make CUDA the default?); then running broken depthflow poe cuda once uninstalls the CPU PyTorch and installs the CUDA one in the virtual environment (a hacky way; torch packaging is really painful: same package name, multiple "versions")

Great! Well, I prefer a standard conda env file (installing through a yaml file) over using poetry for CUDA-based projects... all the handling of torch CUDA dependencies is much easier...

Thank you for this nice implementation and help!

Tremeschin commented 8 months ago

> > 12GB for one 1280x720 image.
> >
> > Ouch 😅, maybe half precision could help. Thanks for letting me know of its existence, btw
>
> Sure! Another recent one that is far superior according to metrics is https://zhyever.github.io/patchfusion/ but I haven't tried it

Just tried it, and LOL, there go 11 GB of my 3060 for that default 4K mountains image, and about a minute to process, nearly running out of memory. Depth result follows:

(attached image: PatchFusion depth map result)

I don't know why it's grainy; I used full precision on torch... probably the image is too big? Who knows, heh

Given the VRAM usage and time, I think ZoeDepth is the most practical one to be the default; I'll look forward to having options for alternative depth estimators, though

Tremeschin commented 8 months ago

> Great! Well, I prefer a standard conda env file (installing through a yaml file) over using poetry for CUDA-based projects... all the handling of torch CUDA dependencies is much easier...

Thanks for the feedback about getting the dependencies!

I don't know enough about conda to implement it or know its benefits, tbh. I remember I had more luck with it on Windows long ago (early Python 3 era, mostly NumPy); nowadays stuff just works, and binaries exist on PyPI for most of the blob packages

If there's demand for it, or it proves useful enough, I can try integrating it as an alternative; be aware I'll need path dependencies for the monorepo, and I don't know if that's supported in the yaml spec file

Tremeschin commented 8 months ago

Just replying to myself about what I said earlier,

> I recommend using -q ultra --ssaa 1.5 for better quality; beware, it's heavy on the GPU

I hope that with the new texture class rewrite I'll be able to do multiple passes and store previous computations of the parallax effect's displacements to be reused, potentially saving a lot of GPU computational power!

C00reNUT commented 8 months ago

        # In and out dolly zoom
        # self.parallax_dolly = 0.5*(1 + math.cos(self.time))
        # self.parallax_dolly = 0.9*(1 + math.cos(self.time))

        # # Infinite 8 loop shift
        # self.parallax_x = 0.1 * math.sin(  self.time)
        # self.parallax_y = 0.1 * math.sin(2*self.time)

        # # Oscillating rotation
        # self.camera.rotate(
        #     direction=self.camera.base_z,
        #     angle=math.cos(self.time)*self.dt*0.4
        # )

        # # Zoom out on the start
        # self.parallax_zoom = 0.6 + 0.4*(2/math.pi)*math.atan(3*self.time)

I have tried commenting out all the code, but the default movement path is still present... isn't poetry copying the file to use it in the env or something?

If I understand it correctly, by changing these parameters I should be able to control the strength of the zoom, etc., and the equations in update() control the movement

    # Parallax parameters
    parallax_fixed     = field(default=True)
    parallax_height    = field(default=0.2)
    parallax_focus     = field(default=1.0)
    parallax_zoom      = field(default=0.5)
    parallax_isometric = field(default=0.0)
    parallax_dolly     = field(default=0.0)
    parallax_x         = field(default=0.0)
    parallax_y         = field(default=0.0)

C00reNUT commented 8 months ago

> Just replying to myself about what I said earlier,
>
> > I recommend using -q ultra --ssaa 1.5 for better quality; beware, it's heavy on the GPU
>
> I hope that with the new texture class rewrite I'll be able to do multiple passes and store previous computations of the parallax effect's displacements to be reused, potentially saving a lot of GPU computational power!

btw, when I run broken depthflow parallax --image keywords_6.jpg --depth keywords_6_pred.jpg main -r -o /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/Video/DepthFlow/output.mp4 -w 1280 -h 768 -q ultra --ssaa 1.5 with a 1280x768 image as input, I see only 76MB in 'watch nvidia-smi', so it's not particularly GPU heavy

Tremeschin commented 8 months ago

It's unlikely that poetry is messing with the files; I commented out these lines in DepthFlow.py and get no movement at all in the builtin realtime window

If you're using the custom class, you should override update(self) (I wrote the wrong method in an earlier comment)

My best guess is that the latter is your case, and the default update implementation was probably still there due to inheritance; but it could also be that you're editing the wrong file path if you cloned twice (it happens sometimes)

Your __main__.py should look something like:

import math
import os
import sys
from pathlib import Path

from DepthFlow import *

class CustomDepthFlow(DepthFlowScene):
    def update(self):
        # Dolly zoom example
        self.parallax_dolly = 10 * math.atan(self.time)
        ...

def main():
    depthflow = CustomDepthFlow()

    # The automation branch, taken if run with BATCH=1 depthflow
    if os.environ.get("BATCH", "0") == "1":
        for image in Path("/folder/of/images").glob("*"):
            ...
        return

    # Realtime window or CLI
    with BrokenProfiler("DEPTHFLOW"):
        DEPTHFLOW.welcome()
        depthflow.cli(sys.argv[1:])

if __name__ == "__main__":
    main()
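
The batch branch above would then be triggered with something like (assuming the same entry point used in the commands earlier in this thread):

BATCH=1 broken depthflow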

> with a 1280x768 image as input, I see only 76MB in 'watch nvidia-smi', so it's not particularly GPU heavy

Yes, not heavy on VRAM, but it really hits the pixel shader circuitry ("rasterization"): when I'm rendering lots of 1440p or 4K images in parallel with some supersampling and really low displacement steps (high quality), the texture accesses per second easily hit the trillions. Somehow the current shader is still enough to run at 4k60 on the "ultra" quality in real time on my RTX 3060, and even 8k30 (not that the latter is practical lol)

Tremeschin commented 8 months ago

It's not documented yet and it's a bit technical (I have yet to make some form of infographic), but since you'll be dealing with dolly zoom and the parameters: there are dolly, isometric and zoom parameters

(I'm considering having just dolly and making isometric set dolly = tan(pi*isometric/2), so isometric = 0 maps to dolly = 0 and isometric → 1 sends dolly to infinity; but that's a future thing, just writing out ideas)

Edit: Now that I think of it, dolly might mess up raymarching scenes; I remembered why both coexist

C00reNUT commented 8 months ago

> It's unlikely that poetry is messing with the files; I commented out these lines in DepthFlow.py and get no movement at all in the builtin realtime window
>
> If you're using the custom class, you should override update(self) (I wrote the wrong method in an earlier comment)
>
> My best guess is that the latter is your case, and the default update implementation was probably still there due to inheritance; but it could also be that you're editing the wrong file path if you cloned twice (it happens sometimes)

Thank you, it works like a charm! I was editing the file from the https://github.com/BrokenSource/DepthFlow repo I cloned and didn't realize that the wrapper downloads BrokenSource/Projects/DepthFlow with the reference Python files... well, falling into the routine without thinking. Thank you once again, it is very nice and quick!