C00reNUT opened 6 days ago
Yeah. I also want to know how much VRAM is required for inference.
Same question. Would be good to know VRAM usage for various dimensions.
8 GiB is not enough :crying_cat_face:
Even 16 GB is not enough.
Even 24 GB is not enough.
Need an 8-bit version.
Needs 32 GB at least? Quant, anyone?
I modified the inference script so that it runs with a maximum of 15264 MiB of VRAM (according to nvtop, for inference at 512x768 and 100 frames). You may need to turn off anything else that uses VRAM if you're using a 16 GiB GPU, but it should work.
I put the modified files here: https://github.com/KT313/LTX_Video_better_vram
It should work if you just drag and drop the files into your LTX-Video folder.
It works by offloading everything that is not currently needed in VRAM to CPU memory during each of the inference steps; the rough idea is sketched below.
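A minimal sketch of that offloading idea, not the actual code in the repo above; the submodule names (text_encoder, transformer, vae) are assumptions about what the pipeline object exposes:

import torch

def keep_only_on_gpu(active, modules, device="cuda"):
    """Move `active` to the GPU and park every other module in CPU memory."""
    for m in modules:
        m.to(device if m is active else "cpu")
    torch.cuda.empty_cache()

# Usage outline: before each phase, keep only the module that phase needs on the GPU.
# modules = [pipeline.text_encoder, pipeline.transformer, pipeline.vae]
# keep_only_on_gpu(pipeline.text_encoder, modules)  # prompt encoding
# keep_only_on_gpu(pipeline.transformer, modules)   # denoising steps
# keep_only_on_gpu(pipeline.vae, modules)           # decoding latents to frames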
@KT313 Cool, I'll try your solution.
Edit: It works. Will it need more VRAM if more frames are generated?
Edit 2: It only works the first time and then it shows this error:
ValueError: Cannot generate a cpu tensor from a generator of type cuda.
Edit 3: Now it works again when using the suggested resolution (previously I was testing at 384x672; it works at 512x768 with 30 frames, and I repeated it). I don't know what caused the error above, though.
Edit 4: The error above appears again when using 60 frames, so maybe it is an OOM error after all.
@x4080 I made some modifications here so the tensors should get generated on the generator's device (cuda): https://github.com/KT313/LTX_Video_better_vram/tree/test. I cannot test it currently, so let me know if that works better.
Regarding your first edit: yes, since the size of the latent tensor (which basically contains the video) depends on the resolution (height x width x frames, plus a bit extra from padding), increasing the number of frames makes the tensor larger, which needs more VRAM. But compared to the VRAM needed for the unet model, the tensor itself is quite small, so you might be able to increase the frames a bit without issues; see the estimate below.
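As a rough back-of-the-envelope illustration of why the latent tensor is small: the downsampling factors and channel count below are illustrative assumptions, not the exact LTX-Video VAE configuration.

def latent_mib(height, width, frames,
               spatial_down=32, temporal_down=8,
               channels=128, bytes_per_elem=2):  # bfloat16
    # Elements in the compressed latent video, times bytes per element.
    elems = (height // spatial_down) * (width // spatial_down) \
        * (frames // temporal_down + 1) * channels
    return elems * bytes_per_elem / 2**20

# 512x768 at 100 frames comes out to roughly 1 MiB under these assumptions,
# which is tiny next to the several GiB taken by the model weights.
print(f"{latent_mib(768, 512, 100):.1f} MiB")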
First of all, thank you for implementing this so that it takes less VRAM. I have tried it out a couple of times (at a resolution of 704x480 and for 257 frames) and it works like a charm, using only around 16 GB of a 4090 GPU. However, it randomly throws an error related to "cpu" and "cuda" tensors. Re-running the script usually works, so it is not a big deal.
This was the error:
Traceback (most recent call last):
File "/home/mrt/Projects/LTX-Video/inference.py", line 452, in <module>
main()
File "/home/mrt/Projects/LTX-Video/inference.py", line 356, in main
images = pipeline(
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/mrt/Projects/LTX-Video/ltx_video/pipelines/pipeline_ltx_video.py", line 1039, in __call__
noise_pred = self.transformer(
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mrt/Projects/LTX-Video/ltx_video/models/transformers/transformer3d.py", line 419, in forward
encoder_hidden_states = self.caption_projection(encoder_hidden_states)
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 1607, in forward
hidden_states = self.linear_1(caption)
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mrt/Projects/LTX-Video/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
@MarcosRodrigoT Do you use the new test file from @KT313, or the previous one? @KT313, is your new test code for multiple GPUs?
Edit: I tried the test file and it works with more frames than the previous one, but I see the same error; retrying it somehow works. What is really going on? Why does restarting the command work?
Edit 2: @KT313 maybe this line is causing the CUDA/CPU inconsistencies? (in inference.py)
if torch.cuda.is_available() and args.disable_load_needed_only:
    pipeline = pipeline.to("cuda")
Edit 4: I think it works better if the above is replaced with just
pipeline = pipeline.to("cuda")
to prevent
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
@x4080 I changed the code on the test branch to
if torch.cuda.is_available():
    pipeline = pipeline.to("cuda")
as you suggested. You might be able to get away with less than 16 GiB if you don't load the whole pipeline to cuda at the beginning and instead first load only the text encoder, then unload it, and only then load the unet, but that would require more trial and error, so if your suggestion works it's the easiest for now. A rough sketch of that staged approach is below.
I tried it on a single GPU only (a 4090). I'm not sure about multi-GPU, but the original code also doesn't have anything that specifically hints at multi-GPU support, at least not in the parts that I modified.
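Untested sketch of that "load one model at a time" idea; the attribute names and the encode_prompt helper are assumptions about the pipeline, not the repo's actual API:

import gc
import torch

def encode_prompt_then_load_unet(pipeline, prompt, device="cuda"):
    # Phase 1: only the (large) text encoder lives on the GPU.
    pipeline.text_encoder.to(device)
    with torch.no_grad():
        prompt_embeds = pipeline.encode_prompt(prompt)  # assumed helper
    pipeline.text_encoder.to("cpu")
    gc.collect()
    torch.cuda.empty_cache()

    # Phase 2: with the text encoder gone, the denoising model fits more easily.
    pipeline.transformer.to(device)
    return prompt_embeds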
@KT313 thanks
By the way, just for future readers: you might be able to get away with something as low as 8 or 6 GB if the text embedding is done on the CPU or separately somehow. The generation model itself should only need about 4-5 GiB if loaded in bfloat16 (2 bytes per parameter), plus some extra for the latent video tensor. Most of the VRAM currently gets used up by the text embedding model, which is comparatively huge. If the text gets embedded into tensors on the CPU, it might be pretty slow, though. The quick arithmetic behind those numbers is below.
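Quick weight-memory arithmetic; the parameter counts are illustrative assumptions, not measured values:

def weights_gib(num_params, bytes_per_param=2):  # bfloat16: 2 bytes per parameter
    return num_params * bytes_per_param / 2**30

print(f"~2B-param generator   : {weights_gib(2e9):.1f} GiB")  # roughly the few GiB quoted above
print(f"~5B-param text encoder: {weights_gib(5e9):.1f} GiB")  # why the text model dominates VRAM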
@KT313 I tried with width: 1280, height: 704, num_frames: 201, fps: 16. The video is fine up to 160 frames, but the remaining 41 frames are not good and have noise in them. Why?
@anujsinha72094 That's pretty unlikely to be related to the changes I made, lol.
A small passage with VRAM info would be nice :)