TencentARC / InstantMesh

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Apache License 2.0

CUDA Out of Memory Error #23

Open nattybones opened 6 months ago

nattybones commented 6 months ago

I am getting a CUDA out-of-memory error when running this on Ubuntu 22.04.4 with two RTX 3090 GPUs (2x24GB VRAM).

The error:

```
/home/user/InstantMesh/app.py:135: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
  show_image = torch.from_numpy(show_image)  # (960, 640, 3)
/tmp/tmpywepamk4.obj
  0%|          | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/gradio/blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/gradio/blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/gradio/utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/InstantMesh/app.py", line 199, in make3d
    frame = model.forward_geometry(
  File "/home/user/InstantMesh/src/models/lrm_mesh.py", line 280, in forward_geometry
    mesh_v, mesh_f, sdf, deformation, v_deformed, sdf_reg_loss = self.get_geometry_prediction(planes)
  File "/home/user/InstantMesh/src/models/lrm_mesh.py", line 165, in get_geometry_prediction
    sdf, deformation, sdf_reg_loss, weight = self.get_sdf_deformation_prediction(planes)
  File "/home/user/InstantMesh/src/models/lrm_mesh.py", line 110, in get_sdf_deformation_prediction
    sdf, deformation, weight = torch.utils.checkpoint.checkpoint(
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/user/miniconda3/envs/instantmesh/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 458, in checkpoint
    ret = function(*args, **kwargs)
  File "/home/user/InstantMesh/src/models/renderer/synthesizer_mesh.py", line 132, in get_geometry_prediction
    sdf, deformation, weight = self.decoder.get_geometry_prediction(sampled_features, flexicubes_indices)
  File "/home/user/InstantMesh/src/models/renderer/synthesizer_mesh.py", line 76, in get_geometry_prediction
    grid_features = torch.index_select(input=sampled_features, index=flexicubes_indices.reshape(-1), dim=1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 15.00 GiB. GPU 0 has a total capacty of 23.66 GiB of which 14.47 GiB is free. Process 1759 has 254.00 MiB memory in use. Including non-PyTorch memory, this process has 8.32 GiB memory in use. Of the allocated memory 7.23 GiB is allocated by PyTorch, and 288.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
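
As an aside, the last line of the error suggests one thing to experiment with: setting `max_split_size_mb` through the `PYTORCH_CUDA_ALLOC_CONF` environment variable to reduce fragmentation. A minimal sketch (the 512 value is an arbitrary starting point, not a recommendation):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator
# initializes, so set it before anything touches the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after the variable is set, to be safe
```

The same thing can be done from the shell with `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512` before launching app.py.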

jloveric commented 6 months ago

I've run into this issue on a 4090. The solution for me was to run without rmb on the command line: I downloaded the masked image from the gradio app and then ran the command-line tool without rmb, and it generated the object fine (still waiting on the textures!). Hopefully there is a more elegant solution.

JustinPack commented 6 months ago

Hey @nattybones, quick update: the app as it stands is not configured to utilize multiple GPUs. You would need to adjust the CUDA device initialization in app.py to match your particular GPU configuration. You can see at the end of your error message that only 23.66 GiB of memory is reported, rather than the combined dual-GPU amount.
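
For anyone attempting this, here is a minimal sketch of the idea. It assumes the two heavy stages are exposed as objects like `pipeline` (multi-view diffusion) and `model` (mesh reconstruction); those names and the structure are illustrative, not the exact app.py code:

```python
import torch

def place_stages(pipeline, model):
    """Sketch: put the diffusion stage and the reconstruction stage on
    different GPUs so neither has to fit alongside the other.
    `pipeline` and `model` stand in for the two loaded stages."""
    device0 = torch.device("cuda:0")
    device1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cuda:0")
    pipeline.to(device0)  # multi-view diffusion stage
    model.to(device1)     # sparse-view reconstruction stage
    return device0, device1
```

Every tensor handed to a stage then has to live on that stage's device (e.g. moving the generated views to `device1` before the reconstruction call), which is where most of the adjustment work is.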

srikar242 commented 6 months ago

@JustinPack I am getting the same error. I have 2 GPUs of 16GB each and the code is using only 1 GPU. I am not even using the app; I am running it with the run.py file from the command line, and I am not using rmb either. Could you please help with how I can configure the code to use multiple GPUs?

nattybones commented 6 months ago

> Hey @nattybones, quick update: the app as it stands is not configured to utilize multiple GPUs. You would need to adjust the CUDA device initialization in app.py to match your particular GPU configuration. You can see at the end of your error message that only 23.66 GiB of memory is reported, rather than the combined dual-GPU amount.

@JustinPack Yep, I'm working on a dual-GPU implementation right now that splits the module functions across the GPUs. The problem I was hoping could be solved is the CUDA out-of-memory error on "only" a single 24GB GPU. As jloveric points out, the issue can be avoided by running without rmb from the command line, but the gradio app still throws the error.

srikar242 commented 6 months ago

Hello @nattybones. Could you please help in solving this issue and explain how to split the model across 2 GPUs? I have 2 GPUs of 16GB each and the pipeline is not using the second GPU. I am getting the same out-of-memory error as you. I am running the model from the command line: `python run.py configs/instant-mesh-large.yaml examples/hatsune_miku.png --save_video`

nattybones commented 6 months ago

Hey @srikar242, I created a fork that splits functionality across two GPUs. It works with 3090s, so your mileage may vary. Claude3 wrote the script, so you'll have to ask him for any help ;) https://github.com/nattybones/InstantMesh2gpu

TzyTman commented 6 months ago

> Hey @srikar242, I created a fork that splits functionality across two GPUs. It works with 3090s, so your mileage may vary. Claude3 wrote the script, so you'll have to ask him for any help ;) https://github.com/nattybones/InstantMesh2gpu

I ran your project and it still reports errors!

DoubleCake commented 5 months ago

I only have a 4090. I modified the order in which the models are loaded, so that CUDA memory is freed after each phase of computation.
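
The pattern, as a rough sketch of the idea rather than the actual code (`build_stage` is a placeholder for whatever constructs each model):

```python
import gc
import torch

def run_stage(build_stage, inputs):
    """Load one stage, run it, then free its weights before the next
    stage is loaded, so only one model occupies VRAM at a time."""
    stage = build_stage().to("cuda")
    with torch.no_grad():
        outputs = stage(inputs)
    stage.to("cpu")           # move the weights off the GPU
    del stage
    gc.collect()
    torch.cuda.empty_cache()  # release cached blocks back to the driver
    return outputs
```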

ukarthik27 commented 5 months ago

@DoubleCake Can you share your code, or a link to an article on how you did it, here please?

DoubleCake commented 5 months ago

app_low_varm.zip

I hope this helps you. Extract the files into the same directory as app.py and you will be able to use them.

SergioKingOne commented 4 months ago

> I've run into this issue on a 4090. The solution for me was to run without rmb on the command line: I downloaded the masked image from the gradio app and then ran the command-line tool without rmb, and it generated the object fine (still waiting on the textures!). Hopefully there is a more elegant solution.

I'm running into this issue as well on a 4090. Forgive my ignorance, but could you tell me what you mean by "rmb", please?

favoyang commented 4 months ago

> what you mean by "rmb", please?

"rmb" = rembg, the background-removal step; you skip it with the --no_rembg flag.
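
With the command from the earlier comment, that looks like `python run.py configs/instant-mesh-large.yaml examples/hatsune_miku.png --no_rembg` (assuming the input image already has its background removed, e.g. the masked image downloaded from the gradio app).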

nagexiaochengzi commented 3 months ago

> I only have a 4090. I modified the order in which the models are loaded, so that CUDA memory is freed after each phase of computation.

Where is app_low_varm.zip?