Closed: jingyu198 closed this issue 4 months ago.
Hello, I have the same problem as you. Could you share how you fixed it? Thanks a lot!
The problem here was due to the dataset format. The expected depth map has only 1 channel, which didn't match the depth I prepared (3 channels), and that mismatch led to the later blow-up in memory use.
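For anyone hitting the same thing, here is a minimal sketch of the kind of check that catches this mismatch. It is plain NumPy, not the actual InstantMesh loader code, and the function name is just a placeholder:

```python
import numpy as np

def to_single_channel_depth(depth: np.ndarray) -> np.ndarray:
    # (H, W) -> (H, W, 1)
    if depth.ndim == 2:
        return depth[..., None]
    # (H, W, 3): depth saved as an RGB image usually repeats the same value
    # in every channel, so keep just one of them.
    if depth.ndim == 3 and depth.shape[-1] == 3:
        return depth[..., :1]
    assert depth.ndim == 3 and depth.shape[-1] == 1, f"unexpected depth shape {depth.shape}"
    return depth
```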
Besides, I use a batch size of 1, and then training runs smoothly.
Thanks
Hi, I'm also working on training InstantMesh. May I ask about the performance you got when training it on your own data?
Sure, everything seems good now. Many bugs were actually caused by the custom data format, e.g., camera parameters, the axis direction of normals, and so on.
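As an illustration of the kinds of format checks that helped, here is a rough sketch of per-sample sanity checks on camera poses and normals. The names and the exact axis flip are only assumptions, not the InstantMesh convention; adapt them to your exporter:

```python
import numpy as np

def check_sample(c2w: np.ndarray, normals: np.ndarray) -> None:
    # Camera-to-world pose should be a 4x4 matrix with an orthonormal rotation block.
    assert c2w.shape == (4, 4), f"unexpected pose shape {c2w.shape}"
    R = c2w[:3, :3]
    assert np.allclose(R @ R.T, np.eye(3), atol=1e-4), "rotation block is not orthonormal"

    # Normals should be unit length wherever they are defined.
    lengths = np.linalg.norm(normals.reshape(-1, 3), axis=-1)
    valid = lengths > 1e-6
    assert np.allclose(lengths[valid], 1.0, atol=1e-2), "normals are not normalized"
    # If your exporter uses a different axis convention, flip it here, e.g.:
    # normals[..., 1:] *= -1.0   # y/z flip between OpenGL- and OpenCV-style frames
```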
Hi, very wonderful work! I am trying to finetune the model on a custom dataset using 8 × A100 GPUs (80 GB each).
What I did was modify the config and the data path. However, I hit an OOM error as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 500.00 GiB. GPU 3 has a total capacty of 79.14 GiB of which 74.21 GiB is free. Including non-PyTorch memory, this process has 4.92 GiB memory in use. Of the allocated memory 3.36 GiB is allocated by PyTorch, and 155.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I tried to solve it by setting the batch size to 1, but that failed. Could you please help with this?
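For completeness, the allocator hint at the end of the error message can be set like this. Note it only mitigates fragmentation; as the discussion above shows, a request to allocate 500 GiB usually points to a data-shape problem rather than fragmentation, and the value 128 below is just an example:

```python
import os

# Must be set before the first CUDA allocation (e.g. at the very top of the
# training script, before torch initializes CUDA).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```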