I finished rendering, and when I went to train the NeRF stage I used only 20 objects as the dataset, yet training immediately demands an enormous amount of GPU memory. What happened? I need your help.
(instantmesh1) mrguanglei@guanglei:~/3D/InstantMesh$ python train.py --base configs/instant-nerf-large-train.yaml --gpus 0 --num_nodes 1
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
Seed set to 42
Running on GPUs 0
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
Some weights of ViTModel were not initialized from the model checkpoint at facebook/dino-vitb16 and are newly initialized: ['encoder.layer.10.adaLN_modulation.1.weight', 'encoder.layer.9.adaLN_modulation.1.bias', 'encoder.layer.5.adaLN_modulation.1.weight', 'encoder.layer.2.adaLN_modulation.1.weight', 'encoder.layer.3.adaLN_modulation.1.bias', 'encoder.layer.10.adaLN_modulation.1.bias', 'encoder.layer.2.adaLN_modulation.1.bias', 'encoder.layer.11.adaLN_modulation.1.weight', 'encoder.layer.0.adaLN_modulation.1.weight', 'encoder.layer.11.adaLN_modulation.1.bias', 'encoder.layer.6.adaLN_modulation.1.weight', 'encoder.layer.7.adaLN_modulation.1.bias', 'encoder.layer.5.adaLN_modulation.1.bias', 'encoder.layer.7.adaLN_modulation.1.weight', 'encoder.layer.6.adaLN_modulation.1.bias', 'encoder.layer.0.adaLN_modulation.1.bias', 'encoder.layer.1.adaLN_modulation.1.bias', 'encoder.layer.3.adaLN_modulation.1.weight', 'encoder.layer.9.adaLN_modulation.1.weight', 'encoder.layer.8.adaLN_modulation.1.bias', 'encoder.layer.8.adaLN_modulation.1.weight', 'encoder.layer.4.adaLN_modulation.1.weight', 'encoder.layer.1.adaLN_modulation.1.weight', 'encoder.layer.4.adaLN_modulation.1.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None
for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
============= length of dataset 12 =============
============= length of dataset 11 =============
accumulate_grad_batches = 1
++++ NOT USING LR SCALING ++++
Setting learning rate to 4.00e-04
[rank: 0] Seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
You are using a CUDA device ('NVIDIA GeForce RTX 4060 Ti') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
============= length of dataset 12 =============
============= length of dataset 11 =============
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Project config
model:
base_learning_rate: 0.0004
target: src.model.MVRecon
params:
input_size: 320
render_size: 192
lrm_generator_config:
target: src.models.lrm.InstantNeRF
params:
encoder_feat_dim: 768
encoder_freeze: false
encoder_model_name: facebook/dino-vitb16
transformer_dim: 512
transformer_layers: 8
transformer_heads: 8
triplane_low_res: 32
triplane_high_res: 64
triplane_dim: 80
rendering_samples_per_ray: 128
data:
target: src.data.objaverse.DataModuleFromConfig
params:
batch_size: 1
num_workers: 4
train:
target: src.data.objaverse.ObjaverseData
params:
root_dir: /home/mrguanglei/3D/InstantMesh/data
meta_fname: valid_paths.json
input_image_dir: rendering_random_32views
target_image_dir: rendering_random_32views
input_view_num: 6
target_view_num: 4
total_view_n: 32
fov: 50
camera_rotation: true
validation: false
validation:
target: src.data.objaverse.ValidationData
params:
root_dir: /home/mrguanglei/3D/InstantMesh/data/vaild
input_view_num: 6
input_image_size: 320
fov: 30
lightning:
modelcheckpoint:
params:
every_n_train_steps: 1000
save_top_k: -1
save_last: true
callbacks: {}
trainer:
benchmark: true
max_epochs: -1
gradient_clip_val: 1.0
val_check_interval: 1000
num_sanity_val_steps: 0
accumulate_grad_batches: 1
check_val_every_n_epoch: null
accelerator: gpu
devices: 1
| Name | Type | Params
0 | lrm_generator | InstantNeRF | 152 M
1 | lpips | LearnedPerceptualImagePatchSimilarity | 14.7 M
152 M Trainable params
14.7 M Non-trainable params
166 M Total params
667.701 Total estimated model params size (MB)
Epoch 0: | | 0/? [00:00<?, ?it/s]
Traceback (most recent call last):
rank0: File "/home/mrguanglei/3D/InstantMesh/train.py", line 284, in rank0: trainer.fit(model, data)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
rank0: return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, *kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
rank0: return function(args, **kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
rank0: self._run(model, ckpt_path=ckpt_path)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
rank0: results = self._run_stage()
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 136, in run
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 240, in advance rank0: batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 187, in run rank0: self._optimizer_step(batch_idx, closure)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 265, in _optimizer_step
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
rank0: output = fn(*args, **kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1282, in optimizer_step
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 151, in step
rank0: step_output = self._strategy.optimizer_step(self._optimizer, closure, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 264, in optimizer_step
rank0: optimizer_output = super().optimizer_step(optimizer, closure, model, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 230, in optimizer_step rank0: return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision.py", line 117, in optimizer_step
rank0: return optimizer.step(closure=closure, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
rank0: return wrapped(*args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/optimizer.py", line 391, in wrapper
rank0: out = func(*args, *kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
rank0: ret = func(self, args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/adamw.py", line 165, in step
rank0: loss = closure()
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision.py", line 104, in _wrap_closure
rank0: closure_result = closure()
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in callrank0: self._result = self.closure(*args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
rank0: return func(*args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 126, in closure rank0: step_output = self._step_fn()
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 315, in _training_step
rank0: training_step_output = call._call_strategy_hook(trainer, "training_step", kwargs.values())
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook rank0: output = fn(args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 381, in training_step rank0: return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 633, in callrank0: wrapper_output = wrapper_module(*args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
rank0: return self._call_impl(*args, *kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
rank0: return forward_call(args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1593, in forward
rank0: else self._run_ddp_forward(*inputs, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1411, in _run_ddp_forward
rank0: return self.module(*inputs, *kwargs)  # type: ignore[index]
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
rank0: return self._call_impl(args, kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
rank0: return forward_call(*args, *kwargs)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 626, in wrapped_forward rank0: out = method(_args, **_kwargs)
rank0: File "/home/mrguanglei/3D/InstantMesh/src/model.py", line 196, in training_step
rank0: lrm_generator_input, render_gt = self.prepare_batch_data(batch)
rank0: File "/home/mrguanglei/3D/InstantMesh/src/model.py", line 84, in prepare_batch_data
rank0: target_depths = v2.functional.resize(
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/transforms/v2/functional/_geometry.py", line 189, in resize rank0: return kernel(inpt, size=size, interpolation=interpolation, max_size=max_size, antialias=antialias)
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/transforms/v2/functional/_geometry.py", line 254, in resize_image
rank0: image = interpolate(
rank0: File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/functional.py", line 4028, in interpolate
rank0: return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
rank0: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 844.10 GiB. GPU
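The traceback points at the depth-map resize in prepare_batch_data (src/model.py, line 84), so before anything else I want to see what my rendered depth maps actually look like when they reach that call. Below is a small debugging sketch I plan to drop in just before the failing resize; inspect_depths is my own hypothetical helper (not part of the InstantMesh code), and it assumes the depths arrive as a tensor laid out as (..., H, W):

import torch
from torchvision.transforms import v2

def inspect_depths(target_depths, render_size=192):
    # Hypothetical debug helper: print the shape/dtype/range of whatever the
    # dataloader puts into the batch as depth maps, then run the same resize
    # the traceback fails on, so the ~844 GiB allocation can be traced back
    # to a concrete tensor size.
    print("shape :", tuple(target_depths.shape))
    print("dtype :", target_depths.dtype)
    print("device:", target_depths.device)
    print("range :", target_depths.min().item(), "to", target_depths.max().item())
    # render_size=192 matches render_size in my config; nearest-neighbor mode
    # matches the upsample_nearest2d frame in the traceback.
    resized = v2.functional.resize(
        target_depths,
        [render_size, render_size],
        interpolation=v2.InterpolationMode.NEAREST,
    )
    print("resized shape:", tuple(resized.shape))
    return resized

If the printed shape or resolution turns out to be wildly different from the 320x320 inputs and 192 render size in the config, that would suggest the problem is in the depth maps my own rendering step produced rather than in the training code, but I am not sure yet.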