TencentARC / InstantMesh

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Apache License 2.0

Requires 800 gigabytes of video memory #147

Open Mrguanglei opened 2 months ago

Mrguanglei commented 2 months ago

I finished rendering, and when I started training the NeRF with only 20 data samples, the run tried to allocate an enormous amount of GPU memory. What happened? I need your help.

(instantmesh1) mrguanglei@guanglei:~/3D/InstantMesh$ python train.py --base configs/instant-nerf-large-train.yaml --gpus 0 --num_nodes 1
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
Seed set to 42
Running on GPUs 0
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Some weights of ViTModel were not initialized from the model checkpoint at facebook/dino-vitb16 and are newly initialized: ['encoder.layer.10.adaLN_modulation.1.weight', 'encoder.layer.9.adaLN_modulation.1.bias', 'encoder.layer.5.adaLN_modulation.1.weight', 'encoder.layer.2.adaLN_modulation.1.weight', 'encoder.layer.3.adaLN_modulation.1.bias', 'encoder.layer.10.adaLN_modulation.1.bias', 'encoder.layer.2.adaLN_modulation.1.bias', 'encoder.layer.11.adaLN_modulation.1.weight', 'encoder.layer.0.adaLN_modulation.1.weight', 'encoder.layer.11.adaLN_modulation.1.bias', 'encoder.layer.6.adaLN_modulation.1.weight', 'encoder.layer.7.adaLN_modulation.1.bias', 'encoder.layer.5.adaLN_modulation.1.bias', 'encoder.layer.7.adaLN_modulation.1.weight', 'encoder.layer.6.adaLN_modulation.1.bias', 'encoder.layer.0.adaLN_modulation.1.bias', 'encoder.layer.1.adaLN_modulation.1.bias', 'encoder.layer.3.adaLN_modulation.1.weight', 'encoder.layer.9.adaLN_modulation.1.weight', 'encoder.layer.8.adaLN_modulation.1.bias', 'encoder.layer.8.adaLN_modulation.1.weight', 'encoder.layer.4.adaLN_modulation.1.weight', 'encoder.layer.1.adaLN_modulation.1.weight', 'encoder.layer.4.adaLN_modulation.1.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
============= length of dataset 12 =============
============= length of dataset 11 =============
accumulate_grad_batches = 1
++++ NOT USING LR SCALING ++++
Setting learning rate to 4.00e-04
[rank: 0] Seed set to 42
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=nccl
All distributed processes registered. Starting with 1 processes

You are using a CUDA device ('NVIDIA GeForce RTX 4060 Ti') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
============= length of dataset 12 =============
============= length of dataset 11 =============
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Project config
model:
  base_learning_rate: 0.0004
  target: src.model.MVRecon
  params:
    input_size: 320
    render_size: 192
    lrm_generator_config:
      target: src.models.lrm.InstantNeRF
      params:
        encoder_feat_dim: 768
        encoder_freeze: false
        encoder_model_name: facebook/dino-vitb16
        transformer_dim: 512
        transformer_layers: 8
        transformer_heads: 8
        triplane_low_res: 32
        triplane_high_res: 64
        triplane_dim: 80
        rendering_samples_per_ray: 128
data:
  target: src.data.objaverse.DataModuleFromConfig
  params:
    batch_size: 1
    num_workers: 4
    train:
      target: src.data.objaverse.ObjaverseData
      params:
        root_dir: /home/mrguanglei/3D/InstantMesh/data
        meta_fname: valid_paths.json
        input_image_dir: rendering_random_32views
        target_image_dir: rendering_random_32views
        input_view_num: 6
        target_view_num: 4
        total_view_n: 32
        fov: 50
        camera_rotation: true
        validation: false
    validation:
      target: src.data.objaverse.ValidationData
      params:
        root_dir: /home/mrguanglei/3D/InstantMesh/data/vaild
        input_view_num: 6
        input_image_size: 320
        fov: 30
lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 1000
      save_top_k: -1
      save_last: true
  callbacks: {}
  trainer:
    benchmark: true
    max_epochs: -1
    gradient_clip_val: 1.0
    val_check_interval: 1000
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    check_val_every_n_epoch: null
    accelerator: gpu
    devices: 1

  | Name          | Type                                  | Params
0 | lrm_generator | InstantNeRF                           | 152 M
1 | lpips         | LearnedPerceptualImagePatchSimilarity | 14.7 M

152 M     Trainable params
14.7 M    Non-trainable params
166 M     Total params
667.701   Total estimated model params size (MB)
Epoch 0: | | 0/? 00:00<?, ?it/s
Traceback (most recent call last):
rank0:   File "/home/mrguanglei/3D/InstantMesh/train.py", line 284, in <module>
rank0:     trainer.fit(model, data)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit

rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
rank0:     return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 102, in launch
rank0:     return function(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
rank0:     self._run(model, ckpt_path=ckpt_path)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
rank0:     results = self._run_stage()
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 136, in run
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 240, in advance
rank0:     batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 187, in run
rank0:     self._optimizer_step(batch_idx, closure)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 265, in _optimizer_step
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
rank0:     output = fn(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1282, in optimizer_step
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 151, in step
rank0:     step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 264, in optimizer_step
rank0:     optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 230, in optimizer_step
rank0:     return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision.py", line 117, in optimizer_step
rank0:     return optimizer.step(closure=closure, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
rank0:     return wrapped(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/optimizer.py", line 391, in wrapper
rank0:     out = func(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
rank0:     ret = func(self, *args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/optim/adamw.py", line 165, in step
rank0:     loss = closure()
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision.py", line 104, in _wrap_closure
rank0:     closure_result = closure()
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in __call__
rank0:     self._result = self.closure(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
rank0:     return func(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 126, in closure
rank0:     step_output = self._step_fn()
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 315, in _training_step
rank0:     training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
rank0:     output = fn(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 381, in training_step
rank0:     return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 633, in __call__
rank0:     wrapper_output = wrapper_module(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
rank0:     return self._call_impl(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
rank0:     return forward_call(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1593, in forward
rank0:     else self._run_ddp_forward(*inputs, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1411, in _run_ddp_forward
rank0:     return self.module(*inputs, **kwargs)  # type: ignore[index]
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
rank0:     return self._call_impl(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
rank0:     return forward_call(*args, **kwargs)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 626, in wrapped_forward
rank0:     out = method(*_args, **_kwargs)
rank0:   File "/home/mrguanglei/3D/InstantMesh/src/model.py", line 196, in training_step
rank0:     lrm_generator_input, render_gt = self.prepare_batch_data(batch)
rank0:   File "/home/mrguanglei/3D/InstantMesh/src/model.py", line 84, in prepare_batch_data
rank0:     target_depths = v2.functional.resize(
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/transforms/v2/functional/_geometry.py", line 189, in resize
rank0:     return kernel(inpt, size=size, interpolation=interpolation, max_size=max_size, antialias=antialias)
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torchvision/transforms/v2/functional/_geometry.py", line 254, in resize_image
rank0:     image = interpolate(
rank0:   File "/home/mrguanglei/anaconda3/envs/instantmesh1/lib/python3.10/site-packages/torch/nn/functional.py", line 4028, in interpolate
rank0:     return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
rank0: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 844.10 GiB. GPU
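
For scale, here is a rough sanity check of how much memory that resize should need. It is only a sketch: batch_size, target_view_num and render_size come from the config printed above, while the single depth channel, float32 storage and the helper name `resized_nbytes` are assumptions made for illustration.

```python
def resized_nbytes(leading_shape, out_hw, bytes_per_elem=4):
    """Bytes needed for the output of resizing a (*leading, H, W) tensor to out_hw."""
    n = bytes_per_elem
    for dim in leading_shape:
        n *= dim
    return n * out_hw[0] * out_hw[1]


GIB = 1024 ** 3

# From the posted config: batch_size=1, target_view_num=4, render_size=192.
# Assumed (not confirmed by the log): one depth channel, float32 elements.
expected = resized_nbytes((1, 4, 1), (192, 192))
print(f"expected resize output: {expected / 1024 ** 2:.2f} MiB")

# The allocation reported in the traceback.
requested = 844.10 * GIB
print(f"requested / expected: {requested / expected:.3g}x")
```

Under these assumptions the resized depth maps should take well under 1 MiB, so a request of 844.10 GiB suggests the tensor reaching `v2.functional.resize` has a very different shape than expected, or that the target size passed to it is wrong. Printing `target_depths.shape` and the requested size just before the call at `src/model.py` line 84 would probably narrow this down.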

Mrguanglei commented 2 months ago

@zawa999 How do I use this? Looking forward to your reply.

Biggaoga commented 2 months ago

Same question.