buaavrcg / LEGaussians

Pytorch Code for "LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding"
https://buaavrcg.github.io/LEGaussians/
MIT License

[OOM issue] OOM issue during preprocessing #6

Closed minsu1206 closed 4 months ago

minsu1206 commented 4 months ago

Hi

I am using an A5000 (24 GB) for preprocessing, but I hit an OOM error on the very first batch (which is a single image). I don't think the CLIP and DINO models used here are large enough to cause this.

How much memory do I need for preprocessing? Or how could I fix this error?

Thanks in advance.

Shuaizhang7 commented 4 months ago

I encountered the same problem when preprocessing the room and kitchen datasets from Mip-NeRF 360. I am using an RTX 3090 (24 GB).

Chuan-10 commented 4 months ago

Hi, could you share the OOM (Out of Memory) error logs? That way, I can help figure out what went wrong.

Shuaizhang7 commented 4 months ago

My error logs:

```
./data/mipnerf360/kitchen/images
100%|██████████| 278/278 [00:29<00:00,  9.50it/s]
Scales: 100%|██████████| 7/7 [3:06:22<00:00, 1597.57s/it]
[DinoExtractor] Loading failed. Extracting ...
Using cache found in /home/zhangshuai/.cache/torch/hub/facebookresearch_dino_main
  1%|▉         | 1/139 [00:01<02:52,  1.25s/it]
Traceback (most recent call last):
  File "quantize_features.py", line 153, in <module>
    trainer.train()
  File "quantize_features.py", line 63, in train
    data_loader = DataLoader(self.select_dataset(), batch_size=self.args.batch_size, shuffle=self.args.shuffle)
  File "quantize_features.py", line 60, in select_dataset
    return dataset_cls(self.args.image_dir)
  File "/data5/zhangshuai/LEGaussians/preprocess/semantic_feature_dataloader.py", line 109, in __init__
    self._concat_features()
  File "/data5/zhangshuai/LEGaussians/preprocess/semantic_feature_dataloader.py", line 144, in _concat_features
    dinos = get_dinos(self.path, self.dino_params, half=True).permute(0, 3, 1, 2).to("cuda")
  File "/data5/zhangshuai/LEGaussians/preprocess/dino/dino_dataloader.py", line 62, in get_dinos
    dinos.append(extract_feature(extractor, image_paths[:i], args).to('cpu'))
  File "/data5/zhangshuai/LEGaussians/preprocess/dino/dino_dataloader.py", line 43, in extract_feature
    descriptors.append(extract_feature_single(extractor, image_path, args))
  File "/data5/zhangshuai/LEGaussians/preprocess/dino/dino_dataloader.py", line 36, in extract_feature_single
    descriptors = extractor.extract_descriptors(image_batch.to(args.device), args.layer, args.facet, args.bin)
  File "/data5/zhangshuai/LEGaussians/preprocess/dino/extractor.py", line 308, in extract_descriptors
    self._extract_features(batch, [layer], facet)
  File "/data5/zhangshuai/LEGaussians/preprocess/dino/extractor.py", line 242, in _extract_features
    features = self.model(batch)
  File "/data5/zhangshuai/anaconda3/envs/legaussians/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangshuai/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 212, in forward
    x = blk(x)
  File "/data5/zhangshuai/anaconda3/envs/legaussians/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangshuai/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 108, in forward
    y, attn = self.attn(self.norm1(x))
  File "/data5/zhangshuai/anaconda3/envs/legaussians/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangshuai/.cache/torch/hub/facebookresearch_dino_main/vision_transformer.py", line 85, in forward
    attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 7.14 GiB (GPU 0; 23.69 GiB total capacity; 8.13 GiB already allocated; 6.67 GiB free; 15.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
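As an aside, the error message itself suggests one mitigation: when reserved memory far exceeds allocated memory, fragmentation may be the problem, and PyTorch's `max_split_size_mb` allocator option can help. A minimal sketch of how to set it (the value 128 is an arbitrary starting point, not something from this repo):

```shell
# Set the allocator config before launching preprocessing. Smaller split
# sizes reduce fragmentation at some cost in allocation speed; 128 MiB is
# just an illustrative choice, tune as needed.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then run the preprocessing step as usual, e.g.:
# python quantize_features.py
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Note this only helps with fragmentation; if a single attention matrix genuinely does not fit (as the 7.14 GiB allocation above suggests), reducing the input resolution or stride is still needed.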

Chuan-10 commented 4 months ago

Hi, I think you can try a bigger stride (like 4) here: https://github.com/buaavrcg/LEGaussians/blob/e2e5e49d7845e34811d0d805aaf637d02b43f6a7/preprocess/dino/dino_dataloader.py#L15 This way, you'll get a lower-resolution DINO feature map, like [N, 384, 55, 83].
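To see why a bigger stride helps: a ViT-style extractor with patch size p and stride s produces a token grid of roughly ((H - p)/s + 1) × ((W - p)/s + 1), and self-attention memory grows with the square of the token count. A back-of-the-envelope sketch in pure Python (the 440×664 input size and patch size 8 are illustrative assumptions, not values read from the repo):

```python
def dino_grid_size(h, w, patch=8, stride=4):
    """Token grid an overlapping-patch ViT extractor produces (no padding)."""
    return ((h - patch) // stride + 1, (w - patch) // stride + 1)

def attn_memory_ratio(h, w, patch, s_small, s_big):
    """Factor by which the attention matrix shrinks when stride increases."""
    def tokens(s):
        gh, gw = dino_grid_size(h, w, patch, s)
        return gh * gw
    # Self-attention stores a tokens x tokens matrix, so memory is O(tokens^2).
    return (tokens(s_small) / tokens(s_big)) ** 2

# Illustrative 440x664 input with a patch size of 8:
print(dino_grid_size(440, 664, patch=8, stride=8))  # (55, 83)
print(attn_memory_ratio(440, 664, 8, 2, 4))         # roughly 16x less memory
```

Doubling the stride roughly quarters the token count, so the attention matrix shrinks by about 16×, which is why the stride change above resolves the OOM.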

Shuaizhang7 commented 4 months ago

Thanks for your quick reply, my problem is solved.

minsu1206 commented 4 months ago

Same for me. I'll close the issue. Thank you!

DapengFeng commented 4 months ago

Excellent work. I have a few questions. Does the preprocessing stage significantly affect the final results? How can one reproduce the results presented in the paper? Could you release a preprocessed version? @Chuan-10

Chuan-10 commented 4 months ago

Yes, the preprocessing stage is crucial. The preprocessing instructions are in the readme, which is quite clear. To reproduce the paper's results, follow the readme for training and evaluating the model. Although the visualization code for the relevancy map is not included, generating a heatmap from the relevancy map in the evaluation output is straightforward. We intend to release a preprocessed version later, but our team is currently occupied with other tasks; we apologize for the inconvenience. If you run into any specific problems, feel free to open an issue to discuss them. We are happy to help.